mpn_sqrtrem{1,2}

Tue Feb 28 15:52:36 UTC 2017

Ciao,

Il Dom, 29 Gennaio 2017 5:29 am, Bradley Lucier ha scritto:
> One checks the correctness of each of these routines by showing,
> e.g., that for each uint32_t value i, that sqrt64_float_double
> gives the correct answer for (i*i) and (i*i + 2*i), so by
> monotonicity it gives the correct answers for all values in between.
> There are too many uint64_t values to check programmatically, and I

As a contribution to this discussion, I pushed a piece of code for this
kind of tests: https://gmplib.org/repo/gmp/rev/11887dfdf7a6 .

Once you have changed mpn_sqrtrem{1,2}, and compiled the resulting GMP
library, you can:
cd tests/devel
make sqrtrem_1_2
./sqrtrem_1_2 #This checks exhaustively ALL values 1,2,3,4,5,...
./sqrtrem_1_2 any_arg #This checks values i*i, i*i+2i, for i=1,2...

The first form can exhaustively check mpn_sqrtrem1 with ABI=32, then it
continues with 33 bits, 34... but I did not even try to estimate how much
time it needs to reach 60 bits or more.

The second form is faster, of course, and can check the corner cases up to
64 bits within a reasonable time. With ABI=64 it keeps on growing the
operands, but again I did not estimate the needed total time to check all
possible inputs to mpn_sqrtrem2, I assume it is infeasible.

With the current implementation of mpn_sqrtrem, we might check only values
larger than GMP_NUMB_BITS-2 bits. But I assume that a new mpn_sqrtrem1
function may have a different interface, and be used for a larger range of
values. That's why I checked all sizes. It's only a factor 2 in the total
time, it does not make feasible the infeasible :-/ Checking the functions
this way allows to check that they are correct when integrated in the full
library.

Il Gio, 16 Febbraio 2017 10:45 pm, Adrien Prost-Boucle ha scritto:
> 10000000 times, 100 bits ...... vanilla 1.373 s / FP 1.077 s
> There is some noticeable speedup only for very short bit widths.
> And the speedup is "only" 20-30%.
> Which is a bit disappointing given the 2x-3x speedup put
> on mpn_sqrtrem{1,2}.

The current mpn_sqrtrem{1,2} functions require a "normalised" input, i.e.:
if the input is too small, mpn_sqrtrem shifts it back and forth,
recomputes the reminder... lot of time that can be saved if you
implementation of those two functions (e.g. using floating point) directly
support any bit-size.

This approach needs changing both the mpn_sqrtrem1 (or 2) function AND the
login in calling it from the main function mpn_sqrtrem.

If the new mpn_sqrtrem1 supports any limb value, you can insert a simple:
#if defined(SQRTREM1_SUPPORTS_UNNORMALISED)
  if (nn == 1) {
    sp[0] = mpn_sqrtrem1 (&rl, high);
    if (rp != NULL)
      rp[0] = rl;
    return rl != 0;
  }
#endif
_before_ the count_leading_zeros(c,high) logic, to skip the unneeded parts
of the wrapper. (And of course #define when your code is used).

A similar streamlined path for sqrtrem2 may allow better results for the
100 bits case you measured.

Regards,
m

-- 
http://bodrato.it/papers/