A perfect power, and then?

Thu Oct 25 22:16:16 CEST 2012

Torbjorn Granlund <tg at gmplib.org> writes:

> With all respect for your functions, I don't think we should replace the
> tested ones in perfpow.  I think those should in he first place be
> improved (and put in their own file).

I see. Do you want the functions in the separate files to compute
x^{-1/2} and x^{-1/n}, or the non-inverted values? Moving the functions
to separate files and writing the test code and speed support will
obviously be very helpful when the functions are improved or rewritten.

I don't fully understand current perfpow.c. But some comments
nevertheless, for possible improvements:

1. We seem to get from inverted root to plain root by binverting the
   input. For the binv_sqrt case, it's seems clearly cheaper to do it
   with a mullo. For binv_root, it's not as obvious since we need a
   different iteration to converge to x^{1/n - 1} rather than to
   x^{-1/n}, but I think both iterations should be about the same cost,
   so that mullo + other iteration is cheaper than binvert + current
   iteration.

   Now, the inversion in perfpow appear to be amortized over several
   is_kth_power calls. I'm not sure what that means for performance.
   Since these calls will work with roots of different sizes, a single
   large binvert may still be slower than a number of mullo's of various
   smaller sizes.

2. The iterations take no advantage of cancellation. E.g., in the
   binv_sqrt, we should do the iteration as

     x' <- x - x * (a x^2 - 1)/2

   Here, low part of a x^2 - 1 is always zero, and when we next multiply it
   by x, we don't need to multiply by those zero limbs. I guess we could
   also make some use of mpn_rsh1_sub.

3. In pow_equals, we compute and compare the least significant limbs of
   the product first. But the low xn limbs will *always* match since
   they're computed as a mod B^{xn} root, right? I think a better test
   would be to compute the *most* significant single limb of the power.
   I think that's doable with only double-limb arithmetic (umul_ppmm) a
   relative error on the order of k/B or so. Such a check should give a
   pretty small probability of false positives, so if the high limb is
   close enough, it may be simplest and good enough to always compute
   all the limbs for a final check.

4. Use wraparound where possible in the newton iterations.

5. In binv_root, I think the number of bits of precision is *exactly*
   doubled in each iteration. At least I'm fairly confident that it does
   for the interation in my code, and I think that's how Hensel lifting
   usually works. The reason why sqrt does not exactly double the number
   of bits is that we divide by the exponent (2), which in this case is
   a factor of the modulo, which makes it a somewhat strange Hensel
   lift.

Regards,,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.