A perfect power, and then?

Thu Oct 25 23:44:50 CEST 2012

Torbjorn Granlund <tg at gmplib.org> writes:

> OK.  One might also want to consider which is the most useful function.

Computing x^{1/2} and x^{1/n} looks nice, at least in the manual ;-). And
we get there with a single mullo if we compute x^{-1/2} and x^{1/n-1}.
Which variants really are most useful for applications, I don't know.

>      Now, the inversion in perfpow appear to be amortized over several
>      is_kth_power calls. I'm not sure what that means for performance.
>      Since these calls will work with roots of different sizes, a single
>      large binvert may still be slower than a number of mullo's of various
>      smaller sizes.
>      
> I didn't get that.

The point is, that I suggest that we get rid of the binvert call early
in perfpow, and replace it by a mullo for each root we compute (to get
from n^{-1/2} to n^{1/2}, where the current code instead computes
n^{-1} and (n^{-1})^{-1/2}). If there were a single binvert call and we
can replace it by a single mullo, that seems like an obvious win.

But now there's a single binvert, followed by a loop calling to
is_kth_power. Each iteration computes a root and needs an additional
call to mullo. So then it's not as obvious an improvement, but I suspect
replacing binvert by mullo will still be a win.

> Sounds right.  Such convolution type sum-of-products might want a
> separate function, returning 3 limbs.

But I'm not talking about a convolution, but about the complete
powering. If x is a candidate nth root, and x^n is of size s bits, I
want to compute a decent approximation of floor (x^n / 2^{s -
GMP_NUMB_BITS}) (a single limb value), and compare that to the input
number which x is supposedly a root of. Simplest way would be to use the
binary powering algorithm, and represent all intermediates as a
normalized limb sized mantissa and a shift count. To use it, one needs a
strict bound on the error, which I guess would depend on n, but I don't
think the error analysis is very difficult. One might even consider
using the floating point hardware.

I think that's going to be a lot faster than computing any middle limbs,
and that it's going to rule out almost all numbers which aren't
perfect n-powers.

(I'm talking primarily of the nth root case, but I think the
specialization to n = 2 would be pretty good too).

> Perhaps you could add some of your remarks as a comment to the file?

Makes sense. Not tonight, though.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.