A perfect power, and then?

Torbjorn Granlund tg at gmplib.org
Sat Oct 27 17:00:20 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

a^{1/k}, it should work fine to use a table indexed by low bits of a and
*low bits only* of k.

I think the underlying reason is that

\phi(2^m) = 2^{m-1},

hence

a^n (mod 2^m) = a^{n mod 2^{m-1}} (mod 2^m)

Cute.

My implementation constructs a 4-bit starting value as

r0 = 1 + (((n << 2) & ((a0 << 1) ^ (a0 << 2))) & 8);

(here, a0 is the low input limb, r0 is the low output limb, and the
iteration computes a^{1/n-1} mod a power of two.

That's a clever formula, but it might not be faster than a tiny table if
less than 7 operations can be used.

We should be able to get a 8-bit starting value using a table lookup on
at most 13 bits (18 KByte). But maybe it's not worth the effort; a
single iteration getting from 4 bits to 8 shouldn't be terribly
expensive.

18 kByte is too much.

BTW, for large n one ought to use n mod the right power of 2 for the
powering in the first few iterations, to avoid doing lots of useless
work in powering.

Perhaps as a comment add that to the file?

> (2) iterate single limb code before entering the mpn loop.

One should definitely have an initial single-limb loop. Similar to how
it's doen with binvert and binvert_limb.

I have become convinced that we need a mpn_broot, for example in a
mpn_rootexact.  I am not convinced that perfpow's early inversion and
use of binv_root is worse than using mpn_broot and a mullo for each k.

I suggest that we make four new files in mpn/generic:

broot.c:    mpn_broot and your mpn_xxx that computes a^{1/n-1},
perhaps call the latter mpn_brootinvm1
brootinv.c: mpn_brootinv
bsqrt.c:    mpn_bsqrt  (which probably calls mpn_bsqrtinv, mpn_mullo)
bsqrtinv.c: mpn_bsqrtinv

--
Torbjörn