A perfect power, and then?

Sat Oct 27 17:00:20 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

  a^{1/k}, it should work fine to use a table indexed by low bits of a and
  *low bits only* of k.

  I think the underlying reason is that

    \phi(2^m) = 2^{m-1},

  hence

    a^n (mod 2^m) = a^{n mod 2^{m-1}} (mod 2^m)

Cute.
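
A quick sanity check of that identity, as a sketch only. It assumes a is
odd (the identity relies on a being a unit mod 2^m); the names pow_mod_2m
and count_phi_mismatches are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: a^n mod 2^m by square-and-multiply, 1 <= m <= 32. */
static uint32_t pow_mod_2m(uint32_t a, uint64_t n, unsigned m)
{
    uint32_t mask = (m >= 32) ? UINT32_MAX : ((UINT32_C(1) << m) - 1);
    uint32_t r = 1;
    while (n != 0) {
        if (n & 1)
            r *= a;              /* unsigned arithmetic wraps mod 2^32 */
        a *= a;
        n >>= 1;
    }
    return r & mask;
}

/* Counts pairs (a, n), a odd, for which reducing the exponent
   mod 2^(m-1) changes a^n mod 2^m.  Requires 1 <= m <= 32. */
static unsigned count_phi_mismatches(unsigned m, uint32_t a_limit,
                                     uint32_t n_limit)
{
    unsigned bad = 0;
    uint64_t emask = (UINT64_C(1) << (m - 1)) - 1;
    for (uint32_t a = 1; a < a_limit; a += 2)
        for (uint32_t n = 0; n < n_limit; n++)
            if (pow_mod_2m(a, n, m) != pow_mod_2m(a, n & emask, m))
                bad++;
    return bad;
}
```

Calling count_phi_mismatches(8, 256, 1000) compares every odd a below 256
against every exponent below 1000 at 8 bits of precision.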

  My implementation constructs a 4-bit starting value as

    r0 = 1 + (((n << 2) & ((a0 << 1) ^ (a0 << 2))) & 8);

  (here, a0 is the low input limb, r0 is the low output limb, and the
  iteration computes a^{1/n-1} mod a power of two).

That's a clever formula, but it might not be faster than a tiny table if
the lookup can be done in fewer than 7 operations.
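
The quoted starting value can be checked exhaustively at 4 bits. The
sketch below reads a^{1/n-1} as a^{(1/n) - 1}, so that r0 should satisfy
r0^n * a^(n-1) == 1 (mod 16), and it assumes both a0 and n are odd. Those
two readings are assumptions, and the names start4, pow_mod16 and
count_start4_failures are invented here.

```c
#include <stdint.h>

/* The quoted 4-bit starting value; a0 and n are assumed odd. */
static uint32_t start4(uint32_t a0, uint32_t n)
{
    return 1 + (((n << 2) & ((a0 << 1) ^ (a0 << 2))) & 8);
}

/* x^e mod 16 by square-and-multiply (unsigned arithmetic wraps). */
static uint32_t pow_mod16(uint32_t x, uint32_t e)
{
    uint32_t r = 1;
    while (e != 0) {
        if (e & 1)
            r *= x;
        x *= x;
        e >>= 1;
    }
    return r & 15;
}

/* Counts odd pairs (a, n) below 64 for which r0 = start4(a, n)
   fails r0^n * a^(n-1) == 1 (mod 16). */
static unsigned count_start4_failures(void)
{
    unsigned bad = 0;
    for (uint32_t a = 1; a < 64; a += 2)
        for (uint32_t n = 1; n < 64; n += 2) {
            uint32_t r0 = start4(a, n);
            if (((pow_mod16(r0, n) * pow_mod16(a, n - 1)) & 15) != 1)
                bad++;
        }
    return bad;
}
```

The formula only ever yields 1 or 9, which fits: for odd n the exponent
1 - n is even, and the odd squares mod 16 are exactly 1 and 9.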

  We should be able to get an 8-bit starting value using a table lookup on
  at most 13 bits (18 KByte). But maybe it's not worth the effort; a
  single iteration getting from 4 bits to 8 shouldn't be terribly
  expensive.

18 kByte is too much.
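
For reference, one way a single doubling step from 4 to 8 bits could look.
This is a sketch under assumptions: it uses the generic Newton/Hensel
update r' = r - r * (a^(n-1) r^n - 1) / n for odd a and n, which may
differ from what the implementation under discussion does, and all
function names are made up.

```c
#include <stdint.h>

/* x^e mod 2^32 by square-and-multiply (unsigned arithmetic wraps). */
static uint32_t pow_wrap(uint32_t x, uint32_t e)
{
    uint32_t r = 1;
    while (e != 0) {
        if (e & 1)
            r *= x;
        x *= x;
        e >>= 1;
    }
    return r;
}

/* 1/n mod 2^32 for odd n, by Newton doubling from the 3-bit value n. */
static uint32_t inv_odd(uint32_t n)
{
    uint32_t inv = n;               /* n*n == 1 (mod 8) */
    for (int i = 0; i < 4; i++)
        inv *= 2 - n * inv;         /* 3 -> 6 -> 12 -> 24 -> 48 bits */
    return inv;
}

/* If a^(n-1) r^n == 1 (mod 2^m), the result satisfies the same
   congruence mod 2^(2m). */
static uint32_t broot_step(uint32_t r, uint32_t a, uint32_t n)
{
    uint32_t e = pow_wrap(a, n - 1) * pow_wrap(r, n) - 1;
    return r - r * e * inv_odd(n);
}

/* Counts odd pairs (a, n) below 256 for which one step from the
   4-bit starting value does not give a correct 8-bit value. */
static unsigned count_step_failures(void)
{
    unsigned bad = 0;
    for (uint32_t a = 1; a < 256; a += 2)
        for (uint32_t n = 1; n < 256; n += 2) {
            uint32_t r0 = 1 + (((n << 2) & ((a << 1) ^ (a << 2))) & 8);
            uint32_t r1 = broot_step(r0, a, n) & 0xff;
            if (((pow_wrap(a, n - 1) * pow_wrap(r1, n)) & 0xff) != 1)
                bad++;
        }
    return bad;
}
```

The step costs two powerings, a few multiplications and an inverse of n
that can be computed once and reused, which is what it would be weighed
against a larger table.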

  BTW, for large n one ought to use n mod the right power of 2 for the
  powering in the first few iterations, to avoid doing lots of useless
  work in powering.

Perhaps add that to the file as a comment?

  > (2) iterate single limb code before entering the mpn loop.

  One should definitely have an initial single-limb loop. Similar to how
  it's done with binvert and binvert_limb.
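
As an illustration of the single-limb pattern referred to here, a sketch of
a limb inverse by Newton doubling. It is not GMP's binvert_limb; the name
limb_inverse_sketch and the fixed 64-bit limb are assumptions made for
this example.

```c
#include <stdint.h>

/* Hypothetical sketch: inverse of an odd 64-bit limb mod 2^64. */
static uint64_t limb_inverse_sketch(uint64_t a)
{
    uint64_t inv = a;               /* a*a == 1 (mod 8): 3 correct bits */
    for (int bits = 3; bits < 64; bits *= 2)
        inv *= 2 - a * inv;         /* each pass doubles the correct bits */
    return inv;
}
```

All the doubling happens in plain limb arithmetic; a multi-limb loop would
only take over once a full limb is known.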

I have become convinced that we need an mpn_broot, for example in an
mpn_rootexact.  I am not convinced that perfpow's early inversion and
use of binv_root is worse than using mpn_broot and a mullo for each k.

I suggest that we make four new files in mpn/generic:

broot.c:    mpn_broot and your mpn_xxx that computes a^{1/n-1},
            perhaps call the latter mpn_brootinvm1
brootinv.c: mpn_brootinv
bsqrt.c:    mpn_bsqrt  (which probably calls mpn_bsqrtinv, mpn_mullo)
bsqrtinv.c: mpn_bsqrtinv

-- 
Torbjörn