2-adic roots (Re: bdiv vs redc)

Tue Jul 24 18:51:10 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

  I see. I think the "right" way to compute the iteration

    x <-- x - x (a x^2 - 1) / 2

  is as follows. Let the current x have \ell bits.

  1. Compute the square x^2. Use wraparound, since the lowest \ell bits were
     computed in the same step in the previous iteration.

  2. Compute a x^2. Use wraparound, since the low \ell bits are 0, ...,0,
     1. Shift right, and ignore the low zero limbs. Denote the result as

      B^z e = (a x^2 - 1) / 2

  3. Compute x e (mod 2^{2\ell} / B^z]). This is balanced (almost) plain
     mullo.

Or wraparound, or mulmid?  But I suppose mulmid might be for unbalanced
things?

  4. Subtract x -= B^z x e, extending the precision of x from \ell bits to
     2\ell - 2. We could simplify the final subtraction (which is mostly a
     negation) if we negate a up front, really using a newton iteration
     converging to (-a)^{-1/2} (mod increasing powers of two).

We should btw really try plain sqrt (i.e., using Euclidian norm) doing
a^{1/2} via a^{-1/2}.  It is cleaner, and should be faster.

  > I suppose one should for common k improve the starting value from 1 bit
  > to a few bits, and for any k iterate in mp_limb_t variables until
  > getting a full word of precision (using a fully unrolled loop).

  For square root I think the most practical is to tabulate square roots
  mod 2^10 (using a table with only 2^7 entries), and then iterate 10 ->
  18 -> 34 -> 66. I don't think it is of much use to use a larger table
  unless we go up to 2^14 or 2^15 entries.

It is not hard to agree.

I suppose he next thing is to make sure your code is right, and then se
when it beats the code in perfpow.c.

I think we should add sqrt and root Hensel code along the same lines.

-- 
Torbjörn