gcdext_1 questions

Niels Möller nisse at lysator.liu.se
Fri Dec 4 11:50:13 CET 2009


Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:

> Thus one iteration is equivalent to Stein's binary algorithm, except you
> replace the subtraction by an addmul_si (and doing that you remove more bits).

A slightly different way to look at it is as follows.
We have u and v odd, and update

  u <-- (u + q v) / 2^{j+number of extra trailing zeros}

where q = - u v^{-1} mod 2^j (represented in some symmetric or
assymmetric range).

In the binary algorithm as described in Stehlé's and your paper, j is
determined by the number of "extra trailing bits" from the previous
iteration. But from a practical point of view, the important thing is
that u and q v are of the same bitsize, so that u + q*v is not much
larger than u, growing at most by a carry bit.

So I think one could do a little better by using count_leading_zeros, to
take into account precisely what happens to the bits also in the most
significant end (instead of just relying on the growth there being
bounded as a fibonacci sequence), we may by luck have a bit or two
cancelling there too.

Like,

  while (u != v)
    {
      if (u >v)
        {
          j = #u - #v + 1;  // I let # denote bitsize
          q = -u v^{-1} mod 2^j;
          u += q v;
          u >>= (trailing zeros, >= j);
        }
      else { ... Analogous update v += q * v ... }
    }

where there then are several options for how to handle signs of u, v and
q.

To try this (I'm still considering small inputs), maybe one should start
with plain gcd, rather than gcdext. The current mpn_gcd_1 uses the
following loop, with no branches but the looping itself:

  ulimb >>= 1;
  vlimb >>= 1;

  while (ulimb != vlimb)
    {
      int c;
      mp_limb_t t = ulimb - vlimb;
      mp_limb_t vgtu = LIMB_HIGHBIT_TO_MASK (t);

      /* v <-- min (u, v) */
      vlimb += (vgtu & t);

      /* u <-- |u - v| */
      ulimb = (t ^ vgtu) - vgtu;

      count_trailing_zeros (c, t);
      ulimb >>= (c + 1);
    }

The most expensive operation is the count_trailing_zeros (but it can be
scheduled to start very early, since it depends on t only), and then a
dependency chain of, if I count correctly, 5 basic operations.

Not sure what the various asembler imlementations do. At least some seem
to prefer table lookup to count_trailing_zeros.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list