# gcdext_1 questions

Niels Möller nisse at lysator.liu.se
Fri Dec 4 11:50:13 CET 2009

```Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:

> Thus one iteration is equivalent to Stein's binary algorithm, except you
> replace the subtraction by an addmul_si (and doing that you remove more bits).

A slightly different way to look at it is as follows.
We have u and v odd, and update

u <-- (u + q v) / 2^{j+number of extra trailing zeros}

where q = - u v^{-1} mod 2^j (represented in some symmetric or
assymmetric range).

In the binary algorithm as described in Stehlé's and your paper, j is
determined by the number of "extra trailing bits" from the previous
iteration. But from a practical point of view, the important thing is
that u and q v are of the same bitsize, so that u + q*v is not much
larger than u, growing at most by a carry bit.

So I think one could do a little better by using count_leading_zeros, to
take into account precisely what happens to the bits also in the most
significant end (instead of just relying on the growth there being
bounded as a fibonacci sequence), we may by luck have a bit or two
cancelling there too.

Like,

while (u != v)
{
if (u >v)
{
j = #u - #v + 1;  // I let # denote bitsize
q = -u v^{-1} mod 2^j;
u += q v;
u >>= (trailing zeros, >= j);
}
else { ... Analogous update v += q * v ... }
}

where there then are several options for how to handle signs of u, v and
q.

To try this (I'm still considering small inputs), maybe one should start
with plain gcd, rather than gcdext. The current mpn_gcd_1 uses the
following loop, with no branches but the looping itself:

ulimb >>= 1;
vlimb >>= 1;

while (ulimb != vlimb)
{
int c;
mp_limb_t t = ulimb - vlimb;

/* v <-- min (u, v) */
vlimb += (vgtu & t);

/* u <-- |u - v| */
ulimb = (t ^ vgtu) - vgtu;

count_trailing_zeros (c, t);
ulimb >>= (c + 1);
}

The most expensive operation is the count_trailing_zeros (but it can be
scheduled to start very early, since it depends on t only), and then a
dependency chain of, if I count correctly, 5 basic operations.

Not sure what the various asembler imlementations do. At least some seem
to prefer table lookup to count_trailing_zeros.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
```