gcdext_1 questions
Niels Möller
nisse at lysator.liu.se
Fri Dec 4 11:50:13 CET 2009
Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
> Thus one iteration is equivalent to Stein's binary algorithm, except you
> replace the subtraction by an addmul_si (and doing that you remove more bits).
A slightly different way to look at it is as follows.
We have u and v odd, and update
u <-- (u + q v) / 2^{j+number of extra trailing zeros}
where q = - u v^{-1} mod 2^j (represented in some symmetric or
assymmetric range).
In the binary algorithm as described in Stehlé's and your paper, j is
determined by the number of "extra trailing bits" from the previous
iteration. But from a practical point of view, the important thing is
that u and q v are of the same bitsize, so that u + q*v is not much
larger than u, growing at most by a carry bit.
So I think one could do a little better by using count_leading_zeros, to
take into account precisely what happens to the bits also in the most
significant end (instead of just relying on the growth there being
bounded as a fibonacci sequence), we may by luck have a bit or two
cancelling there too.
Like,
while (u != v)
{
if (u >v)
{
j = #u - #v + 1; // I let # denote bitsize
q = -u v^{-1} mod 2^j;
u += q v;
u >>= (trailing zeros, >= j);
}
else { ... Analogous update v += q * v ... }
}
where there then are several options for how to handle signs of u, v and
q.
To try this (I'm still considering small inputs), maybe one should start
with plain gcd, rather than gcdext. The current mpn_gcd_1 uses the
following loop, with no branches but the looping itself:
ulimb >>= 1;
vlimb >>= 1;
while (ulimb != vlimb)
{
int c;
mp_limb_t t = ulimb - vlimb;
mp_limb_t vgtu = LIMB_HIGHBIT_TO_MASK (t);
/* v <-- min (u, v) */
vlimb += (vgtu & t);
/* u <-- |u - v| */
ulimb = (t ^ vgtu) - vgtu;
count_trailing_zeros (c, t);
ulimb >>= (c + 1);
}
The most expensive operation is the count_trailing_zeros (but it can be
scheduled to start very early, since it depends on t only), and then a
dependency chain of, if I count correctly, 5 basic operations.
Not sure what the various asembler imlementations do. At least some seem
to prefer table lookup to count_trailing_zeros.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list