Small operands gcd improvements
Niels Möller
nisse at lysator.liu.se
Wed Jul 31 07:14:15 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> Perhaps gcd_22 should (conditionally) invoke (or tail call) gcd_11,
> possibly it should let its caller invoke gcd_11 to finish up an
> unfinished job.
We should run the gcd_22 main loop as long as all of u1, u0, v1, v0 are
non-zero. After computing {t1, t0} <-- {v1,v0} - {u1,u0}, there are a
couple of cases:
1. t1 == t0 == 0: Terminate. Will happen if and only of the gcd is larger
than a single limb.
2. t0 == 0. Enter gcd_21 (se below), after some setup to shift out
trailing zeros from |v1 - u1|.
3. Common case: Get absolute value and shift out trailing zeros. May
produce a zero high limb, in that case fall through to gcd_21,
otherwise, continue gcd_22 loop.
(We could check for an early zero high limb between steps (2) and (3),
i.e., check for (t1 == 0 and no underflow or t1 == ~0 and underflow from
the subtraction). But most likely not worth the effort, since these
cases are also taken care of by the high limb check in (3).)
The gcd_21 loop takes inputs (u1, u0, v), inputs odd, and all the limbs
non-zero. It is a much simpler loop than gcd_22
do
{
{u1, u0} <-- {u1, u0} - v (always > 0)
if (u0 == 0) { u0 = u1; break; }
shift out trailing zeros
}
while (u1 > 0)
Not sure how many rounds it will typically run, but I'd guess typically
few. When the gcd_21 loop exits, we're set for a call to gcd_11.
> What should its input requirements be? Should it require one argument,
> either argument, or both argument to be odd? If even inputs are allowed,
> do we also allow zero input?
>
> To me it makes some sense to require that common factors of two are
> taken care of at top-level, and at least rule out the case of two even
> inputs to gcd_11.
>
> Agreed.
Let's say that gcd(u,v) requires an odd u. What are the options for v?
If we allow arbitrary v, then it could in principle be implemented as
the tail recursive
mp_limb_t
gcd_11 (mp_limb_t u, mp_limb_t v)
{
if (v == 0) return u;
count_leading_zeros(cnt, v);
v >>= count;
return gcd_11(min(u,v), |u-v|);
}
Or we say odd u, non-zero v, the check for zero is moved around, and
we'll check for zero only in connection with the u-v subtraction,
mp_limb_t
gcd_11 (mp_limb_t u, mp_limb_t v)
{
count_leading_zeros(cnt, v);
v >>= count;
if (u == v) return u;
return gcd_11(min(u,v), |u-v|);
}
If we also require v odd, then we'll just move the clz and shift a bit,
with no really clear benefits.
Pro: Avoids initial clz for uses where both inputs are known odd
apriori. May be slightly more convenient for the C gcd_11 with masking
tricks where initial clz and theone in the loop are a bit different.
Cons: Moves the responsibility an initial clz out of the supposedly well
optimized asm gcd_11 and out to C code. Call chain down from mpz_gcd
with small inputs is particularly important.
So I see no strong reason to do it one way or the other. Maybe start
with requiring both u and v odd, and relax requirements later in case we
find any more tangible benefit from doing that?
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list