# Small operands gcd improvements

Niels Möller nisse at lysator.liu.se
Wed Jul 31 07:14:15 UTC 2019

```tg at gmplib.org (Torbjörn Granlund) writes:

> Perhaps gcd_22 should (conditionally) invoke (or tail call) gcd_11,
> possibly it should let its caller invoke gcd_11 to finish up an
> unfinished job.

We should run the gcd_22 main loop as long as all of u1, u0, v1, v0 are
non-zero. After computing {t1, t0} <-- {v1,v0} - {u1,u0}, there are a
couple of cases:

1. t1 == t0 == 0: Terminate. Will happen if and only of the gcd is larger
than a single limb.

2. t0 == 0. Enter gcd_21 (se below), after some setup to shift out
trailing zeros from |v1 - u1|.

3. Common case: Get absolute value and shift out trailing zeros. May
produce a zero high limb, in that case fall through to gcd_21,
otherwise, continue gcd_22 loop.

(We could check for an early zero high limb between steps (2) and (3),
i.e., check for (t1 == 0 and no underflow or t1 == ~0 and underflow from
the subtraction). But most likely not worth the effort, since these
cases are also taken care of by the high limb check in (3).)

The gcd_21 loop takes inputs (u1, u0, v), inputs odd, and all the limbs
non-zero. It is a much simpler loop than gcd_22

do
{
{u1, u0} <-- {u1, u0} - v   (always > 0)
if (u0 == 0) { u0 = u1; break; }
shift out trailing zeros
}
while (u1 > 0)

Not sure how many rounds it will typically run, but I'd guess typically
few. When the gcd_21 loop exits, we're set for a call to gcd_11.

>   What should its input requirements be? Should it require one argument,
>   either argument, or both argument to be odd? If even inputs are allowed,
>   do we also allow zero input?
>
>   To me it makes some sense to require that common factors of two are
>   taken care of at top-level, and at least rule out the case of two even
>   inputs to gcd_11.
>
> Agreed.

Let's say that gcd(u,v) requires an odd u. What are the options for v?
If we allow arbitrary v, then it could in principle be implemented as
the tail recursive

mp_limb_t
gcd_11 (mp_limb_t u, mp_limb_t v)
{
if (v == 0) return u;
v >>= count;
return gcd_11(min(u,v), |u-v|);
}

Or we say odd u, non-zero v, the check for zero is moved around, and
we'll check for zero only in connection with the u-v subtraction,

mp_limb_t
gcd_11 (mp_limb_t u, mp_limb_t v)
{
v >>= count;
if (u == v) return u;
return gcd_11(min(u,v), |u-v|);
}

If we also require v odd, then we'll just move the clz and shift a bit,
with no really clear benefits.

Pro: Avoids initial clz for uses where both inputs are known odd
apriori. May be slightly more convenient for the C gcd_11 with masking
tricks where initial clz and theone in the loop are a bit different.

Cons: Moves the responsibility an initial clz out of the supposedly well
optimized asm gcd_11 and out to C code. Call chain down from mpz_gcd
with small inputs is particularly important.

So I see no strong reason to do it one way or the other. Maybe start
with requiring both u and v odd, and relax requirements later in case we
find any more tangible benefit from doing that?

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```