# gcd_22

Torbjörn Granlund tg at gmplib.org
Mon Aug 19 22:32:35 UTC 2019

```We now have the algorithm

while (highlimb(U) | highlimb(V)) != 0
S = U - V
T = V - U
cnt = count_trailing_zeros(lowlimb(S))
cnd1 = U < V
if (cnd1)
V = U
U = T
else
U = S
U >>= cnt

with some variation.  The "if" statement is performed with masking
tricks or conditional move/conditional select.

The main performance problem with that code is the dependency on
lowlimb(U) in the conditional assignment, through the double-limb shift,
to the S and T assignments.  The looping criteria is a smaller problem,
as branch prediction takes care of that.

I think we might get more speed from this algorithm:

while (highlimb(U) | highlimb(V)) != 0
S = U - V
T = V - U
cnt = count_trailing_zeros(lowlimb(S))
cnd1 = U < V
cnd2 = highlimb(U) < highlimb(V)
if (cnd2)
V = U
U = T
else
U = S
if (cnd1 != cnd2) { fix mistake! }
U >>= cnt;

Why would it be faster?  Because while cnd1 needs to wait for the
previous iteration's double-limb shifting, cnd2 depends on just the
single-limb shifting of the high U world.

We might make the wrong decision iff highlimb(U) = highlimb(V) in
which case cnd1 might be true (cnd2 is false of course).

But when highlimb(U) = highlimb(V) then either highlimb(U-V) = 0 or
highlimb(V-U) = 0, so we're about to go into a gcd_21 situation.  (We
might also have highlimb(U-V) = highlimb(V-U) = 0, in which case we're
done completely.)

With some inspiration of Richard Sites, Alpha architect, I say: It's
the critical path, stupid!  :-)

--
Torbjörn