gcd_22
Torbjörn Granlund
tg at gmplib.org
Mon Aug 19 22:32:35 UTC 2019
We now have the algorithm
while (highlimb(U) | highlimb(V)) != 0
S = U - V
T = V - U
cnt = count_trailing_zeros(lowlimb(S))
cnd1 = U < V
if (cnd1)
V = U
U = T
else
U = S
U >>= cnt
with some variation. The "if" statement is performed with masking
tricks or conditional move/conditional select.
The main performance problem with that code is the dependency on
lowlimb(U) in the conditional assignment, through the double-limb shift,
to the S and T assignments. The looping criteria is a smaller problem,
as branch prediction takes care of that.
I think we might get more speed from this algorithm:
while (highlimb(U) | highlimb(V)) != 0
S = U - V
T = V - U
cnt = count_trailing_zeros(lowlimb(S))
cnd1 = U < V
cnd2 = highlimb(U) < highlimb(V)
if (cnd2)
V = U
U = T
else
U = S
if (cnd1 != cnd2) { fix mistake! }
U >>= cnt;
Why would it be faster? Because while cnd1 needs to wait for the
previous iteration's double-limb shifting, cnd2 depends on just the
single-limb shifting of the high U world.
We might make the wrong decision iff highlimb(U) = highlimb(V) in
which case cnd1 might be true (cnd2 is false of course).
But when highlimb(U) = highlimb(V) then either highlimb(U-V) = 0 or
highlimb(V-U) = 0, so we're about to go into a gcd_21 situation. (We
might also have highlimb(U-V) = highlimb(V-U) = 0, in which case we're
done completely.)
With some inspiration of Richard Sites, Alpha architect, I say: It's
the critical path, stupid! :-)
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list