# gcd_22

Niels Möller nisse at lysator.liu.se
Tue Aug 20 06:55:05 UTC 2019

```tg at gmplib.org (Torbjörn Granlund) writes:

> I think we might get more speed from this algorithm:
>
>   while (highlimb(U) | highlimb(V)) != 0
>     S = U - V
>     T = V - U
>     cnt = count_trailing_zeros(lowlimb(S))
>     cnd1 = U < V
>     cnd2 = highlimb(U) < highlimb(V)
>     if (cnd2)
>       V = U
>       U = T
>     else
>       U = S
>     if (cnd1 != cnd2) { fix mistake! }
>     U >>= cnt;
>
> Why would it be faster?  Because while cnd1 needs to wait for the
> previous iteration's double-limb shifting, cnd2 depends on just the
> single-limb shifting of the high U world.

Interesting. I did something related in the u±v gcd_22 variant, for
other reasons. We probably don't need any fixup for the V assignment, I
would expect that

V <-- min(U,V)

can be replaced with

V <-- (highlimb(U) < highlimb(V)) ? U : V

with no measurable impact on performance. Since it will make a
difference for at most one iteration, and we still make reasonable
progress when it happens.

--- a/mpn/generic/gcd_22.c Sat Aug 17 00:59:04 2019 +0200
+++ b/mpn/generic/gcd_22.c Tue Aug 20 07:51:25 2019 +0200
@@ -52,7 +53,7 @@ mpn_gcd_22 (mp_limb_t u1, mp_limb_t u0,
{
mp_limb_t vgtu, t1, t0;
sub_ddmmss (t1, t0, u1, u0, v1, v0);
+      vgtu = LIMB_HIGHBIT_TO_MASK(u1 - v1);

if (UNLIKELY (t0 == 0))
{
@@ -91,6 +92,9 @@ mpn_gcd_22 (mp_limb_t u1, mp_limb_t u0,
since t0 != 0. */
u0 = (t0 ^ vgtu) - vgtu;
u1 = t1 ^ vgtu;
+   if (UNLIKELY ((mp_limb_signed_t) u1 < 0))
+     {
+       u0 = ~u0;
+       u1 = -u1;
+     }
if (UNLIKELY (c == GMP_LIMB_BITS))
{
u0 = u1;

I take it the idea is to have an unlikely branch for this case;
attempting a branchless correction of vgtu would miss the point?

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```