# hgcd1/2

Niels Möller nisse at lysator.liu.se
Tue Sep 3 13:46:36 UTC 2019

```nisse at lysator.liu.se (Niels Möller) writes:

> tg at gmplib.org (Torbjörn Granlund) writes:
>
>> So I think plain / is the way to go for certain systems!
>
> Then we should use that for the double limb loop too! It gets a bit
> tricky, since we need special handling of large quotients,

Here's a div2 function that does that. Together with the deleted div1,
it gives a 25% (!) speed up of gcd in the range 3-10 limbs, benchmarked
on my broadwell machine. And should do well on all machines with decent
small-quotient division.

/* Two-limb division optimized for small quotients.  */
static inline mp_limb_t
div2 (mp_ptr rp,
mp_limb_t n1, mp_limb_t n0,
mp_limb_t d1, mp_limb_t d0)
{
mp_limb_t t1, t0;
mp_limb_t q = n1 / d1;  /* Approximative quotient */

if (UNLIKELY (q > d1) )
{
/* Normalize */
mp_limb_t n2;
int c;
n2 = n1 >> (GMP_LIMB_BITS - c);
n1 = (n1 << c) | (n0 >> (GMP_LIMB_BITS - c));
n0 <<= c;
d1 = (d1 << c) | (d0 >> (GMP_LIMB_BITS - c));
d0 <<= c;

udiv_qrnnd (q, n1, n2, n1, d1);
umul_ppmm (t1, t0, q, d0);
if (t1 > n1 || (t1 == n1 && t0 > n1))
{
ASSERT_ALWAYS (q > 0);
q--;
n1 += d1;
sub_ddmmss (t1, t0, t1, t0, 0, d0);
}
sub_ddmmss (n1, n0, n1, n0, t1, t0);

/* Undo normalization */
n0 = (n0 >> c) | (n1 << (GMP_LIMB_BITS - c));
n1 >>= c;
}
else
{
n1 -= q * d1;
umul_ppmm (t1, t0, q, d0);
if (UNLIKELY (t1 >= n1) && (t1 > n1 || t0 > n0))
{
/* q must be exactly one off */
ASSERT_ALWAYS (q > 0);
q--;
n1 += d1;
sub_ddmmss (t1, t0, t1, t0, 0, d0);
}
sub_ddmmss (n1, n0, n1, n0, t1, t0);
}
ASSERT_ALWAYS (n1 < d1 || (n1 == d1 && n0 < d0));
rp = n0;
rp = n1;
return q;
}

Ideally, it should be rearranged a bit so that the large q path is a
function call, not inlined.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
```