hgcd1/2

Niels Möller nisse at lysator.liu.se
Tue Sep 3 13:46:36 UTC 2019


nisse at lysator.liu.se (Niels Möller) writes:

> tg at gmplib.org (Torbjörn Granlund) writes:
>
>> So I think plain / is the way to go for certain systems!
>
> Then we should use that for the double limb loop too! It gets a bit
> tricky, since we need special handling of large quotients,

Here's a div2 function that does that. Together with the deleted div1,
it gives a 25% (!) speed up of gcd in the range 3-10 limbs, benchmarked
on my broadwell machine. And should do well on all machines with decent
small-quotient division.

/* Two-limb division optimized for small quotients.  */
static inline mp_limb_t
div2 (mp_ptr rp,
      mp_limb_t n1, mp_limb_t n0,
      mp_limb_t d1, mp_limb_t d0)
{
  mp_limb_t t1, t0;
  mp_limb_t q = n1 / d1;  /* Approximative quotient */

  if (UNLIKELY (q > d1) )
    {
      /* Normalize */
      mp_limb_t n2;
      int c;
      count_leading_zeros (c, d1);
      n2 = n1 >> (GMP_LIMB_BITS - c);
      n1 = (n1 << c) | (n0 >> (GMP_LIMB_BITS - c));
      n0 <<= c;
      d1 = (d1 << c) | (d0 >> (GMP_LIMB_BITS - c));
      d0 <<= c;

      udiv_qrnnd (q, n1, n2, n1, d1);
      umul_ppmm (t1, t0, q, d0);
      if (t1 > n1 || (t1 == n1 && t0 > n1))
        {
          ASSERT_ALWAYS (q > 0);
          q--;
          n1 += d1;
          sub_ddmmss (t1, t0, t1, t0, 0, d0);
        }
      sub_ddmmss (n1, n0, n1, n0, t1, t0);

      /* Undo normalization */
      n0 = (n0 >> c) | (n1 << (GMP_LIMB_BITS - c));
      n1 >>= c;
    }
  else
    {
      n1 -= q * d1;
      umul_ppmm (t1, t0, q, d0);
      if (UNLIKELY (t1 >= n1) && (t1 > n1 || t0 > n0))
        {
          /* q must be exactly one off */
          ASSERT_ALWAYS (q > 0);
          q--;
          n1 += d1;
          sub_ddmmss (t1, t0, t1, t0, 0, d0);
        }
      sub_ddmmss (n1, n0, n1, n0, t1, t0);
    }
  ASSERT_ALWAYS (n1 < d1 || (n1 == d1 && n0 < d0));
  rp[0] = n0;
  rp[1] = n1;
  return q;
}

Ideally, it should be rearranged a bit so that the large q path is a
function call, not inlined.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list