hgcd1/2

Torbjörn Granlund tg at gmplib.org
Thu Sep 5 08:33:45 UTC 2019


One of my previous proposal looks like this, when put into its right form:

mp_limb_t
div1 (mp_limb_t n0, mp_limb_t d0)
{
  if (UNLIKELY ((d0 >> 61) != 0))
    return n0 / d0;

  if (UNLIKELY (n0 >= (d0 << 3)))
    return n0 / d0;
  else {
    mp_limb_t q, mask;

    d0 <<= 2;

    mask = -(mp_limb_t) (n0 >= d0);
    q = 4 & mask;
    n0 -= d0 & mask;

    d0 >>= 1;
    mask = -(mp_limb_t) (n0 >= d0);
    q += 2 & mask;
    n0 -= d0 & mask;

    d0 >>= 1;
    mask = -(mp_limb_t) (n0 >= d0);
    q += 1 & mask;
    n0 -= d0 & mask;

    return q;
  }
}

Some timing tests indicate that it is 6 times faster on 'shell' for
quotients < 8.  For hgcd, that would mean almost all quotients.
I expect an asm version would be 10x faster.

I attached a full test-and-timing program.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: div1.c
Type: application/octet-stream
Size: 1079 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190905/152320a5/attachment.obj>
-------------- next part --------------

-- 
Torbj?rn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list