hgcd1/2
Torbjörn Granlund
tg at gmplib.org
Thu Sep 5 08:33:45 UTC 2019
One of my previous proposal looks like this, when put into its right form:
mp_limb_t
div1 (mp_limb_t n0, mp_limb_t d0)
{
if (UNLIKELY ((d0 >> 61) != 0))
return n0 / d0;
if (UNLIKELY (n0 >= (d0 << 3)))
return n0 / d0;
else {
mp_limb_t q, mask;
d0 <<= 2;
mask = -(mp_limb_t) (n0 >= d0);
q = 4 & mask;
n0 -= d0 & mask;
d0 >>= 1;
mask = -(mp_limb_t) (n0 >= d0);
q += 2 & mask;
n0 -= d0 & mask;
d0 >>= 1;
mask = -(mp_limb_t) (n0 >= d0);
q += 1 & mask;
n0 -= d0 & mask;
return q;
}
}
Some timing tests indicate that it is 6 times faster on 'shell' for
quotients < 8. For hgcd, that would mean almost all quotients.
I expect an asm version would be 10x faster.
I attached a full test-and-timing program.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: div1.c
Type: application/octet-stream
Size: 1079 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190905/152320a5/attachment.obj>
-------------- next part --------------
--
Torbj?rn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list