hgcd1/2
Torbjörn Granlund
tg at gmplib.org
Sun Sep 8 13:08:06 UTC 2019
> Should we rename div1 to put it into the GMP name space? How about
> mpn_div_11? (We could encode in its name that it is for use for small
> q, but I'd suggest that we don't do that.)
>
> That would allow for some assembly experiments.
If you think assembly will make a big difference (which seems likely),
that makes sense. And define it only when it's faster than a a div
instruction generated by the compiler?
I believe METHOD 1 could almost always be pointless if METHOD 3's
straight line code is made really efficient.
That should in particular be true if the q = 1 code is removed from
hgcd2, for machines with good multiplication throughput.
$ ./tune/speed -c -p1000000 -s1 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
overhead 6.00 cycles, precision 1000000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
1 1492.62 2064.18 #1375.18
So this is 50% speedup over the old method.
Not bad! :-)
Comments on the patch and old code:
The normalisation code only works if operands are not already
normalised. Cannot they be that?
Should be revive the commented-out div2? It looks good to me, the
comment about that it is slower than the branchy div2 is probably from
the era where branchy div1/div2 was great. (I suspect that to be when
Vax and m68020 was hot.)
I think there might be a slight bug or inefficiency here (from hgcd2.c):
ASSERT (ah >= bh);
ah -= bh;
if (ah < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1)))
break;
if (ah <= bh)
{
/* Use q = 1 */
u01 += u00;
u11 += u10;
}
else
{
mp_double_limb_t rq = div1 (ah, bh);
If, in the the last 'if' above ah = bh, we have as a matter of fact q =
2 (with zero remainder). If I read the code right that will be
doscovered one iteration later.
If seem (ah < bh) is the correct criterion.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list