New thresholds in table
tg at gmplib.org
Sun Nov 13 16:07:29 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
Speaking of thresholds: It seems HGCD_THRESHOLD is pretty low now, in
the range 50-150 limbs for most listed machines. And HGCD_APPR_THRESHOLD
a bit smaller still, on several machines. I remember HGCD_THRESHOLD as
typically being around 200, but I may be wrong.
The easiest way to check your memory might be to try the gmp-5.0
(I plan to make the autobuild reporting mechanism retain historical
data, keeping them on a compressed zfs partition.)
Now, hgcd_appr seems to be some 10% faster than hgcd in the range from a
few dozen limbs up to 1000. I had expected the threshold for hgcd_reduce
to be below 1000 limbs.
The hgcd_reduce threshold compares:
1. hgcd(n/2) + 4 mul(n/2, n/4)
2. hgcd_appr(n/2) + 4 mulmod_bnm1(3n/4, n/4)
If, for low sizes, one tries mullo instead of mpn_mulmod_bnm1, then one
would get instead
2. hgcd_appr(n/2) + 4 mul(n/2, n/4) + 4 mullo(n/4)
so if hgcd_appr saves 10%, then method 2 wins when 4 mullo(n/4) < 0.1
hgcd(n/2). Also, it seems that in this range one mul(n) is roughly 10%
of hgcd, so method 2 should win when 4 mullo(n/4) < mul(n/2). Which
makes me think it could make sense in school book range and very early
Karatsuba/Mulders range only, which isn't very interesting.
So I wonder if it's possible to get the threshold down. Improvements to
hgcd_appr would help, naturally. Using mulmid for the first recursive
call, where we have cancellation *and* can discard low limbs, might give
some speedup for small sizes.
Have you made a sanity check plot of the various higher level functions
after having compiled-in new thresholds? Any bumps might indicate a
spot for improvement.
More information about the gmp-devel