New thresholds in table

Sun Nov 13 16:07:29 CET 2011

nisse at lysator.liu.se (Niels Möller) writes:

  Speaking of thresholds: It seems HGCD_THRESHOLD is pretty low now, in
  the range 50-150 limbs for most listed machines. And HGCD_APPR_THRESHOLD
  a bit smaller still, on several machines. I remember HGCD_THRESHOLD as
  typically being around 200, but I may be wrong.

The easiest way to check your memory might be to try the gmp-5.0
repository.  :-)

(I plan to make the autobuild reporting mechanism retain historical
data, keeping them on a compressed zfs partition.)

  Now, hgcd_appr seems to be some 10% faster than hgcd in the range from a
  few dozen limbs up to 1000. I had expected the threshold for hgcd_reduce
  to be below 1000 limbs.

  The hgcd_reduce threshold compares:

   1. hgcd(n/2) + 4 mul(n/2, n/4)

   2. hgcd_appr(n/2) + 4 mulmod_bnm1(3n/4, n/4)

  If, for low sizes, one tries mullo instead of mpn_mulmod_bnm1, then one
  would get instead

   2. hgcd_appr(n/2) + 4 mul(n/2, n/4) + 4 mullo(n/4)

  so if hgcd_appr saves 10%, then method 2 wins when 4 mullo(n/4) < 0.1
  hgcd(n/2). Also, it seems that in this range one mul(n) is roughly 10%
  of hgcd, so method 2 should win when 4 mullo(n/4) < mul(n/2). Which
  makes me think it could make sense in school book range and very early
  Karatsuba/Mulders range only, which isn't very interesting.

  So I wonder if it's possible to get the threshold down. Improvements to
  hgcd_appr would help, naturally. Using mulmid for the first recursive
  call, where we have cancellation *and* can discard low limbs, might give
  some speedup for small sizes.

Have you made a sanity check plot of the various higher level functions
after having compiled-in new thresholds?  Any bumps might indicate a
spot for improvement.

-- 
Torbjörn