New thresholds in table

Torbjorn Granlund tg at gmplib.org
Sun Nov 13 16:07:29 CET 2011


nisse at lysator.liu.se (Niels Möller) writes:

  Speaking of thresholds: It seems HGCD_THRESHOLD is pretty low now, in
  the range 50-150 limbs for most listed machines. And HGCD_APPR_THRESHOLD
  a bit smaller still, on several machines. I remember HGCD_THRESHOLD as
  typically being around 200, but I may be wrong.
  
The easiest way to check your memory might be to try the gmp-5.0
repository.  :-)

(I plan to make the autobuild reporting mechanism retain historical
data, keeping them on a compressed zfs partition.)

  Now, hgcd_appr seems to be some 10% faster than hgcd in the range from a
  few dozen limbs up to 1000. I had expected the threshold for hgcd_reduce
  to be below 1000 limbs.
  
  The hgcd_reduce threshold compares:
  
   1. hgcd(n/2) + 4 mul(n/2, n/4)
  
   2. hgcd_appr(n/2) + 4 mulmod_bnm1(3n/4, n/4)
  
  If, for low sizes, one tries mullo instead of mpn_mulmod_bnm1, then one
  would get instead
  
   2. hgcd_appr(n/2) + 4 mul(n/2, n/4) + 4 mullo(n/4)
  
  so if hgcd_appr saves 10%, then method 2 wins when 4 mullo(n/4) < 0.1
  hgcd(n/2). Also, it seems that in this range one mul(n) is roughly 10%
  of hgcd, so method 2 should win when 4 mullo(n/4) < mul(n/2). Which
  makes me think it could make sense in school book range and very early
  Karatsuba/Mulders range only, which isn't very interesting.
  
  So I wonder if it's possible to get the threshold down. Improvements to
  hgcd_appr would help, naturally. Using mulmid for the first recursive
  call, where we have cancellation *and* can discard low limbs, might give
  some speedup for small sizes.
  
Have you made a sanity check plot of the various higher level functions
after having compiled-in new thresholds?  Any bumps might indicate a
spot for improvement.

-- 
Torbjörn


More information about the gmp-devel mailing list