Tuneup using FFT thresholds

Wed Dec 2 16:56:16 CET 2009

bodrato at mail.dm.unipi.it writes:

  > For redc_n we will need a MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD, if we
  > implement FFT-based wrapping multiplication.  We'll use that in lots of
  > the division code too.

  For sure mullow needs the tuning you mention. But I guess you can
  #define MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD MP_SIZE_T_MAX /* never */

:-)

  Because mulmod_bnm1 uses mul_fft internally, and if you estimate the cost:
  mulmod_bnm1(n) = mul_fft(n/2) + mul_fft(n/4) + mul_fft(n/8) + ...
  If the cost mul_fft(x) is super-linear, the cost above is expected smaller
  than mul_fft(n), and this shouldn't depend on mul_fft implementation.

  Some testing:
  $ ./speed -s $[1<<20]-$[1<<24] -t$[1<<20] mpn_mul_fft mpn_mulmod_bnm1
  overhead 0.000000002 secs, precision 10000 units of 3.74e-10 secs, CPU
  freq 2672.69 MHz
            mpn_mul_fft mpn_mulmod_bnm1
  1048576    0.599493000  #0.556925000
  2097152    1.220427000  #1.191480000
  3145728    1.917416000  #1.793403000
  4194304    2.620275000  #2.439718000
  5242880    3.288026000  #3.158822000
  6291456    3.931765000  #3.773305000
  7340032    4.536216000  #4.301628000
  8388608    5.343622000  #5.187061000
  9437184    5.996460000  #5.766439000
  10485760    7.498659000  #6.559679000
  11534336    7.517573000  #7.077295000
  12582912    8.825550000  #7.837774000
  13631488    8.823629000  #8.498667000
  14680064   10.342665000  #8.999170000
  15728640   10.326072000  #9.329783000
  16777216   11.727617000 #10.733654000

It's awesome when theory and practive coincides.  :-)

I'll leave MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD out of the code.

-- 
Torbjörn