Tuneup using FFT thresholds

Sun Nov 29 23:43:06 CET 2009

Hi,

> For redc_n we will need a MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD, if we
> implement FFT-based wrapping multiplication.  We'll use that in lots of
> the division code too.

For sure mullow needs the tuning you mention. But I guess you can
#define MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD MP_SIZE_T_MAX /* never */

Because mulmod_bnm1 uses mul_fft internally, and if you estimate the cost:
mulmod_bnm1(n) = mul_fft(n/2) + mul_fft(n/4) + mul_fft(n/8) + ...
If the cost mul_fft(x) is super-linear, the cost above is expected smaller
than mul_fft(n), and this shouldn't depend on mul_fft implementation.

Some testing:
$ ./speed -s $[1<<20]-$[1<<24] -t$[1<<20] mpn_mul_fft mpn_mulmod_bnm1
overhead 0.000000002 secs, precision 10000 units of 3.74e-10 secs, CPU
freq 2672.69 MHz
          mpn_mul_fft mpn_mulmod_bnm1
1048576    0.599493000  #0.556925000
2097152    1.220427000  #1.191480000
3145728    1.917416000  #1.793403000
4194304    2.620275000  #2.439718000
5242880    3.288026000  #3.158822000
6291456    3.931765000  #3.773305000
7340032    4.536216000  #4.301628000
8388608    5.343622000  #5.187061000
9437184    5.996460000  #5.766439000
10485760    7.498659000  #6.559679000
11534336    7.517573000  #7.077295000
12582912    8.825550000  #7.837774000
13631488    8.823629000  #8.498667000
14680064   10.342665000  #8.999170000
15728640   10.326072000  #9.329783000
16777216   11.727617000 #10.733654000

Regards,
Marco

-- 
http://bodrato.it/