Tuneup using FFT thresholds
Torbjorn Granlund
tg at gmplib.org
Wed Dec 2 16:56:16 CET 2009
bodrato at mail.dm.unipi.it writes:
> For redc_n we will need a MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD, if we
> implement FFT-based wrapping multiplication. We'll use that in lots of
> the division code too.
For sure mullow needs the tuning you mention. But I guess you can
#define MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD MP_SIZE_T_MAX /* never */
:-)
Because mulmod_bnm1 uses mul_fft internally, and if you estimate the cost:
mulmod_bnm1(n) = mul_fft(n/2) + mul_fft(n/4) + mul_fft(n/8) + ...
If the cost mul_fft(x) is super-linear, the cost above is expected smaller
than mul_fft(n), and this shouldn't depend on mul_fft implementation.
Some testing:
$ ./speed -s $[1<<20]-$[1<<24] -t$[1<<20] mpn_mul_fft mpn_mulmod_bnm1
overhead 0.000000002 secs, precision 10000 units of 3.74e-10 secs, CPU
freq 2672.69 MHz
mpn_mul_fft mpn_mulmod_bnm1
1048576 0.599493000 #0.556925000
2097152 1.220427000 #1.191480000
3145728 1.917416000 #1.793403000
4194304 2.620275000 #2.439718000
5242880 3.288026000 #3.158822000
6291456 3.931765000 #3.773305000
7340032 4.536216000 #4.301628000
8388608 5.343622000 #5.187061000
9437184 5.996460000 #5.766439000
10485760 7.498659000 #6.559679000
11534336 7.517573000 #7.077295000
12582912 8.825550000 #7.837774000
13631488 8.823629000 #8.498667000
14680064 10.342665000 #8.999170000
15728640 10.326072000 #9.329783000
16777216 11.727617000 #10.733654000
It's awesome when theory and practive coincides. :-)
I'll leave MULMOD_BNM1_TO_MUL_FFT_MODF_THRESHOLD out of the code.
--
Torbjörn
More information about the gmp-devel
mailing list