fixed transform length FFT
Paul Zimmermann
Paul.Zimmermann at loria.fr
Wed Oct 21 16:28:21 CEST 2009
Hi,
in http://gmplib.org/list-archives/gmp-devel/2009-October/001047.html
I planned to implement a fixed transform length FFT. It took me a few
days to have a first running version for Schönhage-Strassen's algorithm (SSA).
The code is automatically generated from a master program which takes as
input k for a transform length 2^k. You can see an example for k=6 at
http://www.loria.fr/~zimmerma/software/fft6.c.
Some timings on a 2.83Ghz Core 2:
tarte% ./fft6 2048 10000
Test 0
mul took 8250ms # mpn_mul_n + wraparound
mpn_mul_fft6 took 4322ms (ratio 0.523879) # fixed length 2^6 FFT SSA
tarte% ./speed -s 2048 mpn_mul_n mpn_mul_fft.6
overhead 0.000000002 secs, precision 10000 units of 3.53e-10 secs, CPU freq 2833.00 MHz
mpn_mul_n mpn_mul_fft.6
2048 0.000812073 #0.000465751
This means that mpn_mul_fft6 gives a 7% speedup over the generic code from
mpn_mul_fft.6.
I expect that speedup can be increased if I remove the current constraint
that the input limb size n is a multiple of K^2/2 (which ensures all shifts
are limb shifts in SSA). But the master program will be more tricky to write.
That code can also give an upper bound for the different Toom variants.
Paul
More information about the gmp-devel
mailing list