fixed transform length FFT
Paul.Zimmermann at loria.fr
Wed Oct 21 16:28:21 CEST 2009
I planned to implement a fixed transform length FFT. It took me a few
days to have a first running version for Schönhage-Strassen's algorithm (SSA).
The code is automatically generated from a master program which takes as
input k for a transform length 2^k. You can see an example for k=6 at
Some timings on a 2.83Ghz Core 2:
tarte% ./fft6 2048 10000
mul took 8250ms # mpn_mul_n + wraparound
mpn_mul_fft6 took 4322ms (ratio 0.523879) # fixed length 2^6 FFT SSA
tarte% ./speed -s 2048 mpn_mul_n mpn_mul_fft.6
overhead 0.000000002 secs, precision 10000 units of 3.53e-10 secs, CPU freq 2833.00 MHz
2048 0.000812073 #0.000465751
This means that mpn_mul_fft6 gives a 7% speedup over the generic code from
I expect that speedup can be increased if I remove the current constraint
that the input limb size n is a multiple of K^2/2 (which ensures all shifts
are limb shifts in SSA). But the master program will be more tricky to write.
That code can also give an upper bound for the different Toom variants.
More information about the gmp-devel