fixed transform length FFT

Fri Oct 23 16:05:00 CEST 2009

Hi Paul!

>> > That code can also give an upper bound for the different Toom
>> variants.

>> But fft6 is not computing the full product of the two operands... Should
>> I
>> compare fft6 2048x2048 -> 2048+1 with toom-N 1025x1025 -> 2050?
>
> exactly (more precisely 1024x1024 -> 2048, since the +1 is only one bit).

Ok, then the cut-off edge is quite high...

$ for i in $(seq 1 20); do echo $[1024<<i]; ./fft6.exe $[1024<<i]
$[1000/i]; done
2048
mul took 1979ms
mpn_mul_fft6 took 1024ms (ratio 0.517433)
mul44n/2 took 720ms
mul66n/2 took 660ms
mul88n/2 took 620ms
...
262144
mul took 87781ms
mpn_mul_fft6 took 98757ms (ratio 1.125038)
mul44n/2 took 92784ms
mul66n/2 took 53843ms
mul88n/2 took 51020ms

... this means that, with n=262144
fft6 nxn->n is faster than toom4 n/2xn/2->n but slower than toom6 nxn->n
(and mul_n is probably using some fft-K with larger K)

When we'll be able to write a Toom-2^5'n'half... than compare that 64
points Toom with FFT-6 and decide if your code really is an upper bound or
not :-P
Just kidding :-D

Regards,
Marco

PS: gcc spent more than one minute to compile fft6 with -O3 ! (Yes I have
a slow computer)

-- 
http://bodrato.it/papers/