fixed transform length FFT
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Fri Oct 23 16:05:00 CEST 2009
Hi Paul!
>> > That code can also give an upper bound for the different Toom
>> variants.
>> But fft6 is not computing the full product of the two operands... Should
>> I
>> compare fft6 2048x2048 -> 2048+1 with toom-N 1025x1025 -> 2050?
>
> exactly (more precisely 1024x1024 -> 2048, since the +1 is only one bit).
Ok, then the cut-off edge is quite high...
$ for i in $(seq 1 20); do echo $[1024<<i]; ./fft6.exe $[1024<<i]
$[1000/i]; done
2048
mul took 1979ms
mpn_mul_fft6 took 1024ms (ratio 0.517433)
mul44n/2 took 720ms
mul66n/2 took 660ms
mul88n/2 took 620ms
...
262144
mul took 87781ms
mpn_mul_fft6 took 98757ms (ratio 1.125038)
mul44n/2 took 92784ms
mul66n/2 took 53843ms
mul88n/2 took 51020ms
... this means that, with n=262144
fft6 nxn->n is faster than toom4 n/2xn/2->n but slower than toom6 nxn->n
(and mul_n is probably using some fft-K with larger K)
When we'll be able to write a Toom-2^5'n'half... than compare that 64
points Toom with FFT-6 and decide if your code really is an upper bound or
not :-P
Just kidding :-D
Regards,
Marco
PS: gcc spent more than one minute to compile fft6 with -O3 ! (Yes I have
a slow computer)
--
http://bodrato.it/papers/
More information about the gmp-devel
mailing list