fixed transform length FFT

Fri Oct 23 16:41:18 CEST 2009

       Marco,

> >> > That code can also give an upper bound for the different Toom
> >> variants.
> 
> >> But fft6 is not computing the full product of the two operands... Should
> >> I
> >> compare fft6 2048x2048 -> 2048+1 with toom-N 1025x1025 -> 2050?
> >
> > exactly (more precisely 1024x1024 -> 2048, since the +1 is only one bit).
> 
> Ok, then the cut-off edge is quite high...
> 
> $ for i in $(seq 1 20); do echo $[1024<<i]; ./fft6.exe $[1024<<i]
> $[1000/i]; done
> 2048
> mul took 1979ms
> mpn_mul_fft6 took 1024ms (ratio 0.517433)
> mul44n/2 took 720ms
> mul66n/2 took 660ms
> mul88n/2 took 620ms
> ...
> 262144
> mul took 87781ms
> mpn_mul_fft6 took 98757ms (ratio 1.125038)
> mul44n/2 took 92784ms
> mul66n/2 took 53843ms
> mul88n/2 took 51020ms
> 
> ... this means that, with n=262144
> fft6 nxn->n is faster than toom4 n/2xn/2->n but slower than toom6 nxn->n
> (and mul_n is probably using some fft-K with larger K)

yes, my experiment with fft6 was mainly for use in the recursive calls of SSA,
where is has to be compared to mul nxn->2n. But for that I have to get rid of
the constraint that n is divisible by 2^11.

> PS: gcc spent more than one minute to compile fft6 with -O3 ! (Yes I have
> a slow computer)

I noticed that too. A good exercise for compilers...

Paul