Improvement of FFT

Wed Nov 29 09:31:14 CET 2006

Dear Mr Vukoslavcevic,

> I got from Mr.Niels your master thesis but I could not
> understood some parts so do you have some IEEE paper published on topic
> CRT/FFT ?

How nice of Mr Niels to send you my master's thesis. (For reference it can
be found at:
http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2005/rapporter05/farnqvist_tommy_05091.pdf)
In my thesis work I mostly surveyed older IEEE papers, since they were
among the first published about the FFT. Later papers with DSP
applications in mind use the floating point FFT. Since my primary concern
was working in Z/Zp I did not read that many of those IEEE papers.
(Although some of them present ideas applicable in the NTT case as well.)
Might I suggest a search at http://ieeexplore.ieee.org/?

> Is there any improvements of using CRT/FFT methods comparing to regular
> FFT in speed or just in accuracy ?

Well, to use CRT/FFT methods you would have to leave the floating point
FFT and do the arithmetic in some ring, Z/Zm, with the right properties.
(I.e. the ordinary ring structure and operations. Also, division with the
transform length and suitable roots of unity have to be present.)
Since the modulo operation is exact, accuracy is no longer a concern.
There are no rounding errors, which make very long transform lengths
possible.
Using CRT does have an effect on speed though. That is why GMP, nowadays,
does two FFTs (working modulo a "large" Fermat number as per
Schönhage-Strassen) and CRTs them together. Mostly this speed gain comes
from the fact that the shorter transforms will fit better into the faster
memory of a typical target machine.
The other way to go is to use moduli that are "small", i.e. fits in a
single or a couple of machine words. To be able to do long transforms and
still avoid overflow, several moduli are used and the results are combined
via CRT.
Of course the speed varies with different implementations and certainly
with different architectures. (E.g. in the Z/Zp case with "small primes"
the integer multiplication throughput of the processor at hand becomes
critical.)

Best regards,
Tommy