n-prime fft multiplication, for n = 1, 2, 3
Josh Liu
zliu2 at student.gsu.edu
Sun Jun 6 04:39:21 CEST 2004
I have implemented the Number Theoretic Transform using Montgomery
reduction. It uses Bailey's splitting method to make the algorithm more
cache friendly. It by far significantly faster than the plain old
Number Theoretic Transform without Bailey's splitting. I used Bailey's
6-step method with in-order results. Fortunately, the matrix
transpositions do not take up too much time. The bottleneck is the
Montgomery multiplication and reduction operation. That takes at least
46% of the running time of the algorithm. I hope that can be of help to
you. I too would like to see a Number Theoretic Transform that is
competitive with the Sch\"onhage-Strassen method. Oh, by the way, my
Montgomery multiplication and reduction function was designed to be
branch free. I made the sacrifice of adding more instructions to make
it branch-free. I haven't done any comparisons between the running time
of the Bailey 6-step Number Theoretic Transform and the
Sch\"onhage-Strassen but subjectively the latter seems to be not much
faster than the former. I have code available at my web site at the
link at the bottom. I'm sorry that the code is not easily compilable.
On second thought, look at it at your own peril :o)
With regards, Josh Liu
The relevant files are under
centrinia/design/base/N/medium/arithmetic/multiplication/NTT/
http://www.student.gsu.edu/~zliu2
More information about the gmp-discuss
mailing list