n-prime fft multiplication, for n = 1, 2, 3  

Josh Liu zliu2 at student.gsu.edu
Sun Jun 6 04:39:21 CEST 2004

I have implemented the Number Theoretic Transform using Montgomery 
reduction. It uses Bailey's splitting method to make the algorithm more 
cache friendly. It by far significantly faster than the plain old 
Number Theoretic Transform without Bailey's splitting. I used Bailey's 
6-step method with in-order results. Fortunately, the matrix 
transpositions do not take up too much time. The bottleneck is the 
Montgomery multiplication and reduction operation. That takes at least 
46% of the running time of the algorithm. I hope that can be of help to 
you. I too would like to see a Number Theoretic Transform that is 
competitive with the Sch\"onhage-Strassen method. Oh, by the way, my 
Montgomery multiplication and reduction function was designed to be 
branch free. I made the sacrifice of adding more instructions to make 
it branch-free. I haven't done any comparisons between the running time 
of the Bailey 6-step Number Theoretic Transform and the 
Sch\"onhage-Strassen but subjectively the latter seems to be not much 
faster than the former. I have code available at my web site at the 
link at the bottom. I'm sorry that the code is not easily compilable. 
On second thought, look at it at your own peril :o)

With regards, Josh Liu

The relevant files are under 

More information about the gmp-discuss mailing list