NTT multiplication.

Aleksey Bader bader at
Wed May 13 12:02:10 CEST 2009

Paul and Torbjörn,

Thanks a lot for your quick answers.

Here is some key details of my implementation (maybe it would be
1. I used 3 prime numbers of the form l*2^k+1 that fit 32-bit word.
Unfortunately I had only 32-bit machine for tests.
2. To get memory locality I also used Cooley-Tukey FFT with precomputed
tables of root powers.
3. Burret reduction was used for fast modular multiplication (Tommy compared
Montgomery and Victor Shoup 's reductions).

To be honest, I didn't use GMP with its assembly-optimized basic operations
So I wanted to try the combination of my tricks with those you have to check
if it could be useful.

Thanks again for you suggestions.

On Wed, May 13, 2009 at 12:35 PM, Torbjorn Granlund <tg at> wrote:

> Aleksey Bader <bader at> writes:
>  As follows from one of the projects is
>  devoted to implementation of FFT-based algorithm combined with Chinese
>  Reminder Theorem. It's claimed that there are 2 implementations of this
>  algorithm (by Tommy Färnqvist and Niels Möller).
>  Q1. Are there available source code for those implementations?
> I don't know.  I think both are still work-in-progress (although the
> progress might be limited for some periods).  You might want to ask the
> authors.
>  The reason I'm asking is because I did the same job when I was working on
> my
>  master thesis 2 years ago. It'd be very curious to compare or maybe
> improve
>  our implementations.
>  Actually, I was able to find Tommy's master thesis and his code seemed to
> be
>  up to twice as fast as the code in GMP 4.1.4.
>   Q2. What about 4.3.*? Could it still be profitable in comparison with the
>  current GMP multiplication?
> IIRC, his code can still beat the Schönhage-Strassen code currently in
> GMP, but only for very large operands.  This is due to the memory
> locality tricks Tommy are using (due to Cooley and Tukey, rediscovered
> by Bailey).
>  P.S. Relatively recently, Martin Furer
>  published<>new
>   multiplication algorithm with very low asymptotic complexity.
>  Q3. Any known (tries of) implementions (in GMP O:)? Maybe someone think
>  about it (disregard to author's claim that his method outperforms
>  Schönhage-Strassen on "astonomically large numbers" :-) )?
> The complexity of Fürer's algorithm is only slightly better than that of
> Schönhage-Strassen's algorithm.  (A log(log(n)) factor is replaced by a
> 2^{log*(n)} factor.  While a better upper bound is a nice step forward
> in theoretical terms, this small improvement would probably be hard to
> translate into actual performance.
> --
> Torbjörn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the gmp-devel mailing list