Re: Schönhage-Strassen

Tsz Wo Sze szetszwo at
Tue Aug 31 05:48:15 CEST 2010

Hi Pual and Torbjorn,

Thanks for the replies.  How did you achieve 10N+o(N)?

I understand that 12N can be done as following:

Suppose we want to compute c = a*b,
where a and b are N-bit integers
Write N = MK,
where M is the number of bits of the coefficients and K is the number of 
Write K = 2^k.

Then, dimension D = 2K
and the Schönhage-Strassen modulus 2^n + 1 requires to be

(1) n >= 2M + k, and
(2) D | 2n (or D | 4n using the sqrt(2) trick.  We actually won't use (2) 
below.  We just state it for completeness.)

Then, |dft(a)| = nD >= (2M + k)D > 4MK = 4N,
where |x| denotes the number of bits of x.

Finally, the memory requirement is
|a| + |b| + |dft(a)| + |dft(b)| + |c| > 12N.

(Sorry for the late reply.  I took some time to carefully think about it.)


----- Original Message ----
> From: Torbjorn Granlund <tg at>
> To: Paul Zimmermann <Paul.Zimmermann at>
> Cc: gmp-discuss at; Tsz Wo Sze <szetszwo at>
> Sent: Thu, August 26, 2010 3:10:07 AM
> Subject: Re: Schönhage-Strassen
> Paul Zimmermann <Paul.Zimmermann at>  writes:
>   the memory usage is about 8n (in addition to the inputs  and output). Thus 
>   is 12n if you count the inputs and output. With  16GB memory you should be 
>   to multiply two numbers of 1.33GB, thus  about 3.2 billion digits.
> I think that's slightly exaggerated with  GMP 5.0.1.  I think that input
> operands, the result operand, and scratch  space, together need 10n+o(n)
> bits of space when multiplying two n-bit  operands.
>   the Hybrid NTT might be faster in some ranges of n.  However I'd like to see
>   real timings on the same architecture to be  convinced.
> I believe large-integer multiplication based on FFT's  that operate over
> a fixed-order algebraic domain will have a complexity of  the type
>   O(n log n log log n log log log n ...)
> while with a domain that grows with operand size can be made better,
> such  as
>   O(n log n log log n)
> of Schönhage-Strassen's mod 2^t+1  algorithm ("SSA").  (One can actually
> do slightly better as demonstrated  by Martin Fürer; the log log n factor
> can be replaced by  2^(log*n).)
> In practice, given that log log log n ... will be 1+espilon  as long as
> we multiply numbers thatfit in any hard disk anybody can afford,  one may
> very well work with a fixed-order domain.
> The experience of  the GMP team is that working over such a domain is
> faster than SSA for  operands over a certain limit, contrary to what the
> assymptotics.  The  reason for this is, I believe, that SSA as
> implemented suffers from poor  spatial locality.  It is not easy to fix
> that; for huge operands a  single coefficient in the domain will be too
> large for the L1  cache!
> -- 
> Torbjörn
> _______________________________________________
> gmp-discuss  mailing list
> gmp-discuss at

More information about the gmp-discuss mailing list