# Re: Schönhage-Strassen

Paul Zimmermann Paul.Zimmermann at loria.fr
Tue Aug 31 10:58:21 CEST 2010

```       Dear Tsz Wo Sze,

> Date: Mon, 30 Aug 2010 20:48:15 -0700 (PDT)
> From: Tsz Wo Sze <szetszwo at yahoo.com>
>
> Hi Pual and Torbjorn,
>
> Thanks for the replies.  How did you achieve 10N+o(N)?

I don't know how Torbjörn gets 10N, but I agree with your calculation:

* we need two buffers of N for the inputs : total 2N
* we need a buffer of 2N for the output : total 2N
* we need two buffers of 4N each for the Fourier transforms : total 8N
(we reuse one buffer for the inverse transform)

Paul

> I understand that 12N can be done as following:
>
> Suppose we want to compute c = a*b,
> where a and b are N-bit integers
> Write N = MK,
> where M is the number of bits of the coefficients and K is the number of
> coefficients.
> Write K = 2^k.
>
> Then, dimension D = 2K
> and the Schönhage-Strassen modulus 2^n + 1 requires to be
>
> (1) n >= 2M + k, and
> (2) D | 2n (or D | 4n using the sqrt(2) trick.  We actually won't use (2)
> below.  We just state it for completeness.)
>
> Then, |dft(a)| = nD >= (2M + k)D > 4MK = 4N,
> where |x| denotes the number of bits of x.
>
> Finally, the memory requirement is
> |a| + |b| + |dft(a)| + |dft(b)| + |c| > 12N.
>
> (Sorry for the late reply.  I took some time to carefully think about it.)
>
> Regards,
> TW
>
>
>
> ----- Original Message ----
> > From: Torbjorn Granlund <tg at gmplib.org>
> > To: Paul Zimmermann <Paul.Zimmermann at loria.fr>
> > Cc: gmp-discuss at gmplib.org; Tsz Wo Sze <szetszwo at yahoo.com>
> > Sent: Thu, August 26, 2010 3:10:07 AM
> > Subject: Re: Schönhage-Strassen
> >
> > Paul Zimmermann <Paul.Zimmermann at loria.fr>  writes:
> >
> >   the memory usage is about 8n (in addition to the inputs  and output). Thus
> it
> >   is 12n if you count the inputs and output. With  16GB memory you should be
> >able
> >   to multiply two numbers of 1.33GB, thus  about 3.2 billion digits.
> >
> > I think that's slightly exaggerated with  GMP 5.0.1.  I think that input
> > operands, the result operand, and scratch  space, together need 10n+o(n)
> > bits of space when multiplying two n-bit  operands.
> >
> >   the Hybrid NTT might be faster in some ranges of n.  However I'd like to see
> >   real timings on the same architecture to be  convinced.
> >
> > I believe large-integer multiplication based on FFT's  that operate over
> > a fixed-order algebraic domain will have a complexity of  the type
> >
> >   O(n log n log log n log log log n ...)
> >
> > while with a domain that grows with operand size can be made better,
> > such  as
> >
> >   O(n log n log log n)
> >
> > of Schönhage-Strassen's mod 2^t+1  algorithm ("SSA").  (One can actually
> > do slightly better as demonstrated  by Martin Fürer; the log log n factor
> > can be replaced by  2^(log*n).)
> >
> > In practice, given that log log log n ... will be 1+espilon  as long as
> > we multiply numbers thatfit in any hard disk anybody can afford,  one may
> > very well work with a fixed-order domain.
> >
> > The experience of  the GMP team is that working over such a domain is
> > faster than SSA for  operands over a certain limit, contrary to what the
> > assymptotics.  The  reason for this is, I believe, that SSA as
> > implemented suffers from poor  spatial locality.  It is not easy to fix
> > that; for huge operands a  single coefficient in the domain will be too
> > large for the L1  cache!
> >
> > --
> > Torbjörn
> > _______________________________________________
> > gmp-discuss  mailing list
> > gmp-discuss at gmplib.org
> > https://gmplib.org/mailman/listinfo/gmp-discuss
> >
>
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>
```