Re: Schönhage-Strassen
Paul Zimmermann
Paul.Zimmermann at loria.fr
Tue Aug 31 10:58:21 CEST 2010
Dear Tsz Wo Sze,
> Date: Mon, 30 Aug 2010 20:48:15 -0700 (PDT)
> From: Tsz Wo Sze <szetszwo at yahoo.com>
>
> Hi Pual and Torbjorn,
>
> Thanks for the replies. How did you achieve 10N+o(N)?
I don't know how Torbjörn gets 10N, but I agree with your calculation:
* we need two buffers of N for the inputs : total 2N
* we need a buffer of 2N for the output : total 2N
* we need two buffers of 4N each for the Fourier transforms : total 8N
(we reuse one buffer for the inverse transform)
Paul
> I understand that 12N can be done as following:
>
> Suppose we want to compute c = a*b,
> where a and b are N-bit integers
> Write N = MK,
> where M is the number of bits of the coefficients and K is the number of
> coefficients.
> Write K = 2^k.
>
> Then, dimension D = 2K
> and the Schönhage-Strassen modulus 2^n + 1 requires to be
>
> (1) n >= 2M + k, and
> (2) D | 2n (or D | 4n using the sqrt(2) trick. We actually won't use (2)
> below. We just state it for completeness.)
>
> Then, |dft(a)| = nD >= (2M + k)D > 4MK = 4N,
> where |x| denotes the number of bits of x.
>
> Finally, the memory requirement is
> |a| + |b| + |dft(a)| + |dft(b)| + |c| > 12N.
>
> (Sorry for the late reply. I took some time to carefully think about it.)
>
> Regards,
> TW
>
>
>
> ----- Original Message ----
> > From: Torbjorn Granlund <tg at gmplib.org>
> > To: Paul Zimmermann <Paul.Zimmermann at loria.fr>
> > Cc: gmp-discuss at gmplib.org; Tsz Wo Sze <szetszwo at yahoo.com>
> > Sent: Thu, August 26, 2010 3:10:07 AM
> > Subject: Re: Schönhage-Strassen
> >
> > Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
> >
> > the memory usage is about 8n (in addition to the inputs and output). Thus
> it
> > is 12n if you count the inputs and output. With 16GB memory you should be
> >able
> > to multiply two numbers of 1.33GB, thus about 3.2 billion digits.
> >
> > I think that's slightly exaggerated with GMP 5.0.1. I think that input
> > operands, the result operand, and scratch space, together need 10n+o(n)
> > bits of space when multiplying two n-bit operands.
> >
> > the Hybrid NTT might be faster in some ranges of n. However I'd like to see
> > real timings on the same architecture to be convinced.
> >
> > I believe large-integer multiplication based on FFT's that operate over
> > a fixed-order algebraic domain will have a complexity of the type
> >
> > O(n log n log log n log log log n ...)
> >
> > while with a domain that grows with operand size can be made better,
> > such as
> >
> > O(n log n log log n)
> >
> > of Schönhage-Strassen's mod 2^t+1 algorithm ("SSA"). (One can actually
> > do slightly better as demonstrated by Martin Fürer; the log log n factor
> > can be replaced by 2^(log*n).)
> >
> > In practice, given that log log log n ... will be 1+espilon as long as
> > we multiply numbers thatfit in any hard disk anybody can afford, one may
> > very well work with a fixed-order domain.
> >
> > The experience of the GMP team is that working over such a domain is
> > faster than SSA for operands over a certain limit, contrary to what the
> > assymptotics. The reason for this is, I believe, that SSA as
> > implemented suffers from poor spatial locality. It is not easy to fix
> > that; for huge operands a single coefficient in the domain will be too
> > large for the L1 cache!
> >
> > --
> > Torbjörn
> > _______________________________________________
> > gmp-discuss mailing list
> > gmp-discuss at gmplib.org
> > https://gmplib.org/mailman/listinfo/gmp-discuss
> >
>
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>
More information about the gmp-discuss
mailing list