Tsz Wo Sze
szetszwo at yahoo.com
Tue Aug 31 05:48:15 CEST 2010
Hi Pual and Torbjorn,
Thanks for the replies. How did you achieve 10N+o(N)?
I understand that 12N can be done as following:
Suppose we want to compute c = a*b,
where a and b are N-bit integers
Write N = MK,
where M is the number of bits of the coefficients and K is the number of
Write K = 2^k.
Then, dimension D = 2K
and the Schönhage-Strassen modulus 2^n + 1 requires to be
(1) n >= 2M + k, and
(2) D | 2n (or D | 4n using the sqrt(2) trick. We actually won't use (2)
below. We just state it for completeness.)
Then, |dft(a)| = nD >= (2M + k)D > 4MK = 4N,
where |x| denotes the number of bits of x.
Finally, the memory requirement is
|a| + |b| + |dft(a)| + |dft(b)| + |c| > 12N.
(Sorry for the late reply. I took some time to carefully think about it.)
----- Original Message ----
> From: Torbjorn Granlund <tg at gmplib.org>
> To: Paul Zimmermann <Paul.Zimmermann at loria.fr>
> Cc: gmp-discuss at gmplib.org; Tsz Wo Sze <szetszwo at yahoo.com>
> Sent: Thu, August 26, 2010 3:10:07 AM
> Subject: Re: Schönhage-Strassen
> Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
> the memory usage is about 8n (in addition to the inputs and output). Thus
> is 12n if you count the inputs and output. With 16GB memory you should be
> to multiply two numbers of 1.33GB, thus about 3.2 billion digits.
> I think that's slightly exaggerated with GMP 5.0.1. I think that input
> operands, the result operand, and scratch space, together need 10n+o(n)
> bits of space when multiplying two n-bit operands.
> the Hybrid NTT might be faster in some ranges of n. However I'd like to see
> real timings on the same architecture to be convinced.
> I believe large-integer multiplication based on FFT's that operate over
> a fixed-order algebraic domain will have a complexity of the type
> O(n log n log log n log log log n ...)
> while with a domain that grows with operand size can be made better,
> such as
> O(n log n log log n)
> of Schönhage-Strassen's mod 2^t+1 algorithm ("SSA"). (One can actually
> do slightly better as demonstrated by Martin Fürer; the log log n factor
> can be replaced by 2^(log*n).)
> In practice, given that log log log n ... will be 1+espilon as long as
> we multiply numbers thatfit in any hard disk anybody can afford, one may
> very well work with a fixed-order domain.
> The experience of the GMP team is that working over such a domain is
> faster than SSA for operands over a certain limit, contrary to what the
> assymptotics. The reason for this is, I believe, that SSA as
> implemented suffers from poor spatial locality. It is not easy to fix
> that; for huge operands a single coefficient in the domain will be too
> large for the L1 cache!
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
More information about the gmp-discuss