Schönhage-Strassen

Thu Aug 26 12:10:07 CEST 2010

Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:

  the memory usage is about 8n (in addition to the inputs and output). Thus it
  is 12n if you count the inputs and output. With 16GB memory you should be able
  to multiply two numbers of 1.33GB, thus about 3.2 billion digits.

I think that's slightly exaggerated with GMP 5.0.1.  I think that input
operands, the result operand, and scratch space, together need 10n+o(n)
bits of space when multiplying two n-bit operands.

  the Hybrid NTT might be faster in some ranges of n. However I'd like to see
  real timings on the same architecture to be convinced.

I believe large-integer multiplication based on FFT's that operate over
a fixed-order algebraic domain will have a complexity of the type

  O(n log n log log n log log log n ...)

while with a domain that grows with operand size can be made better,
such as

  O(n log n log log n)

of Schönhage-Strassen's mod 2^t+1 algorithm ("SSA").  (One can actually
do slightly better as demonstrated by Martin Fürer; the log log n factor
can be replaced by 2^(log*n).)

In practice, given that log log log n ... will be 1+espilon as long as
we multiply numbers thatfit in any hard disk anybody can afford, one may
very well work with a fixed-order domain.

The experience of the GMP team is that working over such a domain is
faster than SSA for operands over a certain limit, contrary to what the
assymptotics.  The reason for this is, I believe, that SSA as
implemented suffers from poor spatial locality.  It is not easy to fix
that; for huge operands a single coefficient in the domain will be too
large for the L1 cache!

-- 
Torbjörn