Wraparound multiplicaton vs mullo

Niels Möller nisse at lysator.liu.se
Thu Oct 22 15:55:57 CEST 2009


Torbjorn Granlund <tg at gmplib.org> writes:

> Wow!  Which are the largest n for which it is still faster than full
> multiplication?

Haven't tried, but I'd not be surprised if it's a little faster also
in the FFT range. As Marco said, since multiplication is superlinear,
therefore, CRT using a known factorization of the modulo should always
help. For example,

  (B^{8n} - 1) = (B^n-1) (B^n+1) (B^{2n}+1) (B^{4n}+1)

Full multiplication would use an FFT of size 16n, cost

  16 n (4 + log n)

Doing it with FFT wraparound at the toplevel would use FFT of size 8n,
cost

  8 n (3 + log n)

Doing wraparound FFT multiplication mod one factor at a time would be
two FFT:s of size n, one of 2n and one of 4n,

  2 n log n + 2 n (1 + log n) + 4 n (2 + log n)
  = 8 n (3/4 + log n)

How it really works out in practice will depend on how the linear
saving in more shallow FFT:s (a factor 4 in the linear term above)
compares to the cost of the wrapping of inputs and the CRT
reconstructions.

/Niels


More information about the gmp-devel mailing list