Saving a significant multiplication in odd order Toom-Cook negacyclic convolution

Thu Jan 2 15:43:31 UTC 2020

Ciao Brian,

sorry for the long delay, I did not read this message carefully enough
when you sent it. And I happened to reread it in the last few days...

Il Ven, 5 Luglio 2019 11:08 am, Brian Quach ha scritto:
> Currently I am just trying to produce some 'evaluation' and
> 'interpolation' matrices (not similar to Toom-Cook matrices).

> Here is an example for N = 3:
> cyclotomic factorisation: x^3 + 1 = (x + 1)(x^2 - x + 1) (number of
> factors k = 2)
> multiplications needed: 2N - k = 6 - 2 = 4 (better than the

There is a general problem about specialised Toom-N functions to compute
the product modulo 2^{K*b}+1, where I assume a "GMP-limb" (a register,
usually) has $b$ bits.
They can be applied ONLY if the number of limbs (K in the above formula)
is a multiple of N, not one more, not one less...

I mean: if you have two operands of 100 "GMP-limbs", and you need the
product modulo 2^{100*b}+1, you can always multiply them with a generic
multiplication function using Toom-3, then reduce the result modulo
2^{100*b}+1. But you can not use a specialised Toom-3 as the one you are
proposing. For the latter to be usable, you must be working modulo
2^{99*b}+1 or modulo 2^{102*b}+1...

Multiplication modulo 2^K+1 (for small K) is used basically by the
FFT-based multiplication and by mulmod_bnm1, the function that compute a
product mod (2^b)^n-1. Both are widely used in the library...

We should understand how the limitation on the sizes of those Toom-N, and
the recursive structure of the functions that would use them, can
interact...

> 'evaluation' matrix for each operand (array of rows): [[1, -1, 1], [1, 0,
> -1], [1, 1, 0], [0, 1, 1]] (can involve 5 additions / subtractions)
> This is probably better than Bodrato's optimal sequence for Toom-3, as it
> has the same addition / subtraction count and saves 2 shifts.
> 'interpolation' matrix: [[1, 1, 1, -2], [-1, -1, 2, -1], [1, -2, 1, 1]] /
> 3 (6 additions / subtractions and 1 division by 3)

Yes, using the notation I used in "Towards Optimal Toom-Cook
Multiplication [...]", for Toom-3 we need:

 Evaluation: 5*2 add/sub, 2 shift; 5mul
 Interpolation: 8 add/sub, 3 shift, 1 Sdiv
 Recomposition: 4 overlapping parts (+ 3 for reduction mod x^3+1)
 (each one costs as half an interpolation add/sub)

But the current implementation in GMP mixes interpolation and
recomposition, using some tricks suggested in the paper "High Degree
Toom'n'Half [...]" and perform the following operations.

 Interpolation: 7 add/sub, 3 shift, 1 Sdiv
 Recomposition: 5 overlap (+3 after the call)

The direct computation of Toom-3 modulo x^3+1 you are proposing, would
then need the following.

 Evaluation: 5*2 add/sub; 4mul
 Interpolation: 6 add/sub, 1 Sdiv
 Recomposition: 3 overlapping parts

(or, by mixing Int. and Rec.)
 Interpolation: 4 add/sub, 1 Sdiv
 Recomposition: 5 overlapping parts

As you suggested, one product is saved, and also the linear operations are
lighter. It should also be possible to reduce memory usage, because not
all the results of the products are needed for all the time...

But the algorithm should really be implemented, to understand how it can
interact with the current functions in the library. Unfortunately it can
not unconditionally be used for all sizes... and not even for every size
in a range as the current Toom-based implemented functions.

Ĝis,
m

-- 
http://bodrato.it/papers/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mat3.gp
Type: application/x-gnuplot
Size: 1308 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-discuss/attachments/20200102/9215f3c3/attachment-0001.bin>