Choosing multiplication algorithm

Sun Sep 27 17:39:31 CEST 2009

Hi,
     I saw the multiplication diagrams of Torbjorn, which clearly show how difficult the question "fastest algorithm" is to answer to.

I have a question and a comment:

Torbjorn indicates "...4 nested strategies...". As currently I cannot
see the developing code of mul.c (Marco pointed it out to me, but I'm
at a conference, and the browser here doesn't let me access it), I
have to ask if an idea I had some weeks ago is the same as Torbjorn.

Multiplication for _very_unbalanced operands:

I use polynomial notation, hoping to be clear: consider a(x), with
degree d, and b(x) = b1*x + b0. The coefficients of the product c(x)
are:

c0 = a0*b0
c1 = a0*b1 + a1*b0
c2 = a1*b1 + a2*b0
c3 = a2*b1 + a3*b0
....
c(d) = a(d-1)*b1 + ad*b0
c(d+1) = ad*b1

The idea is to "iterate" a toom_X2 method. I write the example using
what Marco and I call Toom-2.5, that is toom_32:

- "Split" a(x) coefficients into triples, obtaining in practice
  polynomial peaces:

[a0 a1 a2] => A0(x) = a2*x^2 + a1*x + a0
[a3 a4 a5] => A1(x) = a5*x^2 + a4*x + a3
...

and use toom32 to compute all the products

Ai(x)*b(x)  i = 0,...

it may happen that the last polynomial Ak(x) has degree 0 or 1
(depending on the value of d%3). In the first case, just two other
multiplications are needed (ad*b0 and ad*b1): in the second, one can
apply Karatsuba to save one final multiplication. As b(x) has degree1,
the "recomposition" of all these products" making c(x) is quite
natural.

By using toom_42, split instead into quadruples, etc. For the last
polynomials, either do two multiplications, or use Karatsuba, or
toom_32, if the last A has degree less than 3, etc.

A question: Torbjorn, is this what you did and/or thinking of ?

A comment: as b(x) is the same for all Ai(x), one could evaluate it
(in the necessary points) just once, avoiding to repeat the same
evaluations at each multiplication Ai(x)b(x).

I'm currently trying to what happens when applying these techniques to
the power computation (x^n), but I've still not finished the implementation
(by the way, it's a mpz-level implementation, so if someone could be
interested in writing a mpn one, I'd be glad to receive help).

Bye,

Alberto