Choosing multiplication algorithm
Torbjorn Granlund
tg at gmplib.org
Mon Sep 28 09:13:12 CEST 2009
"Alberto Zanoni" <zanoni at volterra.uniroma2.it> writes:
I saw the multiplication diagrams of Torbjorn, which clearly show how
difficult the question "fastest algorithm" is to answer to.
Indeed. But the diagrams also suggest that some significant
improvements wrt the current GMP strategies should not be too hard to
implement.
Torbjorn indicates "...4 nested strategies...". As currently I cannot
see the developing code of mul.c (Marco pointed it out to me, but I'm
at a conference, and the browser here doesn't let me access it), I
have to ask if an idea I had some weeks ago is the same as Torbjorn.
Multiplication for _very_unbalanced operands:
I use polynomial notation, hoping to be clear: consider a(x), with
degree d, and b(x) = b1*x + b0. The coefficients of the product c(x)
are:
c0 = a0*b0
c1 = a0*b1 + a1*b0
c2 = a1*b1 + a2*b0
c3 = a2*b1 + a3*b0
....
c(d) = a(d-1)*b1 + ad*b0
c(d+1) = ad*b1
The idea is to "iterate" a toom_X2 method. I write the example using
what Marco and I call Toom-2.5, that is toom_32:
Yes, this is what my "nested" strategies do. (I suppose "nested" was a
poor choice of word, "partial" says more.)
Both GMP ad my diagram generation software do some such slicing.
It may happen that the last polynomial Ak(x) has degree 0 or 1
(depending on the value of d%3). In the first case, just two other
multiplications are needed (ad*b0 and ad*b1): in the second, one can
apply Karatsuba to save one final multiplication. As b(x) has degree1,
the "recomposition" of all these products" making c(x) is quite
natural.
One shouldn't insist on using this partial multiply strategy until the
very end of the operand. As soon the the remaining slice is not
"rectangular enough" (or rectangular in the wrong dimension) one should
simply recurse to generic multiplication.
The diagrams tells us one important thing: The last slice might be
greater than perhaps expected. For example, there seem to be bases
where a partial toom42 should recurse to toom44 or even FFT for its last
slice.
A comment: as b(x) is the same for all Ai(x), one could evaluate it
(in the necessary points) just once, avoiding to repeat the same
evaluations at each multiplication Ai(x)b(x).
Yes, and this is something we cannot do currently.
--
Torbjörn
More information about the gmp-devel
mailing list