For project "Faster multiplication": faster Toom-3, proposed Toom-4
Marco Bodrato
bodrato at mail.dm.unipi.it
Thu Oct 26 11:49:16 CEST 2006
Dear Developers,
We have just published a preprint about optimal Toom-Cook interpolation.
The most interesting result for you should be a faster Toom 3-way
multiplication. The interpolation used by GMP requires 8 sums, 1
division and 4 shift, the new proposed one requires 1 shift
less. Detailed description follows.
We could also say that the new proposed method, for 3-way
interpolation, is optimal, because we found it by a (pruned)
exhaustive search. Anyway we should doubly check our software and our
model before claiming a proof.
We hope this information will be useful to you, and we would be honoured
if you will take the time to glance to our preprint (you can find it in
http://bodrato.it/papers/#CIVV2006 ).
We will be glad to answer any clarification request of yours.
Best regards,
Marco Bodrato and Alberto Zanoni
------------------------------------------------------------
>From comments in gmp-4.2.1/mpn/generic/mul_n.c, function mpn_toom3_mul_n:
/* Implemented algorithm is the following:
[...]
* 2. v0 <- a0*b0
* v1 <- (a0+a1+a2)*(b0+b1+b2)
* v2 <- (a0+2*a1+4*a2)*(b0+2*b1+4*b2)
* vm1 <- (a0-a1+a2)*(b0-b1+b2)
* vinf <- a2*b2
* -- Inversion starts here! --
* t1 <- ((2*vm1+v2)/3+v0)/2-2*vinf
* t2 <- (v1+vm1)/2
* 3. result is c0+c1*t+c2*t^2+c3*t^3+c4*t^4 where
* c0 <- v0
* c1 <- v1 - t1
* c2 <- t2 - v0 - vinf
* c3 <- t1 - t2
* c4 <- vinf
This requires:
8(sum/diff) + 1(divby3) + 2(divby2) + 2 (mulby2)
Our proposed algorithm starts from the same evaluation
v0, v1, vm1, v2, vinf, then inverts by using the following steps:
* 2i. Inversion: every result is positive, all divisions are exact
* t2 <- (v2 - vm1)/3 %% v2 -= vm1; v2 /= 3;
* tm1 <- (v1 - vm1)/2 %% vm1 *= -1 ; vm1 += v1; vm1>>= 1;
* t1 <- (v1 - v0) %% v1 -= v0 ;
* t2 <- (t2 - t1)/2 %% v2 -= v1 ; v2 >>= 1;
* t1 <- (t1 - tm1 - vinf) %% v1 -= vm1; v1 -= vinf;
* t2 <- (t2 - 2*vinf) %% v2 -= 2*vinf;
* tm1 <- (tm1 - t2) %% vm1 -= v2;
* 3. result is c0+c1*t+c2*t^2+c3*t^3+c4*t^4 where
* c0 <- v0
* c1 <- tm1
* c2 <- t1
* c3 <- t2
* c4 <- vinf
New algorithm requires:
8(sum/diff) + 1(divby3) + 2(divby2) + 1 (mulby2)
Difference is 1 shift. Moreover the algorithm can be easily
implemented without any need for temporary variables, defining t2=v2,
t1=v1, tm1=vm1.
More information about the gmp-devel
mailing list