faster toom3
Ondrej Bilka
neleai at seznam.cz
Tue Sep 26 14:54:19 CEST 2006
I discovered better substitutions for Toom3.
1.
v0=(x2-x1+x0)*(y2-y1+y0)
v1=(x2+x1+x0)*(y2+y1+y0)/*foo2+foo0 from v0*/
v2=(2*x2+x1+x0)*(2*y2+y1+y0)/*foo2+foo1+foo0+foo2 from v1*/
2.Then we compute z as:
z0=x0*y0
z4=x2*y2
z3=v2-v1-3*z4
z1=(v1-v0)/2-z3
z2=(v1+v0)/2-z0-z4
from mul_n.c Now you use this.
And my algorithm saves division, and addition at 2. phase and at 1. needs 8
additions istead 12.
/*
The algorithm is the following:
0. k = ceil(n/3), r = n - 2k, B = 2^(GMP_NUMB_BITS), t = B^k
1. split a and b in three parts each a0, a1, a2 and b0, b1, b2
with a0, a1, b0, b1 of k limbs, and a2, b2 of r limbs
2. v0 <- a0*b0
v1 <- (a0+a1+a2)*(b0+b1+b2)
v2 <- (a0+2*a1+4*a2)*(b0+2*b1+4*b2)
vm1 <- (a0-a1+a2)*(b0-b1+b2)
vinf <- a2*b2
t1 <- (3*v0+2*vm1+v2)/6-2*vinf
t2 <- (v1+vm1)/2
3. result is c0+c1*t+c2*t^2+c3*t^3+c4*t^4 where
c0 <- v0
c1 <- v1 - t1
c2 <- t2 - v0 - vinf
c3 <- t1 - t2
c4 <- vinf
*/
More information about the gmp-devel
mailing list