faster toom3

Ondrej Bilka neleai at seznam.cz
Tue Sep 26 14:54:19 CEST 2006


I discovered better substitutions for Toom3. 
1.

v0=(x2-x1+x0)*(y2-y1+y0)
v1=(x2+x1+x0)*(y2+y1+y0)/*foo2+foo0 from v0*/
v2=(2*x2+x1+x0)*(2*y2+y1+y0)/*foo2+foo1+foo0+foo2 from v1*/
2.Then we compute z as:
z0=x0*y0
z4=x2*y2
z3=v2-v1-3*z4
z1=(v1-v0)/2-z3
z2=(v1+v0)/2-z0-z4

from mul_n.c Now you use this.
And my algorithm saves division, and addition at 2. phase and at 1. needs 8
additions istead 12.

 /*
  The algorithm is the following:

  0. k = ceil(n/3), r = n - 2k, B = 2^(GMP_NUMB_BITS), t = B^k
  1. split a and b in three parts each a0, a1, a2 and b0, b1, b2
     with a0, a1, b0, b1 of k limbs, and a2, b2 of r limbs
  2. v0   <- a0*b0
     v1   <- (a0+a1+a2)*(b0+b1+b2)
     v2   <- (a0+2*a1+4*a2)*(b0+2*b1+4*b2)
     vm1  <- (a0-a1+a2)*(b0-b1+b2)
     vinf <- a2*b2
     t1   <- (3*v0+2*vm1+v2)/6-2*vinf
     t2   <- (v1+vm1)/2
  3. result is c0+c1*t+c2*t^2+c3*t^3+c4*t^4 where
     c0   <- v0
     c1   <- v1 - t1
     c2   <- t2 - v0 - vinf
     c3   <- t1 - t2
     c4   <- vinf
  */




More information about the gmp-devel mailing list