# 4-way Toom-Cook and Centrinia

Josh Liu zliu2 at student.gsu.edu
Wed Aug 4 01:23:11 CEST 2004

   I came up with the below transformations for interpolating the
product coefficients from the point products at points {0, 1/4, 1/2, 1,
2, 4, inf} in that order. Does anyone have any better transformations
for the same points? What about for different points? I would
appreciate it if anyone can improve upon what I have. I plan to
implement the 4-way Toom-Cook multiplication algorithm by the end of
the week so any suggestions, especially before, say, Sunday, would be
greatly appreciated.
One can TeX the below file or simply skim through it and look at the
crucial parts, as the formating is relatively transparent.
On a different note, I would like some advice for someone working on
a fast poly-algorithmic multiple precision arithmetic library. I
already have a working library. The library uses Karatsuba-Ofman, 3-way
Toom-Cook, and DWT based Schoenhage-Strassen multiplication,
Burnikel-Ziegler and Newton division, and Euclid, binary, and Lehmer
GCD. It is currently lacking in extended GCD, exact division, modular
arithmetic, rational arithmetic, and floating point arithmetic. It has
a goal of being a computer algebra system so there are many more
features that GMP does not have, such as a Jenkins-Traub polynomial
root finder, polynomial arithmetic (a Nussbaumer implementation is used
for multiplication), random number generators, prime number sieves,
factoring algorithms, and many many more features.
What are the most important aspects of GMP that deserved the most
attention during the development process? What made GMP so fast? Where
do most bugs originate? What does the debug cycle look like? How long
does it take, and what steps are taken, when a bug or invalidity is
found (the latter takes up more of my development time than any other
factor, almost combined)? Are there any desires for GMP to be
educational? Are there any efforts taken to reduce the size of the
code, both machine and source? If there is a desire to reduce code
size, what steps were taken at which points? Was the development of GMP
easy? What were the hardest parts in it's development? What
satisfaction do the authors get from developing the library? What would
be a good ranking, in relative order of importance, the following
factors: speed, stability, validity of results, machine code size, code
simplicity/ease of understanding, wealth of algorithms implemented,
development time, ease of development, memory footprint, 1337|\|355,
and personal knowledge attainment. I thank any response to the above
questions as they would be a guide for the development of my own
project.

P.S. I stole the basecase multiplication assembly code for the Pentium
4 and the part of the Burnikel-Ziegler division in the 2x1 operand
routine for odd sized operands directly from GMP. Is that okay? This is
a MIT license project. If anyone objects to my using these two blatant
copying of intellectual property, I would immediately revert to my own
code. GMP has inspired many implementations of algorithms indirectly.
Some functions share similarities with GMP code even though I wrote
these functions from scratch.

P.S.S. GMP has inspired me almost as much as Knuth. Thank you for such
great work.

---- Begin interpolation_transformations.tex ----

% interpolation_transformations.tex

$$W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 4096& 1024& 256& 64& 16& 4& 1\cr 64& 32& 16& 8& 4& 2& 1\cr 1& 1& 1& 1& 1& 1& 1\cr 1& 2& 4& 8& 16& 32& 64\cr 1& 4& 16& 64& 256& 1024& 4096\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_0 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr -4096& 1& 0& 0& 0& 0& -1\cr -64& 0& 1& 0& 0& 0& -1\cr -1& 0& 0& 1& 0& 0& -1\cr -1& 0& 0& 0& 1& 0& -64\cr -1& 0& 0& 0& 0& 1& -4096\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_1 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& 0& 0& 0& -1& 0\cr 0& 0& 1& 0& -1& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 1& 0& 1& 0& 0\cr 0& 1& 0& 0& 0& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_2 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& 0& 0& 0& 0& 0\cr 0& 0& 1& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& -16& 1& 0& 0\cr 0& 0& 0& -128& 0& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_3 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& -20& 0& 0& 0& 0\cr 0& -1& 34& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 50& -1& 0\cr 0& 0& 0& 0& -36& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_4 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1/4& 0& 0& 0& 0& 0\cr 0& 0& 1/8& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 1/8& 0& 0\cr 0& 0& 0& 0& 0& 1/4& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_5 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1/105& 0& 0& 0& 0& 0\cr 0& 0& 1/21& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 1/7& 0& 0\cr 0& 0& 0& 0& 0& 1/63& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_6 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& 0& 0& 0& 1& 0\cr 0& 0& 1& 0& 1& 0& 0\cr 0& 0& 0& 1& 1& 1& 0\cr 0& 0& -1& 0& 1& 0& 0\cr 0& -1& 0& 0& 0& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_7 = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1/2& 0& 0& 0& 0& 0\cr 0& 0& 1/2& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 1/2& 0& 0\cr 0& 0& 0& 0& 0& 1/2& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1024& 256& 64& 16& 4& 0\cr 0& 32& 16& 8& 4& 2& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 2& 4& 8& 16& 32& 0\cr 0& 4& 16& 64& 256& 1024& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1020& 240& 0& -240& -1020& 0\cr 0& 30& 12& 8& -12& -30& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 34& 20& 16& 20& 34& 0\cr 0& 1028& 272& 128& 272& 1028& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1020& 240& 0& -240& -1020& 0\cr 0& 30& 12& 8& -12& -30& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 18& 4& 0& 4& 18& 0\cr 0& 900& 144& 0& 144& 900& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_3 \cdot A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 420& 0& 0& 0& -420& 0\cr 0& 0& 168& 0& -168& 0& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 0& 56& 0& 56& 0& 0\cr 0& 252& 0& 0& 0& 252& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_4 \cdot A_3 \cdot A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 105& 0& 0& 0& -105& 0\cr 0& 0& 21& 0& -21 0& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 0& 7& 0& 7& 0& 0\cr 0& 63& 0& 0& 0& 63& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_5 \cdot A_4 \cdot A_3 \cdot A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& 0& 0& 0& -1& 0\cr 0& 0& 1& 0& -1& 0& 0\cr 0& 1& 1& 1& 1& 1& 0\cr 0& 0& 1& 0& 1& 0& 0\cr 0& 1& 0& 0& 0& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_6 \cdot A_5 \cdot A_4 \cdot A_3 \cdot A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 2& 0& 0& 0& 0& 0\cr 0& 0& 2& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 2& 0& 0\cr 0& 0& 0& 0& 0& 2& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

$$A_7 \cdot A_6 \cdot A_5 \cdot A_4 \cdot A_3 \cdot A_2 \cdot A_1 \cdot A_0 \cdot W = \pmatrix { 1& 0& 0& 0& 0& 0& 0\cr 0& 1& 0& 0& 0& 0& 0\cr 0& 0& 1& 0& 0& 0& 0\cr 0& 0& 0& 1& 0& 0& 0\cr 0& 0& 0& 0& 1& 0& 0\cr 0& 0& 0& 0& 0& 1& 0\cr 0& 0& 0& 0& 0& 0& 1\cr }$$

--- End interpolation_transformations.tex ----