Next: Unbalanced Multiplication, Previous: FFT Multiplication, Up: Multiplication Algorithms [Index]

The Toom algorithms described above (see Toom 3-Way Multiplication, see Toom 4-Way Multiplication) generalizes to split into an arbitrary number of pieces, as per Knuth section 4.3.3 algorithm C. This is not currently used. The notes here are merely for interest.

In general a split into *r+1* pieces is made, and evaluations and
pointwise multiplications done at *2*r+1* points. A 4-way split does 7
pointwise multiplies, 5-way does 9, etc. Asymptotically an *(r+1)*-way
algorithm is *O(N^(log(2*r+1)/log(r+1)))*. Only
the pointwise multiplications count towards big-*O* complexity, but the
time spent in the evaluate and interpolate stages grows with *r* and has
a significant practical impact, with the asymptotic advantage of each *r*
realized only at bigger and bigger sizes. The overheads grow as
*O(N*r)*, whereas in an *r=2^k* FFT they grow only as *O(N*log(r))*.

Knuth algorithm C evaluates at points 0,1,2,…,*2*r*, but exercise 4
uses *-r*,…,0,…,*r* and the latter saves some small
multiplies in the evaluate stage (or rather trades them for additions), and
has a further saving of nearly half the interpolate steps. The idea is to
separate odd and even final coefficients and then perform algorithm C steps C7
and C8 on them separately. The divisors at step C7 become *j^2* and the
multipliers at C8 become *2*t*j-j^2*.

Splitting odd and even parts through positive and negative points can be
thought of as using *-1* as a square root of unity. If a 4th root of
unity was available then a further split and speedup would be possible, but no
such root exists for plain integers. Going to complex integers with
*i=sqrt(-1)* doesn’t help, essentially because in Cartesian
form it takes three real multiplies to do a complex multiply. The existence
of *2^k'*th roots of unity in a suitable ring or field lets the fast
Fourier transform keep splitting and get to *O(N*log(r))*.

Floating point FFTs use complex numbers approximating Nth roots of unity. Some processors have special support for such FFTs. But these are not used in GMP since it’s very difficult to guarantee an exact result (to some number of bits). An occasional difference of 1 in the last bit might not matter to a typical signal processing algorithm, but is of course of vital importance to GMP.

Next: Unbalanced Multiplication, Previous: FFT Multiplication, Up: Multiplication Algorithms [Index]