15.1.7 Other Multiplication

The Toom algorithms described above (see Toom 3-Way Multiplication, see Toom 4-Way Multiplication) generalizes to split into an arbitrary number of pieces, as per Knuth section 4.3.3 algorithm C. This is not currently used. The notes here are merely for interest.

In general a split into r+1 pieces is made, and evaluations and pointwise multiplications done at 2*r+1 points. A 4-way split does 7 pointwise multiplies, 5-way does 9, etc. Asymptotically an (r+1)-way algorithm is O(N^(log(2*r+1)/log(r+1))). Only the pointwise multiplications count towards big-O complexity, but the time spent in the evaluate and interpolate stages grows with r and has a significant practical impact, with the asymptotic advantage of each r realized only at bigger and bigger sizes. The overheads grow as O(N*r), whereas in an r=2^k FFT they grow only as O(N*log(r)).

Knuth algorithm C evaluates at points 0,1,2,…,2*r, but exercise 4 uses -r,…,0,…,r and the latter saves some small multiplies in the evaluate stage (or rather trades them for additions), and has a further saving of nearly half the interpolate steps. The idea is to separate odd and even final coefficients and then perform algorithm C steps C7 and C8 on them separately. The divisors at step C7 become j^2 and the multipliers at C8 become 2*t*j-j^2.

Splitting odd and even parts through positive and negative points can be thought of as using -1 as a square root of unity. If a 4th root of unity was available then a further split and speedup would be possible, but no such root exists for plain integers. Going to complex integers with i=sqrt(-1) doesn’t help, essentially because in Cartesian form it takes three real multiplies to do a complex multiply. The existence of 2^k'th roots of unity in a suitable ring or field lets the fast Fourier transform keep splitting and get to O(N*log(r)).

Floating point FFTs use complex numbers approximating Nth roots of unity. Some processors have special support for such FFTs. But these are not used in GMP since it’s very difficult to guarantee an exact result (to some number of bits). An occasional difference of 1 in the last bit might not matter to a typical signal processing algorithm, but is of course of vital importance to GMP.