Wraparound multiplicaton vs mullo

Sun Oct 18 16:30:33 CEST 2009

David Harvey <dmharvey at cims.nyu.edu> writes:

> Then of course x^3 - x, or more generally (x - a[1]) ... (x - a[n]).
> So e.g. instead of x^4 - 1, you could do x^4 - 5*x^2 + 4 = (x^2 - 1)
> * (x^2 - 4). I bet that could be made faster than using x^4 - 1.

Do you think that more complicated "wraparound" would be useful, e.g.,
for Newton iteration? Its usefulness is at least not obvious to me.

Maybe I should write down how the x^4-1 algorithm works. Structure is
of course somewhat similar to toom, and it's even more similar to
Strassen matrix multiplication. Input numbers a and b are treated as
polynomials of degree 3, a0 + a1 x+ a2 x^2+ a3 x^3 and similarly for
b0,...,b3.

First "evaluate" as follows,

 |s0|   | 1  1  1  1|   |a0|  # f(1)
 |s1|   | 1 -1  1 -1|   |a1|  # f(-1)
 |s2| = | 1  0 -1  0| * |a2|  # Re f(i)
 |s3|   | 0  1  0 -1|   |a3|  # Im f(i)
 |s4|   | 1  1 -1 -1|         # Re f(i) + Im f(i)

And similarly let t0...,t4 be the evaluation of b, using the same
matrix. Then multiply pointwise, u_k = s_k * t_k.

Let the product polynomial be c0 + c1 x + c2 x^2 + c3 x^3.
Then its values at these points are given by

 | 1  1  1  1|   |c0|   | 1 0  0  0 0 |   |u0|
 | 1 -1  1 -1|   |c1|   | 0 1  0  0 0 |   |u1|
 | 1  0 -1  0| * |c2| = | 0 0  1 -1 0 | * |u2|
 | 0  1  0 -1|   |c3|   | 0 0 -1 -1 1 |   |u3|
                                          |u4|

The current code first multiplies the submatrix on the right related to u2,u3,u4
(the complex multiplication), and then does some butterfly operations
to invert the left matrix. Explicitly inverting, one gets the
expression

 |c0|       | 1  1  2 -2  0 |   |u0|
 |c1|    1  | 1 -1 -2 -2  2 |   |u1|
 |c2| = --- | 1  1 -2  2  0 | * |u2|
 |c3|    4  | 1 -1  2  2 -2 |   |u3|
                                |u4|

And finally, evaluate the c polynomial in B^{n/4}, with wraparound:

 c = (c0 + c1 B^{n/4} + c2 B^{n/2} + c3 B^{3n/4}) mod B^n 

One subtely is that c0,c1,c2,c3 turns out to be non-negative. That's
because polynomial multiplication mod x^n - 1 also is a positive
wraparound; it adds higher and lower coefficients together. THat's not
true in general, e.g., polynomial multiplication mod x^n + 1 doesn't
work like this: the product of two polynomials with non-negative
coefficents could result in a product with some coefficients being
negative.

Almost all the linear operations are butterfly operations, i.e., they
could use mpn_add_n_sub_n. The current code doesn't exploit that (and
I'm not sure which platforms have an efficient add_n_sub_n). Not
surprising, since it can be viewed also as an FFT of size 4, over the
ring Z[i].

Regards,
/Niels