Wraparound multiplicaton vs mullo

Niels Möller nisse at lysator.liu.se
Wed Oct 21 09:00:26 CEST 2009

bodrato at mail.dm.unipi.it writes:

>> First "evaluate" as follows,
>>  |s0|   | 1  1  1  1|   |a0|  # f(1)
>>  |s1|   | 1 -1  1 -1|   |a1|  # f(-1)
>>  |s2| = | 1  0 -1  0| * |a2|  # Re f(i)
>>  |s3|   | 0  1  0 -1|   |a3|  # Im f(i)
>>  |s4|   | 1  1 -1 -1|         # Re f(i) + Im f(i)
> Then you perform five multiplication... I think it would be better to
> implement a recursive function: compute the product Mod x^4-1 by computing
> it Mod x^2-1 and Mod x^2+1, then using CRT.

David suggested the same thing. I'm not sure that's going to be a win.

> You need:
> - two multiplication for the x^2-1 side, this is the recursive side;
> - three for the x^2+1 side, if you use Karatsuba...

And these are of the same size as in the current implementation,
right? So it's still the same 5 multiplications. Question is if
CRT:ing is more efficient than the interpolatino equations, but I
imagine that they'll turn ot to be equivalent, but maybe the CRT way
gives some other hints on how to arrange the computation.

I think it's nice to be able to use plain (and balanced)
multiplication for the pointwise multiplications, since these
functions are alredy optimized. But maybe you save some linear work by
doing it recursively, I guess one might at least get rid of the
addition for the wraparound of the highest polynomial coefficient.

Recursive implementation would also be more attractive if one could
write an efficient base case. But I'd expect it's going to be slower
than mul_basecase, since one have to do soome indexing wraparound in
the inner addmul_1 loop.

> But you can decide the best algorithm for the x^2+1 side, or let mul_n
> choose the best for you... maybe you can exploit the FFT for larger sizes!

I'm not so sure this is useful. In FFT range, it would be better to
use FFT for the top-level x^n-1, I think.

> Moreover you don't need to write different functions for x^4-1, x^8-1...

I doubt x^8-1 is a good idea, since I see no efficient way (below the
FFT range) to compute mod x^4 + 1.


More information about the gmp-devel mailing list