# Wraparound multiplicaton vs mullo

Niels Möller nisse at lysator.liu.se
Sat Oct 17 23:03:52 CEST 2009

```David Harvey <dmharvey at cims.nyu.edu> writes:

>> I've tried the x^2 - 1 algorithm now, and benchmarked on an AMD
>> x86_32. A first implementation is attached.
>
> Cool!

Now I've done a first version of x^4 -1 too, see attachment.

> How about recursing into the same routine for the multiplication
> modulo 2^(n/2) - 1?

The current code is not organized that way, it evaluates the
polynomial at +1, -1 (for x^2-1) and +1, -1, i (for x^4-1), performs
full balanced multiplication for each point, then interpolates a
polynomial of the same degree as the inputs, evaluates that polynomial
at B^{n/2} or B^{n/4}, and in that evaluation, the highest coefficient
wraps around.

The linear work can surely be done a lot better, in this first
implementation I did it in straight-forward but maybe not so clever
way.

> I suppose you would need n divisible by 4, or would need to

In the current code, the x^2-1 algorithm requires that n is even, and
the x^4-1 algorithm requires that n is divisible by 4. That could of
course be relaxed at the cost ofextra shifting, assuming that
the number of bits is divisible by 2 and 4, respectively.

Benchmarking mul_n, mullo_n, and the x^2-1 and x^4-1 algorithms over
the same range of 2-500 limbs as before gives the following (still on
AMD x86_32):

2-10:  mpn_mul_n wins.

12-22:  mpn_mullo_n wins, with at best a 14% saving (all savings are
compared to mpn_mul_n). This is basecase mullo.

24-50:  x^2-1 wins, with at best a 23% saving.

52-500: x^4-1 wins, with at best a 35% saving.

I'm not sure if I should be surprised that x^4-1 gets into the game so
early. For the range of 64-500, it saves 30% or more.

With cleverer evaluation and interpolation, speeding up the linear
work of x^4-1, I'd expect the range where x^2-1 wins will get pretty
narrow.

Regards,
/Niels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_2nm1.c
Type: text/x-c-code
Size: 13486 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091017/5f9dc7da/attachment-0001.bin>
```