Wraparound multiplicaton vs mullo

Sat Oct 17 19:41:02 CEST 2009

nisse at lysator.liu.se (Niels "=?iso-8859-1?Q?M=F6ller?=") writes:

> Lets see what this means in Schoolbook, Karatsuba and Toom ranges,
> ignoring the O(n) term:
>
>         Schoolbook  Karatsuba     Toom-3     Toom-4
> exponent         2       1.59       1.46       1.40           
>            
> x^2 - 1  0.5  M(n)  0.66 M(n)  0.72 M(n)  0.76 M(n)
> x^4 - 1  0.31 M(n)  0.56 M(n)  0.65 M(n)  0.71 M(n)

I've tried the x^2 - 1 algorithm now, and benchmarked on an AMD
x86_32. A first implementation is attached.

It's faster than full multiplication, mpn_mul_n, for n >= 20, and
faster than mpn_mullow_n for n >= 24 (current mpn_mullow_n uses
quadratic algorithms in this range, while for larger sizes it uses the
obvious divide-and-conquer strategy, not Mulders' enhancement).

I've benchmarked up to n = 500, and in the range 40 <= n <= 500, it
consistently saves approximately 25% compared to mpn_mul_n. Since this
is mostly toom3 and toom4 range, it's quite close to the above
analysis.

Regards,
/Niels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_2nm1.c
Type: text/x-c-code
Size: 4963 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091017/63ead615/attachment.bin>