Wraparound multiplicaton vs mullo
Niels Möller
nisse at lysator.liu.se
Sat Oct 17 19:41:02 CEST 2009
nisse at lysator.liu.se (Niels "=?iso-8859-1?Q?M=F6ller?=") writes:
> Lets see what this means in Schoolbook, Karatsuba and Toom ranges,
> ignoring the O(n) term:
>
> Schoolbook Karatsuba Toom-3 Toom-4
> exponent 2 1.59 1.46 1.40
>
> x^2 - 1 0.5 M(n) 0.66 M(n) 0.72 M(n) 0.76 M(n)
> x^4 - 1 0.31 M(n) 0.56 M(n) 0.65 M(n) 0.71 M(n)
I've tried the x^2 - 1 algorithm now, and benchmarked on an AMD
x86_32. A first implementation is attached.
It's faster than full multiplication, mpn_mul_n, for n >= 20, and
faster than mpn_mullow_n for n >= 24 (current mpn_mullow_n uses
quadratic algorithms in this range, while for larger sizes it uses the
obvious divide-and-conquer strategy, not Mulders' enhancement).
I've benchmarked up to n = 500, and in the range 40 <= n <= 500, it
consistently saves approximately 25% compared to mpn_mul_n. Since this
is mostly toom3 and toom4 range, it's quite close to the above
analysis.
Regards,
/Niels
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_2nm1.c
Type: text/x-c-code
Size: 4963 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091017/63ead615/attachment.bin>
More information about the gmp-devel
mailing list