Wraparound multiplicaton vs mullo
tg at gmplib.org
Sat Oct 17 21:59:49 CEST 2009
nisse at lysator.liu.se (Niels Möller) writes:
nisse at lysator.liu.se (Niels "=?iso-8859-1?Q?M=F6ller?=") writes:
> Lets see what this means in Schoolbook, Karatsuba and Toom ranges,
> ignoring the O(n) term:
> Schoolbook Karatsuba Toom-3 Toom-4
> exponent 2 1.59 1.46 1.40
> x^2 - 1 0.5 M(n) 0.66 M(n) 0.72 M(n) 0.76 M(n)
> x^4 - 1 0.31 M(n) 0.56 M(n) 0.65 M(n) 0.71 M(n)
I've tried the x^2 - 1 algorithm now, and benchmarked on an AMD
x86_32. A first implementation is attached.
It's faster than full multiplication, mpn_mul_n, for n >= 20, and
faster than mpn_mullow_n for n >= 24 (current mpn_mullow_n uses
quadratic algorithms in this range, while for larger sizes it uses the
obvious divide-and-conquer strategy, not Mulders' enhancement).
I've benchmarked up to n = 500, and in the range 40 <= n <= 500, it
consistently saves approximately 25% compared to mpn_mul_n. Since this
is mostly toom3 and toom4 range, it's quite close to the above
* Experimental code for multiplication mod 2^n-1.
Should this be B^n-1?
/* Inputs and outputs are n limbs, computation is mod B^n - 1, and
* values are seminormalized; zero is represented as either 0 or B^n -
* 1. Needs 2n limbs at rp. */
mul_2nm1_ref (mp_ptr rp, mp_srcptr ap, mp_srcptr bp, mp_size_t n)
mpn_mul_n (rp, ap, bp, n);
cy = mpn_add_n (rp, rp, rp+n, n);
/* If cy == 1, then the value of rp is at most B^n - 2, so there can
* be no overflow when adding in the carry. */
MPN_INCR_U (rp, n, cy);
Do you represent 0 canonically as B^n-1?
(I suppose one could preincrement, then decrement if not carry,
to canonicalize to 0 instead.)
mul_2nm1_2 (mp_ptr rp, mp_srcptr ap, mp_srcptr bp, mp_size_t size,
2nm1 is for 2^n minus 1? Again, isn't B the correct base?
What is the _2 for?
You might want to add a LGPL 3 header to the code, without any header it
defaults to "all rights reserved".
More information about the gmp-devel