Wraparound multiplicaton vs mullo
Torbjorn Granlund
tg at gmplib.org
Thu Oct 22 15:25:39 CEST 2009
nisse at lysator.liu.se (Niels Möller) writes:
When n has a single trailing zero, performance is about the same as
x^2-1 (not surprising), saving around 25% compared to a full
multiplication. I'm still benchmarking over range 2-512 limbs, on an
x86_32.
When n has two trailing zerros, the dc algorithm beats both x^2-1 and
x^4-1, with typical savings of 25%, 32% and 35%.
When n has three trailing zeros, the dc algorithm gives a saving of
39%. Increasing the number of trailing zeros further gives pretty
small additional improvements in this range.
I suppose some maths on your table has the answer, but I ask anyway:
Does size 2^t m - k ever run slower than size 2^t m? Than it could help
to pad out one's operands (for the presumed main usages of this new
trick Newton, ab-cd with cancellation, etc). An interface would be
needed for that.
So for numbers of a couple of hundred limbs, multiplication mod 2^n -
1 is 67% faster (40% time saved) compared to full multiplication.
Wow! Which are the largest n for which it is still faster than full
multiplication?
--
Torbjörn
More information about the gmp-devel
mailing list