GMP 4.3 multiplication performance
tg at gmplib.org
Sun May 31 19:35:17 CEST 2009
While working on multiplication improvements for GMP 4.4, I realized
that we are doing something quite silly for unbalanced operands in GMP
4.3. For some machines, GMP 4.3 might actually be substantially slower
than GMP 4.2!
The current multiply code works roughly like this:
if (operands have same size)
use mpn_sqr_n or mpn_mul_n
if (schoolbook operand range)
if (fft range)
else use toom42, toom32, or toom22, or some combination of these.
The problem is that for any unbalanced operands between the schoolbook
range and the fft range, we will basically use karatsuba's algorithm,
not any toom variant. This is pretty awful except for smallish
The use of unbalanced toom/karatsuba was introduced with GMP 4.3, and is
a great improvement for many operands. Unfortunately, it is done too
Some measurements demonstrating the effects:
7900 #11635165.00 20355963.00
8000 #10745592.00 10750716.00
I.e., a 7900-limb operand times a 7900-lim operand takes 11635165 cycles
on an athlon 64. But a I.e., a 7901-limb operand times a 7900-lim
operand takes 20355963 cycles.
In the FFT range, as for 8000 limbs, things are back to normal. The FFT
code handles arbitrarily unbalanced operands (although its performance
degrades when the operands are very unbalanced).
(Note that a balanced 7900-limb product is also slower than a 8000-limb
balanced product. That's another flaw.)
More information about the gmp-devel