unbalanced mul/div
Torbjorn Granlund
tg at gmplib.org
Sun Jan 10 22:15:21 CET 2010
Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
I noticed a nice speedup for unbalanced mul/div from GMP 4.3.2 to GMP 5.0.0.
See http://www.loria.fr/~zimmerma/tmuldiv10k.ps. This was tested on a 2.4Ghz
Opteron. The "mul" test multiplies x limbs (where x is in abscissa) by n-x
limbs, where n=10000. The "div" test divides n limbs by x limbs with remainder
(mpn_tdiv_qr).
There are still some irregularities and spikes, but the overall speedup is
in the 25%-33% range for both mul and div.
While there is a lot of careful design and implementation behind the new
multiplication and division code, several parameter selections are quite
ad hoc. The MU algorithm block size is an example of that.
For div 10k, there are funny shapes at divisor size 3333 and 5000. At
3333 the MU algorithm switches between 2 and 3 blocks. Around 5999 2
blocks will be used, so the shape there have other reasons.
I suppose close-to-invariant speed for multiplication indicates some
inefficiency as well; it should be higher in the middle. One problem
is that we loop over fft without saving transformed operands. This will
be easy to fix, of course.
Thanks for these diagrams! It is always useful to see measurements from
external researchers!
--
Torbjörn
More information about the gmp-devel
mailing list