speed of unbalanced division
Torbjorn Granlund
tg at gmplib.org
Mon Feb 4 18:32:51 CET 2013
Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:
on January 25 I reported efficiency issues with unbalanced multiplication.
Here I report similar issues with unbalanced division. Consider the program
below. I get on a 2Ghz Intel Xeon E7540 with GMP 5.1.0:
barbecue% ./bug2 2000001 1000001
mpz_tdiv_q(2000001,1000001) took 2244ms
barbecue% ./bug2 1698971 698971
mpz_tdiv_q(1698971,698971) took 3144ms
The first run computes the 10^6-limb quotient of the division of 2000001 limbs
by 1000001 limbs. The second run computes the 10^6-limb quotient of the
division of 1698971 limbs by 698971 limbs.
The second run should be faster (or equal in speed) to the first one, since
one can multiply both numerator and divisor by B^(1000001-698971) where B=2^64,
then perform a 2000001/1000001 division.
Instead the second run is 40% slower...
This is more alarming than the rather unalarming multiply performance
fluctuations. I must admit that it is actually unsurprising.
I put some diagrams at gmplib.org/devel/ that highlights the problem
over a large range. The 2nd plot covers your test case but for an AMD
Phenom 2. The 2st plot is for 10 times smaller operands, which
demonstrates the main problem but gives better smoothness. I suspect
some of the non-smoothness of the 2nd plot is due to limited FFT tables.
As soon as the dividend size is twice the divisor size one expect
(average) complexity to be constant for a given quotient size. The
diagrams indicate that the present code do behave ideally in that range.
(I measured much longer, but truncated these plot to highlight the
problematic ranges.)
The reason for this to be unsurprising is that the "MU" code didn't get
much thought about strategy decisions. I expect it to quite simple to
greatly improve the situation with some analysis. To get the curves
perfectly smooth might require more work.
--
Torbjörn
More information about the gmp-devel
mailing list