tg at gmplib.org
Sun Dec 11 03:46:36 CET 2011
bodrato at mail.dm.unipi.it writes:
I tested the speed of your code with the many implementations I tested
last year (all available on my web page, in the same file as fac_ui). On
shell, for binomial(1234,k) the faster algorithm is:
- with k < 35, the current GMP 5.0.1 basecase (or my slight update of it);
- with 35 < k < 290, your_uiui4 code;
- with 290 < k < 450, my mpz_pro_bin_uiui;
- with 450 < k (<620), the code using prime sieving.
mpz_pro_bin_uiui is a simple code: it computes the odd factor of the
product n...(n-k+1) and divides by the odd component of the factorial of
I'm sure that we can get rid of it if we optimize your code.
OK, I've beaten my code some further.
(1) Now the number of allowable factors that fit into a limb is tabled.
This table takes into account that the mulN functions now shift out
(2) Less scratch space usage, for this needs the latest repo bdiv
updates (allowing dividend/quotient operand overlap).
(3) Misc minor optimisations.
I suspect that one remaining optimisation for the smallest operands is
to get rid of the final lshift. By being cleverer with 2 removal from
the dividend, that might be possible.
Now the code beats the repo code everywhere on my test machines. I
suspect that your improved basecase code might still be somewhat better
in some area, but perhaps the difference is negligible?
(If this code survives and in its present form, we need to generate the
table. I have dumbmp code for that.)
More information about the gmp-devel