Factorial improvements
Torbjorn Granlund
tege at swox.com
Sat Sep 17 13:32:44 CEST 2005
Paul.Zimmermann at loria.fr (Paul Zimmermann) writes:
I got the following speedup with my code (I claim to be Paul Zimmermann):
mermoz% ./fac_ui 40000000
mpz_fac_ui took 727056ms # reference gmp-4.1.4 code
mpz_fac_ui2 took 255384ms # my new code
This is only a 2.85-fold speedup, but when 40000000 goes to infinity,
When a tiny number like 40000000 starts going to infinity, I will
definitely retire. :-)
I believe the 2.85 ratio goes to infinity too. Unfortunately my computer
has not enough memory to try...
Yes, with a computer that can address at least 2^100 bytes, and
with that much RAM actually in place, I agree that 100-fold speedup
becomes realistic. A measly 64-bit computer fully equipped with 18
exabyte RAM will come close.
I've also tried the "current GMP code" [*] and I get:
mermoz% ./fac_ui_gmp 40000000
mpz_fac_ui took 727820ms # reference gmp-4.1.4 code
mpz_fac_ui2 took 445347ms # current GMP code
Your code is indeed much faster for large n (n as in n-factorial).
Unfortunately, it needs more work, since it is also slower for
n < 10000, or much slower for really small n. Somebody needs to
work on that.
(BTW, It would be nice if some of the energy spent on n-factorial
where redirected towards n over k. The present code really
sucks.)
I actually think the GMP 4.1.4 code is close to asymptically
optimal, but others have shown that a 2-3 fold speedup is
possible.
As far as I can see, both the gmp-4.1.4 code and the "current GMP code"
have complexity O(M(n) log(n)^2), whereas my implementation of Scho"nhage
and al.'s algorithm has complexity O(M(n) log(n)), so with a big enough
computer, it should be possible to get a speedup of >= 100.
You're the complexity analyst here, and I humbly admit that I
fail to understand that the old code is O(n log(n)^3).
--
Torbjörn
More information about the gmp-discuss
mailing list