Factorial improvements

Sat Sep 17 13:32:44 CEST 2005

Paul.Zimmermann at loria.fr (Paul Zimmermann) writes:

  I got the following speedup with my code (I claim to be Paul Zimmermann):

  mermoz% ./fac_ui 40000000
  mpz_fac_ui took 727056ms    # reference gmp-4.1.4 code
  mpz_fac_ui2 took 255384ms   # my new code

  This is only a 2.85-fold speedup, but when 40000000 goes to infinity,

When a tiny number like 40000000 starts going to infinity, I will
definitely retire.  :-)

  I believe the 2.85 ratio goes to infinity too. Unfortunately my computer
  has not enough memory to try...

Yes, with a computer that can address at least 2^100 bytes, and
with that much RAM actually in place, I agree that 100-fold speedup
becomes realistic.  A measly 64-bit computer fully equipped with 18
exabyte RAM will come close.

  I've also tried the "current GMP code" [*] and I get:

  mermoz% ./fac_ui_gmp 40000000
  mpz_fac_ui took 727820ms    # reference gmp-4.1.4 code
  mpz_fac_ui2 took 445347ms   # current GMP code

Your code is indeed much faster for large n (n as in n-factorial).
Unfortunately, it needs more work, since it is also slower for
n < 10000, or much slower for really small n.  Somebody needs to
work on that.

(BTW, It would be nice if some of the energy spent on n-factorial
where redirected towards n over k.  The present code really
sucks.)

     I actually think the GMP 4.1.4 code is close to asymptically
     optimal, but others have shown that a 2-3 fold speedup is
     possible.

  As far as I can see, both the gmp-4.1.4 code and the "current GMP code"
  have complexity O(M(n) log(n)^2), whereas my implementation of Scho"nhage
  and al.'s algorithm has complexity O(M(n) log(n)), so with a big enough
  computer, it should be possible to get a speedup of >= 100.

You're the complexity analyst here, and I humbly admit that I
fail to understand that the old code is O(n log(n)^3).

--
Torbjörn