Factorial improvements

Fri Sep 16 01:34:03 CEST 2005

Jim Fougeron <jfoug at cox.net> writes:

  The best performance speedup's we found are due to 2 things.

  1.  pack as many multiplies together in single precision mode (i.e. multiply
  multiple small numbers into a 32 (or 64) bit machine value, and then when
  that becomes full, multiply tha into the gmp number.  We pre-computed
  a table of the number of values we could multiply together (without overflow),
  and where the "break over" points were, where one less number at a time
  must be used.

That "improvement" over the GMP 4.1.4 code shouldn't gain much,
since the GMP 4.1.4 code does exactly what you're describing.

  I do not remember the exact speedup we achieved over gmp's mpz/fac_ui.c
  code, but if I remember right, it was a significant improvement. (orders of 
  magnitude, and reduced the big-O cost, thus when the numbers grew in
  size, this method did better and better).

If you got "orders of magnitude", i.e., >= 100 times improvement,
I bow and promise never to write any more arithmetic routines in
y life.  But I'd be brash enough to suggest that you remember
incorrectly.

I actually think the GMP 4.1.4 code is close to asymptically
optimal, but others have shown that a 2-3 fold speedup is
possible.

-- 
Torbjörn