Factorial improvements
Torbjorn Granlund
tege at swox.com
Fri Sep 16 01:34:03 CEST 2005
Jim Fougeron <jfoug at cox.net> writes:
The best performance speedup's we found are due to 2 things.
1. pack as many multiplies together in single precision mode (i.e. multiply
multiple small numbers into a 32 (or 64) bit machine value, and then when
that becomes full, multiply tha into the gmp number. We pre-computed
a table of the number of values we could multiply together (without overflow),
and where the "break over" points were, where one less number at a time
must be used.
That "improvement" over the GMP 4.1.4 code shouldn't gain much,
since the GMP 4.1.4 code does exactly what you're describing.
I do not remember the exact speedup we achieved over gmp's mpz/fac_ui.c
code, but if I remember right, it was a significant improvement. (orders of
magnitude, and reduced the big-O cost, thus when the numbers grew in
size, this method did better and better).
If you got "orders of magnitude", i.e., >= 100 times improvement,
I bow and promise never to write any more arithmetic routines in
y life. But I'd be brash enough to suggest that you remember
incorrectly.
I actually think the GMP 4.1.4 code is close to asymptically
optimal, but others have shown that a 2-3 fold speedup is
possible.
--
Torbjörn
More information about the gmp-discuss
mailing list