Factorial improvements

Thu Sep 15 12:42:44 CEST 2005

>Date: Wed, 14 Sep 2005 22:19:39 +0200
>From: Marsac Laurent <laurent.marsac at esil.univ-mrs.fr>
>
>Hello,
>
>I'm currently trying to improve n! calculation (mpz/fac_ui.c) and i just
>want to know if anyone else is already working on it, or who is
>responsible of that file ?
>
>As described in the TODO list, i'm doing the prime factorization of n!,
>and shiftings for factors of 2. The first results seems to show my
>algorithm improves performances and i hope i could send more details
>here ASAP.
>
>--
>Laurent Marsac

With OpenPFGW, we wrote our own n! function.  Since we use our own 
efficient C++ wrapper class around the raw gmp, I will not post code,  
since it may make reading it harder to read by members of this list, along
with the fact that the function call is generic and handles several other things
beyond a simple n! (such as multi-factorial and modular factorials), thus 
again making it more difficult to read. 

The best performance speedup's we found are due to 2 things.

1.  pack as many multiplies together in single precision mode (i.e. multiply
multiple small numbers into a 32 (or 64) bit machine value, and then when
that becomes full, multiply tha into the gmp number.  We pre-computed
a table of the number of values we could multiply together (without overflow),
and where the "break over" points were, where one less number at a time
must be used.

2. split up the operation into a set of GMP accumulator numbers, then at
the end multiply all the accumulators together.  We chose a constant 4 
accumulators and seemed to work out well.  All the work we did was 
in the mpz level.  We did not investigate using the mpn level and doing
multbyonelimb type calls.  It is possible that a tuned mpn version would
outperform the split accumulator mpz method, however, I think that 
splitting up this iteration of multiplies into multiple accumulators is a
good way to improve the performance.

The first optimization avoids the multi-precision mults as much as possible.
The second gives a Karatsuba like divide and conquer affect.

I do not remember the exact speedup we achieved over gmp's mpz/fac_ui.c
code, but if I remember right, it was a significant improvement. (orders of 
magnitude, and reduced the big-O cost, thus when the numbers grew in
size, this method did better and better).  What we did in OpenPFGW was
never "tuned".  It was simply hacked together to get a significant speedup,
and then left at that (a "good enough" improvement)

Jim.

  Tray Helper - try to find something more useful...

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://gmplib.org/list-archives/gmp-discuss/attachments/20050915/263ab7b4/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: _th_.gif
Type: image/gif
Size: 2453 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-discuss/attachments/20050915/263ab7b4/_th_.gif