fac_ui rewrote.
Torbjorn Granlund
tg at gmplib.org
Sun Jan 15 15:21:41 CET 2012
I have now read the new factorial code looking for possible further
improvements.
For the limb accumulation, the code uses this macro:
#define FACTOR_LIST_STORE(P, PR, MAX_PR, VEC, I) \
do { \
if ((PR) > (MAX_PR)) { \
(VEC)[(I)++] = (PR); \
(PR) = (P); \
} else \
(PR) *= (P); \
} while (0)
I think this can be improved along two possible lines.
1. Use an odd-factor variant of the mulN functions (introduced in my
binomial code, see https://gmplib.org/devel/).
2. Table pre-accumulated odd-factor products up to some limit n. I
think this does not merit a huge table, but perhaps up to a few
hundred limbs could be put into the lib.
The reason (1) is a possible improvement is twofold: it avoids branch
misprediction costs, and it breaks the chain of dependent multiplies.
Using functions mulN makes the code clean, but there is a a slight cost
for calling a function, in particular for machines with deep pipeline
but no prediction for jumps/call through a pointer (this is now unusual
for high-end x86 processors). At the cost of code size, we could
instead inline the functions, using (say) 8 loops. (Since you invoke
FACTOR_LIST_STORE inn several places, this might be slightly
unattractive, unless we can put such code in one place.)
If we add a table as per (2), we will need mulN functions (or inlined
loops) for smaller value of N, without accumulating fewer-than-possible
factors, and thereby losing some performance.
For a 64-bit machins, using a 1000-byte table, we get 125 limb entries,
with first omitted odd factor being 1633. Accumulation from that point
on will require 5 or fewer factors.
Saving the non-constant division for computing max_prod (and its name
variants) is also important. Choosing mulN functions should be done
with a small loop, see MAXFACS in e.g., smallk_bin_uiui.c. An integer
divison (by a non-constant divisor) cost 50-100 cycles. (Using a table
as per (2) we can truncate the N selection table at its head.)
(Should my binom code also work with odd factors? I haven't thought
about that.)
--
Torbjörn
More information about the gmp-devel
mailing list