[Gmp-commit] /var/hg/gmp: 3 new changesets
mercurial at gmplib.org
mercurial at gmplib.org
Sat Feb 11 16:13:28 CET 2012
details: /var/hg/gmp/rev/2c4781bd8e78
changeset: 14622:2c4781bd8e78
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sat Feb 11 16:01:56 2012 +0100
description:
Factorial documentation.
details: /var/hg/gmp/rev/cd638b86628d
changeset: 14623:cd638b86628d
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sat Feb 11 16:08:53 2012 +0100
description:
Turn to UNLIKELY conditions in ASSERTs.
details: /var/hg/gmp/rev/10ec5b1e158b
changeset: 14624:10ec5b1e158b
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sat Feb 11 16:13:13 2012 +0100
description:
toom3_itch: support any recursion depth.
diffstat:
ChangeLog | 8 +++++
doc/gmp.texi | 85 ++++++++++++++++++++++++++++++++++++++++-----------------
gmp-impl.h | 15 ++++++---
mpz/oddfac_1.c | 25 +++++++++++++++++
tests/refmpn.c | 6 ++--
5 files changed, 105 insertions(+), 34 deletions(-)
diffs (238 lines):
diff -r 19d7b40269eb -r 10ec5b1e158b ChangeLog
--- a/ChangeLog Sat Feb 11 12:51:21 2012 +0100
+++ b/ChangeLog Sat Feb 11 16:13:13 2012 +0100
@@ -1,3 +1,8 @@
+2012-02-11 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+ * doc/gmp.texi (Factorial): Shortly describe current algorithm.
+ * gmp-impl.h (ASSERT_ALWAYS): Consider failures UNLIKELY.
+
2012-02-10 Niels Möller <nisse at lysator.liu.se>
* tests/mpz/t-gcd.c (gcdext_valid_p): Enforce sligthly stricter
@@ -9,6 +14,9 @@
2012-02-09 Marco Bodrato <bodrato at mail.dm.unipi.it>
+ * gmp-impl.h (mpn_toom3*_itch): Support any recursion depth.
+ * tests/refmpn.c (refmpn_mul): Restore tight allocations.
+
* mpz/oddfac_1.c (mpz_oddfac_1): Get ready for n!!
* gmp-impl.h (mpz_oddfac_1): Update signature.
* mpz/fac_ui.c (mpz_fac_ui): Update call to mpz_oddfac_1.
diff -r 19d7b40269eb -r 10ec5b1e158b doc/gmp.texi
--- a/doc/gmp.texi Sat Feb 11 12:51:21 2012 +0100
+++ b/doc/gmp.texi Sat Feb 11 16:13:13 2012 +0100
@@ -8955,42 +8955,75 @@
@subsection Factorial
@cindex Factorial algorithm
-Factorials are calculated by a combination of removal of twos, powering, and
-binary splitting. The procedure can be best illustrated with an example,
+Factorials are calculated by a combination of two algorithms. An idea is
+shared among them: to compute the odd part of the factorial; a final step
+takes account of the power of @math{2} term, by shifting.
+
+For small @math{n}, the odd factor of @math{n!} is computed with the simple
+observation that it is equal to the product of all positive odd numbers
+smaller than @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
+where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
+recursively. The procedure can be best illustrated with an example,
@quotation
- at math{23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23}
+ at math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
@end quotation
- at noindent
-has factors of two removed,
-
- at quotation
- at math{23! = 2^{19}.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23}
- at end quotation
-
- at noindent
-and the resulting terms collected up according to their multiplicity,
-
- at quotation
- at math{23! = 2^{19}.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)}
- at end quotation
-
-Each sequence such as @math{13.15.17.19.21.23} is evaluated by splitting into
-every second term, as for instance @math{(13.17.21).(15.19.23)}, and the same
-recursively on each half. This is implemented iteratively using some bit
-twiddling.
+Current code collects all the factors in a single list, with a loop and no
+recursion, and compute the product, with no special care for repeated chunks.
+
+When @math{n} is larger, computation pass trough prime sieving. An helper
+function is used, as suggested by Peter Luschny:
+ at tex
+$$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
+p^{\mathop{\rm L}(p,n)} $$
+ at end tex
+ at ifnottex
+
+ at example
+ n
+ -----
+ n! | | L(p,n)
+msf(n) = -------------- = | | p
+ [n/2]!^2.2^k p=3
+ at end example
+ at end ifnottex
+
+Where @math{p} ranges on odd prime numbers. The exponent @math{k} is chosen to
+obtain an odd integer number: @math{k} is the number of 1 bits in the binary
+representation of @m{\lfloor n/2\rfloor, [n/2]}. The function L at math{(p,n)}
+can be defined as zero when @math{p} is composite, and, for any prime
+ at math{p}, it is computed with:
+ at tex
+$$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
+\leq\log_p(n)$$
+ at end tex
+ at ifnottex
+
+ at example
+ ---
+ \ n
+L(p,n) = / [---] mod 2 <= log (n) .
+ --- p^i p
+ i>0
+ at end example
+ at end ifnottex
+
+With this helper function, we are able to compute the odd part of @math{n!}
+using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
+msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
+small- at math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.
+
+Both the above algorithms use binary splitting to compute the product of many
+small factors. At first as many products as possible are accumulated in a
+single register, generating a list of factors that fit in a machine word. This
+list is then split into halves, and the product is computed recursively.
Such splitting is more efficient than repeated N at cross{}1 multiplies since it
forms big multiplies, allowing Karatsuba and higher algorithms to be used.
And even below the Karatsuba threshold a big block of work can be more
efficient for the basecase algorithm.
-Splitting into subsequences of every second term keeps the resulting products
-more nearly equal in size than would the simpler approach of say taking the
-first half and second half of the sequence. Nearly equal products are more
-efficient for the current multiply implementation.
-
@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
@subsection Binomial Coefficients
diff -r 19d7b40269eb -r 10ec5b1e158b gmp-impl.h
--- a/gmp-impl.h Sat Feb 11 12:51:21 2012 +0100
+++ b/gmp-impl.h Sat Feb 11 16:13:13 2012 +0100
@@ -2225,7 +2225,7 @@
#define ASSERT_ALWAYS(expr) \
do { \
- if (!(expr)) \
+ if (UNLIKELY (!(expr))) \
ASSERT_FAIL (expr); \
} while (0)
@@ -4875,12 +4875,17 @@
#define mpn_toom2_sqr_itch(an) \
(2 * ((an) + GMP_NUMB_BITS))
-/* Can probably be trimmed to 2 an + O(log an). */
+/* toom33/toom3: Scratch need is 5an/2 + 10k, k is the recursion depth.
+ We use 3an + C, so that we can use a smaller constant.
+ */
#define mpn_toom33_mul_itch(an, bn) \
- ((5 * (an) >> 1) + GMP_NUMB_BITS)
+ (3 * (an) + GMP_NUMB_BITS)
#define mpn_toom3_sqr_itch(an) \
- ((5 * (an) >> 1) + GMP_NUMB_BITS)
-
+ (3 * (an) + GMP_NUMB_BITS)
+
+/* toom33/toom3: Scratch need is 8an/3 + 13k, k is the recursion depth.
+ We use 3an + C, so that we can use a smaller constant.
+ */
#define mpn_toom44_mul_itch(an, bn) \
(3 * (an) + GMP_NUMB_BITS)
#define mpn_toom4_sqr_itch(an) \
diff -r 19d7b40269eb -r 10ec5b1e158b mpz/oddfac_1.c
--- a/mpz/oddfac_1.c Sat Feb 11 12:51:21 2012 +0100
+++ b/mpz/oddfac_1.c Sat Feb 11 16:13:13 2012 +0100
@@ -314,6 +314,25 @@
return (CNST_LIMB(1) << (s >> 1)) + (CNST_LIMB(1) << ((s - 1) >> 1));
}
+#if 0
+/* A count-then-exponentiate variant for SWING_A_PRIME */
+#define SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I) \
+ do { \
+ mp_limb_t __q, __prime; \
+ int __exp; \
+ __prime = (P); \
+ __exp = 0; \
+ __q = (N); \
+ do { \
+ __q /= __prime; \
+ __exp += __q & 1; \
+ } while (__q >= __prime); \
+ if (__exp) { /* Store $prime^{exp}$ */ \
+ for (__q = __prime; --__exp; __q *= __prime); \
+ FACTOR_LIST_STORE(__q, PR, MAX_PR, VEC, I); \
+ }; \
+ } while (0)
+#else
#define SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I) \
do { \
mp_limb_t __q, __prime; \
@@ -325,6 +344,7 @@
if ((__q & 1) != 0) (PR) *= __prime; \
} while (__q >= __prime); \
} while (0)
+#endif
#define SH_SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I) \
do { \
@@ -544,6 +564,11 @@
if (s != 0)
/* Use the algorithm described by Peter Luschny in "Divide,
Swing and Conquer the Factorial!".
+
+ Improvement: there are two temporary buffers, factors and
+ square, that are never used together; with a good estimate
+ of the maximal needed size, they could share a single
+ allocation.
*/
{
mpz_t mswing;
diff -r 19d7b40269eb -r 10ec5b1e158b tests/refmpn.c
--- a/tests/refmpn.c Sat Feb 11 12:51:21 2012 +0100
+++ b/tests/refmpn.c Sat Feb 11 16:13:13 2012 +0100
@@ -1881,21 +1881,21 @@
if (vn < TOOM4_THRESHOLD)
{
/* In the mpn_toom33_mul range, use mpn_toom22_mul. */
- tn = 3 * vn + mpn_toom22_mul_itch (vn, vn);
+ tn = 2 * vn + mpn_toom22_mul_itch (vn, vn);
tp = refmpn_malloc_limbs (tn);
mpn_toom22_mul (tp, up, vn, vp, vn, tp + 2 * vn);
}
else if (vn < FFT_THRESHOLD)
{
/* In the mpn_toom44_mul range, use mpn_toom33_mul. */
- tn = 3 * vn + mpn_toom33_mul_itch (vn, vn);
+ tn = 2 * vn + mpn_toom33_mul_itch (vn, vn);
tp = refmpn_malloc_limbs (tn);
mpn_toom33_mul (tp, up, vn, vp, vn, tp + 2 * vn);
}
else
{
/* Finally, for the largest operands, use mpn_toom44_mul. */
- tn = 3 * vn + mpn_toom44_mul_itch (vn, vn);
+ tn = 2 * vn + mpn_toom44_mul_itch (vn, vn);
tp = refmpn_malloc_limbs (tn);
mpn_toom44_mul (tp, up, vn, vp, vn, tp + 2 * vn);
}
More information about the gmp-commit
mailing list