[Gmp-commit] /var/hg/gmp: 2 new changesets
mercurial at gmplib.org
mercurial at gmplib.org
Sat Feb 11 16:15:54 CET 2012
details: /var/hg/gmp/rev/d2768b5f9c94
changeset: 14625:d2768b5f9c94
user: Torbjorn Granlund <tege at gmplib.org>
date: Sat Feb 11 16:07:41 2012 +0100
description:
New bulldozer support.
details: /var/hg/gmp/rev/69be3612ec0d
changeset: 14626:69be3612ec0d
user: Torbjorn Granlund <tege at gmplib.org>
date: Sat Feb 11 16:15:52 2012 +0100
description:
Trivial merge.
diffstat:
ChangeLog | 14 +++
doc/gmp.texi | 85 ++++++++++++++------
gmp-impl.h | 15 ++-
mpn/x86_64/bd1/gmp-mparam.h | 182 ++++++++++++++++++++++++++++++++++++++++++++
mpz/oddfac_1.c | 25 ++++++
tests/refmpn.c | 6 +-
6 files changed, 293 insertions(+), 34 deletions(-)
diffs (truncated from 444 to 300 lines):
diff -r 19d7b40269eb -r 69be3612ec0d ChangeLog
--- a/ChangeLog Sat Feb 11 12:51:21 2012 +0100
+++ b/ChangeLog Sat Feb 11 16:15:52 2012 +0100
@@ -1,3 +1,8 @@
+2012-02-11 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+ * doc/gmp.texi (Factorial): Shortly describe current algorithm.
+ * gmp-impl.h (ASSERT_ALWAYS): Consider failures UNLIKELY.
+
2012-02-10 Niels Möller <nisse at lysator.liu.se>
* tests/mpz/t-gcd.c (gcdext_valid_p): Enforce sligthly stricter
@@ -9,6 +14,9 @@
2012-02-09 Marco Bodrato <bodrato at mail.dm.unipi.it>
+ * gmp-impl.h (mpn_toom3*_itch): Support any recursion depth.
+ * tests/refmpn.c (refmpn_mul): Restore tight allocations.
+
* mpz/oddfac_1.c (mpz_oddfac_1): Get ready for n!!
* gmp-impl.h (mpz_oddfac_1): Update signature.
* mpz/fac_ui.c (mpz_fac_ui): Update call to mpz_oddfac_1.
@@ -58,6 +66,8 @@
2012-02-03 Torbjorn Granlund <tege at gmplib.org>
+ * mpn/x86_64/bd1/gmp-mparam.h: New file.
+
* longlong.h (udiv_qrnnd from sdiv_qrnnd): Declare udiv_w_sdiv.
* mpn/generic/udiv_w_sdiv.c: Use c89 function header.
@@ -134,6 +144,10 @@
* doc/gmp.texi (gmp_randclass::get_z_bits): Use mp_bitcnt_t.
* gmpxx.h: Replace unsigned long with mp_bitcnt_t.
+2012-01-27 Torbjorn Granlund <tege at gmplib.org>
+
+ * Upgrade to libtool 2.4.2.
+
2012-01-26 Marco Bodrato <bodrato at mail.dm.unipi.it>
* tests/mpz/t-fac_ui.c: Increase default test cases.
diff -r 19d7b40269eb -r 69be3612ec0d doc/gmp.texi
--- a/doc/gmp.texi Sat Feb 11 12:51:21 2012 +0100
+++ b/doc/gmp.texi Sat Feb 11 16:15:52 2012 +0100
@@ -8955,42 +8955,75 @@
@subsection Factorial
@cindex Factorial algorithm
-Factorials are calculated by a combination of removal of twos, powering, and
-binary splitting. The procedure can be best illustrated with an example,
+Factorials are calculated by a combination of two algorithms. An idea is
+shared among them: to compute the odd part of the factorial; a final step
+takes account of the power of @math{2} term, by shifting.
+
+For small @math{n}, the odd factor of @math{n!} is computed with the simple
+observation that it is equal to the product of all positive odd numbers
+smaller than @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
+where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
+recursively. The procedure can be best illustrated with an example,
@quotation
- at math{23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23}
+ at math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
@end quotation
- at noindent
-has factors of two removed,
-
- at quotation
- at math{23! = 2^{19}.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23}
- at end quotation
-
- at noindent
-and the resulting terms collected up according to their multiplicity,
-
- at quotation
- at math{23! = 2^{19}.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)}
- at end quotation
-
-Each sequence such as @math{13.15.17.19.21.23} is evaluated by splitting into
-every second term, as for instance @math{(13.17.21).(15.19.23)}, and the same
-recursively on each half. This is implemented iteratively using some bit
-twiddling.
+Current code collects all the factors in a single list, with a loop and no
+recursion, and compute the product, with no special care for repeated chunks.
+
+When @math{n} is larger, computation pass trough prime sieving. An helper
+function is used, as suggested by Peter Luschny:
+ at tex
+$$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
+p^{\mathop{\rm L}(p,n)} $$
+ at end tex
+ at ifnottex
+
+ at example
+ n
+ -----
+ n! | | L(p,n)
+msf(n) = -------------- = | | p
+ [n/2]!^2.2^k p=3
+ at end example
+ at end ifnottex
+
+Where @math{p} ranges on odd prime numbers. The exponent @math{k} is chosen to
+obtain an odd integer number: @math{k} is the number of 1 bits in the binary
+representation of @m{\lfloor n/2\rfloor, [n/2]}. The function L at math{(p,n)}
+can be defined as zero when @math{p} is composite, and, for any prime
+ at math{p}, it is computed with:
+ at tex
+$$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
+\leq\log_p(n)$$
+ at end tex
+ at ifnottex
+
+ at example
+ ---
+ \ n
+L(p,n) = / [---] mod 2 <= log (n) .
+ --- p^i p
+ i>0
+ at end example
+ at end ifnottex
+
+With this helper function, we are able to compute the odd part of @math{n!}
+using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
+msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
+small- at math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.
+
+Both the above algorithms use binary splitting to compute the product of many
+small factors. At first as many products as possible are accumulated in a
+single register, generating a list of factors that fit in a machine word. This
+list is then split into halves, and the product is computed recursively.
Such splitting is more efficient than repeated N at cross{}1 multiplies since it
forms big multiplies, allowing Karatsuba and higher algorithms to be used.
And even below the Karatsuba threshold a big block of work can be more
efficient for the basecase algorithm.
-Splitting into subsequences of every second term keeps the resulting products
-more nearly equal in size than would the simpler approach of say taking the
-first half and second half of the sequence. Nearly equal products are more
-efficient for the current multiply implementation.
-
@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
@subsection Binomial Coefficients
diff -r 19d7b40269eb -r 69be3612ec0d gmp-impl.h
--- a/gmp-impl.h Sat Feb 11 12:51:21 2012 +0100
+++ b/gmp-impl.h Sat Feb 11 16:15:52 2012 +0100
@@ -2225,7 +2225,7 @@
#define ASSERT_ALWAYS(expr) \
do { \
- if (!(expr)) \
+ if (UNLIKELY (!(expr))) \
ASSERT_FAIL (expr); \
} while (0)
@@ -4875,12 +4875,17 @@
#define mpn_toom2_sqr_itch(an) \
(2 * ((an) + GMP_NUMB_BITS))
-/* Can probably be trimmed to 2 an + O(log an). */
+/* toom33/toom3: Scratch need is 5an/2 + 10k, k is the recursion depth.
+ We use 3an + C, so that we can use a smaller constant.
+ */
#define mpn_toom33_mul_itch(an, bn) \
- ((5 * (an) >> 1) + GMP_NUMB_BITS)
+ (3 * (an) + GMP_NUMB_BITS)
#define mpn_toom3_sqr_itch(an) \
- ((5 * (an) >> 1) + GMP_NUMB_BITS)
-
+ (3 * (an) + GMP_NUMB_BITS)
+
+/* toom33/toom3: Scratch need is 8an/3 + 13k, k is the recursion depth.
+ We use 3an + C, so that we can use a smaller constant.
+ */
#define mpn_toom44_mul_itch(an, bn) \
(3 * (an) + GMP_NUMB_BITS)
#define mpn_toom4_sqr_itch(an) \
diff -r 19d7b40269eb -r 69be3612ec0d mpn/x86_64/bd1/gmp-mparam.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/x86_64/bd1/gmp-mparam.h Sat Feb 11 16:15:52 2012 +0100
@@ -0,0 +1,182 @@
+/* AMD Bulldozer-1 gmp-mparam.h -- Compiler/machine parameter header file.
+
+Copyright 1991, 1993, 1994, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
+2008, 2009, 2010, 2012 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published by
+the Free Software Foundation; either version 3 of the License, or (at your
+option) any later version.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+License for more details.
+
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MP Library. If not, see http://www.gnu.org/licenses/. */
+
+#define GMP_LIMB_BITS 64
+#define BYTES_PER_MP_LIMB 8
+
+#define MOD_1_NORM_THRESHOLD 0 /* always */
+#define MOD_1_UNNORM_THRESHOLD 0 /* always */
+#define MOD_1N_TO_MOD_1_1_THRESHOLD 7
+#define MOD_1U_TO_MOD_1_1_THRESHOLD 5
+#define MOD_1_1_TO_MOD_1_2_THRESHOLD 0 /* never mpn_mod_1_1p */
+#define MOD_1_2_TO_MOD_1_4_THRESHOLD 12
+#define PREINV_MOD_1_TO_MOD_1_THRESHOLD 14
+#define USE_PREINV_DIVREM_1 1 /* native */
+#define DIVEXACT_1_THRESHOLD 0 /* always (native) */
+#define BMOD_1_TO_MOD_1_THRESHOLD 24
+
+#define MUL_TOOM22_THRESHOLD 18
+#define MUL_TOOM33_THRESHOLD 53
+#define MUL_TOOM44_THRESHOLD 154
+#define MUL_TOOM6H_THRESHOLD 274
+#define MUL_TOOM8H_THRESHOLD 466
+
+#define MUL_TOOM32_TO_TOOM43_THRESHOLD 97
+#define MUL_TOOM32_TO_TOOM53_THRESHOLD 140
+#define MUL_TOOM42_TO_TOOM53_THRESHOLD 105
+#define MUL_TOOM42_TO_TOOM63_THRESHOLD 109
+
+#define SQR_BASECASE_THRESHOLD 0 /* always (native) */
+#define SQR_TOOM2_THRESHOLD 24
+#define SQR_TOOM3_THRESHOLD 85
+#define SQR_TOOM4_THRESHOLD 119
+#define SQR_TOOM6_THRESHOLD 318
+#define SQR_TOOM8_THRESHOLD 502
+
+#define MULMOD_BNM1_THRESHOLD 11
+#define SQRMOD_BNM1_THRESHOLD 16
+
+#define MUL_FFT_MODF_THRESHOLD 412 /* k = 5 */
+#define MUL_FFT_TABLE3 \
+ { { 412, 5}, { 19, 6}, { 10, 5}, { 21, 6}, \
+ { 11, 5}, { 23, 6}, { 21, 7}, { 11, 6}, \
+ { 23, 7}, { 21, 8}, { 11, 7}, { 25, 8}, \
+ { 13, 7}, { 28, 8}, { 15, 7}, { 31, 8}, \
+ { 21, 9}, { 11, 8}, { 27, 9}, { 15, 8}, \
+ { 33, 9}, { 19, 8}, { 41, 9}, { 23, 8}, \
+ { 47, 9}, { 27,10}, { 15, 9}, { 31, 8}, \
+ { 63, 9}, { 39,10}, { 23, 9}, { 51,11}, \
+ { 15,10}, { 31, 9}, { 67,10}, { 39, 9}, \
+ { 79,10}, { 47, 9}, { 95,10}, { 55,11}, \
+ { 31,10}, { 79,11}, { 47,10}, { 103,12}, \
+ { 31,11}, { 63,10}, { 127,11}, { 79,10}, \
+ { 175,11}, { 95,10}, { 191,12}, { 63,11}, \
+ { 127,10}, { 255,11}, { 143,10}, { 287,11}, \
+ { 159,12}, { 95,13}, { 63,12}, { 127,11}, \
+ { 271, 9}, { 1087,11}, { 287,10}, { 575,11}, \
+ { 303,12}, { 159,11}, { 319,10}, { 671,11}, \
+ { 351,12}, { 191,11}, { 383,10}, { 767,11}, \
+ { 415,12}, { 223,11}, { 447,13}, { 127,12}, \
+ { 255,11}, { 543,12}, { 287,11}, { 575,10}, \
+ { 1215,12}, { 319,11}, { 639,12}, { 351,13}, \
+ { 191,12}, { 383,11}, { 767,12}, { 415,11}, \
+ { 831,10}, { 1663,12}, { 447,14}, { 127,13}, \
+ { 255,12}, { 543,11}, { 1087,10}, { 2175,12}, \
+ { 575,11}, { 1151,12}, { 607,11}, { 1215,13}, \
+ { 319,12}, { 639,11}, { 1279,12}, { 671,11}, \
+ { 1343,10}, { 2687,12}, { 703,13}, { 383,12}, \
+ { 767,11}, { 1535,12}, { 831,13}, { 447,12}, \
+ { 959,14}, { 255,13}, { 511,12}, { 1087,11}, \
+ { 2175,13}, { 575,12}, { 1215,11}, { 2431,10}, \
+ { 4863,13}, { 639,12}, { 1343,11}, { 2687,13}, \
+ { 703,12}, { 1407,14}, { 383,13}, { 767,12}, \
+ { 1535,13}, { 831,12}, { 1663,13}, { 959,15}, \
+ { 255,14}, { 511,13}, { 1087,12}, { 2175,13}, \
+ { 1215,12}, { 2431,11}, { 4863,14}, { 639,13}, \
+ { 1343,12}, { 2687,13}, { 1471,12}, { 2943,11}, \
+ { 5887,14}, { 767,13}, { 1599,12}, { 3199,13}, \
+ { 1727,14}, { 895,13}, { 1919,12}, { 3839,15}, \
+ { 511,14}, { 1023,13}, { 2175,14}, { 1151,13}, \
+ { 2431,12}, { 4863,14}, { 16384,15}, { 32768,16}, \
+ { 65536,17}, { 131072,18}, { 262144,19}, { 524288,20}, \
+ {1048576,21}, {2097152,22}, {4194304,23}, {8388608,24} }
+#define MUL_FFT_TABLE3_SIZE 168
+#define MUL_FFT_THRESHOLD 4736
+
+#define SQR_FFT_MODF_THRESHOLD 368 /* k = 5 */
+#define SQR_FFT_TABLE3 \
+ { { 368, 5}, { 19, 6}, { 10, 5}, { 21, 6}, \
+ { 21, 7}, { 11, 6}, { 23, 7}, { 21, 8}, \
+ { 11, 7}, { 25, 8}, { 13, 7}, { 28, 8}, \
+ { 15, 7}, { 31, 8}, { 17, 7}, { 35, 8}, \
+ { 19, 7}, { 39, 8}, { 27, 9}, { 15, 8}, \
+ { 35, 9}, { 19, 8}, { 41, 9}, { 23, 8}, \
+ { 47, 9}, { 27,10}, { 15, 9}, { 39,10}, \
+ { 23, 9}, { 51,11}, { 15,10}, { 31, 9}, \
+ { 67,10}, { 39, 9}, { 79,10}, { 47, 9}, \
+ { 95,10}, { 55,11}, { 31,10}, { 79,11}, \
+ { 47,10}, { 95,12}, { 31,11}, { 63,10}, \
More information about the gmp-commit
mailing list