[Gmp-commit] /var/hg/gmp: 3 new changesets

Sat Feb 11 16:13:28 CET 2012

details:   /var/hg/gmp/rev/2c4781bd8e78
changeset: 14622:2c4781bd8e78
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sat Feb 11 16:01:56 2012 +0100
description:
Factorial documentation.

details:   /var/hg/gmp/rev/cd638b86628d
changeset: 14623:cd638b86628d
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sat Feb 11 16:08:53 2012 +0100
description:
Turn to UNLIKELY conditions in ASSERTs.

details:   /var/hg/gmp/rev/10ec5b1e158b
changeset: 14624:10ec5b1e158b
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sat Feb 11 16:13:13 2012 +0100
description:
toom3_itch: support any recursion depth.

diffstat:

 ChangeLog      |   8 +++++
 doc/gmp.texi   |  85 ++++++++++++++++++++++++++++++++++++++++-----------------
 gmp-impl.h     |  15 ++++++---
 mpz/oddfac_1.c |  25 +++++++++++++++++
 tests/refmpn.c |   6 ++--
 5 files changed, 105 insertions(+), 34 deletions(-)

diffs (238 lines):

diff -r 19d7b40269eb -r 10ec5b1e158b ChangeLog

--- a/ChangeLog	Sat Feb 11 12:51:21 2012 +0100
+++ b/ChangeLog	Sat Feb 11 16:13:13 2012 +0100
@@ -1,3 +1,8 @@
+2012-02-11 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+	* doc/gmp.texi (Factorial): Shortly describe current algorithm.
+	* gmp-impl.h (ASSERT_ALWAYS): Consider failures UNLIKELY.
+
 2012-02-10  Niels Möller  <nisse at lysator.liu.se>
 
 	* tests/mpz/t-gcd.c (gcdext_valid_p): Enforce sligthly stricter
@@ -9,6 +14,9 @@
 
 2012-02-09 Marco Bodrato <bodrato at mail.dm.unipi.it>
 
+	* gmp-impl.h (mpn_toom3*_itch): Support any recursion depth.
+	* tests/refmpn.c (refmpn_mul): Restore tight allocations.
+
 	* mpz/oddfac_1.c (mpz_oddfac_1): Get ready for n!!
 	* gmp-impl.h (mpz_oddfac_1): Update signature.
 	* mpz/fac_ui.c (mpz_fac_ui): Update call to mpz_oddfac_1.
diff -r 19d7b40269eb -r 10ec5b1e158b doc/gmp.texi
--- a/doc/gmp.texi	Sat Feb 11 12:51:21 2012 +0100
+++ b/doc/gmp.texi	Sat Feb 11 16:13:13 2012 +0100
@@ -8955,42 +8955,75 @@
 @subsection Factorial
 @cindex Factorial algorithm
 
-Factorials are calculated by a combination of removal of twos, powering, and
-binary splitting.  The procedure can be best illustrated with an example,
+Factorials are calculated by a combination of two algorithms. An idea is
+shared among them: to compute the odd part of the factorial; a final step
+takes account of the power of @math{2} term, by shifting.
+
+For small @math{n}, the odd factor of @math{n!} is computed with the simple
+observation that it is equal to the product of all positive odd numbers
+smaller than @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
+where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
+recursively. The procedure can be best illustrated with an example,
 
 @quotation
- at math{23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23}
+ at math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
 @end quotation
 
- at noindent
-has factors of two removed,
-
- at quotation
- at math{23! = 2^{19}.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23}
- at end quotation
-
- at noindent
-and the resulting terms collected up according to their multiplicity,
-
- at quotation
- at math{23! = 2^{19}.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)}
- at end quotation
-
-Each sequence such as @math{13.15.17.19.21.23} is evaluated by splitting into
-every second term, as for instance @math{(13.17.21).(15.19.23)}, and the same
-recursively on each half.  This is implemented iteratively using some bit
-twiddling.
+Current code collects all the factors in a single list, with a loop and no
+recursion, and compute the product, with no special care for repeated chunks.
+
+When @math{n} is larger, computation pass trough prime sieving. An helper
+function is used, as suggested by Peter Luschny:
+ at tex
+$$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
+p^{\mathop{\rm L}(p,n)} $$
+ at end tex
+ at ifnottex
+
+ at example
+                            n
+                          -----
+               n!          | |   L(p,n)
+msf(n) = -------------- =  | |  p
+          [n/2]!^2.2^k     p=3
+ at end example
+ at end ifnottex
+
+Where @math{p} ranges on odd prime numbers. The exponent @math{k} is chosen to
+obtain an odd integer number: @math{k} is the number of 1 bits in the binary
+representation of @m{\lfloor n/2\rfloor, [n/2]}. The function L at math{(p,n)}
+can be defined as zero when @math{p} is composite, and, for any prime
+ at math{p}, it is computed with:
+ at tex
+$$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
+\leq\log_p(n)$$
+ at end tex
+ at ifnottex
+
+ at example
+          ---
+           \    n                   
+L(p,n) =   /  [---] mod 2   <=  log (n) .
+          ---  p^i                p
+          i>0    
+ at end example
+ at end ifnottex
+
+With this helper function, we are able to compute the odd part of @math{n!}
+using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
+msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
+small- at math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.
+
+Both the above algorithms use binary splitting to compute the product of many
+small factors. At first as many products as possible are accumulated in a
+single register, generating a list of factors that fit in a machine word. This
+list is then split into halves, and the product is computed recursively.
 
 Such splitting is more efficient than repeated N at cross{}1 multiplies since it
 forms big multiplies, allowing Karatsuba and higher algorithms to be used.
 And even below the Karatsuba threshold a big block of work can be more
 efficient for the basecase algorithm.
 
-Splitting into subsequences of every second term keeps the resulting products
-more nearly equal in size than would the simpler approach of say taking the
-first half and second half of the sequence.  Nearly equal products are more
-efficient for the current multiply implementation.
-
 
 @node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
 @subsection Binomial Coefficients
diff -r 19d7b40269eb -r 10ec5b1e158b gmp-impl.h
--- a/gmp-impl.h	Sat Feb 11 12:51:21 2012 +0100
+++ b/gmp-impl.h	Sat Feb 11 16:13:13 2012 +0100
@@ -2225,7 +2225,7 @@
 
 #define ASSERT_ALWAYS(expr)						\
   do {									\
-    if (!(expr))							\
+    if (UNLIKELY (!(expr)))						\
       ASSERT_FAIL (expr);						\
   } while (0)
 
@@ -4875,12 +4875,17 @@
 #define mpn_toom2_sqr_itch(an) \
   (2 * ((an) + GMP_NUMB_BITS))
 
-/* Can probably be trimmed to 2 an + O(log an). */
+/* toom33/toom3: Scratch need is 5an/2 + 10k, k is the recursion depth.
+   We use 3an + C, so that we can use a smaller constant.
+ */
 #define mpn_toom33_mul_itch(an, bn) \
-  ((5 * (an) >> 1) + GMP_NUMB_BITS)
+  (3 * (an) + GMP_NUMB_BITS)
 #define mpn_toom3_sqr_itch(an) \
-  ((5 * (an) >> 1) + GMP_NUMB_BITS)
-
+  (3 * (an) + GMP_NUMB_BITS)
+
+/* toom33/toom3: Scratch need is 8an/3 + 13k, k is the recursion depth.
+   We use 3an + C, so that we can use a smaller constant.
+ */
 #define mpn_toom44_mul_itch(an, bn) \
   (3 * (an) + GMP_NUMB_BITS)
 #define mpn_toom4_sqr_itch(an) \
diff -r 19d7b40269eb -r 10ec5b1e158b mpz/oddfac_1.c
--- a/mpz/oddfac_1.c	Sat Feb 11 12:51:21 2012 +0100
+++ b/mpz/oddfac_1.c	Sat Feb 11 16:13:13 2012 +0100
@@ -314,6 +314,25 @@
   return (CNST_LIMB(1) << (s >> 1)) + (CNST_LIMB(1) << ((s - 1) >> 1));
 }
 
+#if 0
+/* A count-then-exponentiate variant for SWING_A_PRIME */
+#define SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I)		\
+  do {							\
+    mp_limb_t __q, __prime;				\
+    int __exp;						\
+    __prime = (P);					\
+    __exp = 0;						\
+    __q = (N);						\
+    do {						\
+      __q /= __prime;					\
+      __exp += __q & 1;					\
+    } while (__q >= __prime);				\
+    if (__exp) { /* Store $prime^{exp}$ */		\
+      for (__q = __prime; --__exp; __q *= __prime);	\
+      FACTOR_LIST_STORE(__q, PR, MAX_PR, VEC, I);	\
+    };							\
+  } while (0)
+#else
 #define SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I)	\
   do {						\
     mp_limb_t __q, __prime;			\
@@ -325,6 +344,7 @@
       if ((__q & 1) != 0) (PR) *= __prime;	\
     } while (__q >= __prime);			\
   } while (0)
+#endif
 
 #define SH_SWING_A_PRIME(P, N, PR, MAX_PR, VEC, I)	\
   do {							\
@@ -544,6 +564,11 @@
       if (s != 0)
 	/* Use the algorithm described by Peter Luschny in "Divide,
 	   Swing and Conquer the Factorial!".
+
+	   Improvement: there are two temporary buffers, factors and
+	   square, that are never used together; with a good estimate
+	   of the maximal needed size, they could share a single
+	   allocation.
 	*/
 	{
 	  mpz_t mswing;
diff -r 19d7b40269eb -r 10ec5b1e158b tests/refmpn.c
--- a/tests/refmpn.c	Sat Feb 11 12:51:21 2012 +0100
+++ b/tests/refmpn.c	Sat Feb 11 16:13:13 2012 +0100
@@ -1881,21 +1881,21 @@
   if (vn < TOOM4_THRESHOLD)
     {
       /* In the mpn_toom33_mul range, use mpn_toom22_mul.  */
-      tn = 3 * vn + mpn_toom22_mul_itch (vn, vn);
+      tn = 2 * vn + mpn_toom22_mul_itch (vn, vn);
       tp = refmpn_malloc_limbs (tn);
       mpn_toom22_mul (tp, up, vn, vp, vn, tp + 2 * vn);
     }
   else if (vn < FFT_THRESHOLD)
     {
       /* In the mpn_toom44_mul range, use mpn_toom33_mul.  */
-      tn = 3 * vn + mpn_toom33_mul_itch (vn, vn);
+      tn = 2 * vn + mpn_toom33_mul_itch (vn, vn);
       tp = refmpn_malloc_limbs (tn);
       mpn_toom33_mul (tp, up, vn, vp, vn, tp + 2 * vn);
     }
   else
     {
       /* Finally, for the largest operands, use mpn_toom44_mul.  */
-      tn = 3 * vn + mpn_toom44_mul_itch (vn, vn);
+      tn = 2 * vn + mpn_toom44_mul_itch (vn, vn);
       tp = refmpn_malloc_limbs (tn);
       mpn_toom44_mul (tp, up, vn, vp, vn, tp + 2 * vn);
     }