[Gmp-commit] /home/hgfiles/gmp: 12 new changesets

Mon Dec 14 17:32:11 CET 2009

details:   /home/hgfiles/gmp/rev/5d53bb4ba26f
changeset: 13063:5d53bb4ba26f
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sat Dec 12 21:59:46 2009 +0100
description:
Rewrite of mpn_invert, use Newton iterations and mulmod_bnm1.

details:   /home/hgfiles/gmp/rev/2af3577bf135
changeset: 13064:2af3577bf135
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 09:38:43 2009 +0100
description:
invert: moved ALLOCs, and enabled tuning.

details:   /home/hgfiles/gmp/rev/1e3222290b38
changeset: 13065:1e3222290b38
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 11:00:54 2009 +0100
description:
invertappr: tuning.

details:   /home/hgfiles/gmp/rev/ed0154450fb7
changeset: 13066:ed0154450fb7
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 11:07:25 2009 +0100
description:
Tune also INV_APPR_THRESHOLD.

details:   /home/hgfiles/gmp/rev/423ca5861db9
changeset: 13067:423ca5861db9
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 13:30:45 2009 +0100
description:
Changelog.

details:   /home/hgfiles/gmp/rev/525726d74f1a
changeset: 13068:525726d74f1a
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 21:43:45 2009 +0100
description:
Split mpn_invertappr in _bc and _ni.

details:   /home/hgfiles/gmp/rev/14ab3bd845c3
changeset: 13069:14ab3bd845c3
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Sun Dec 13 22:09:44 2009 +0100
description:
Correct tuning of mpn_*_invertappr

details:   /home/hgfiles/gmp/rev/cc1c3e1b2418
changeset: 13070:cc1c3e1b2418
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Mon Dec 14 08:31:59 2009 +0100
description:
Merging offical repo with new invert code.

details:   /home/hgfiles/gmp/rev/6ad12e816dd7
changeset: 13071:6ad12e816dd7
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Mon Dec 14 09:19:45 2009 +0100
description:
Re-enable usage of mpn_dcpi1_divappr_q in mpn_invaertappr.

details:   /home/hgfiles/gmp/rev/8276e9432ce2
changeset: 13072:8276e9432ce2
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Mon Dec 14 11:03:17 2009 +0100
description:
Comment update on invertappr.

details:   /home/hgfiles/gmp/rev/607231d2bd2b
changeset: 13073:607231d2bd2b
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Mon Dec 14 15:44:47 2009 +0100
description:
Possibly exclude using dcpi1_divappr in invertappr, threshold based.

details:   /home/hgfiles/gmp/rev/47b27ccf9a34
changeset: 13074:47b27ccf9a34
user:      Marco Bodrato <bodrato at mail.dm.unipi.it>
date:      Mon Dec 14 17:31:58 2009 +0100
description:
Trivial merge.

diffstat:

 ChangeLog                             |   69 ++++++-
 gmp-impl.h                            |   42 +++
 mpn/alpha/ev6/gmp-mparam.h            |    5 +-
 mpn/arm/gmp-mparam.h                  |    4 +-
 mpn/generic/dcpi1_divappr_q.c         |    8 +-
 mpn/generic/gcdext.c                  |    1 +
 mpn/generic/invert.c                  |  375 +++++++++++++++++++++++++++++++--
 mpn/generic/mul_fft.c                 |   13 +-
 mpn/generic/sbpi1_div_qr.c            |   29 +--
 mpn/powerpc32/gmp-mparam.h            |    2 +-
 mpn/powerpc64/mode64/p4/gmp-mparam.h  |    2 +-
 mpn/sparc64/gmp-mparam.h              |    4 +-
 mpn/sparc64/ultrasparc34/gmp-mparam.h |    5 +-
 mpn/x86/k7/gmp-mparam.h               |    6 +-
 mpn/x86/pentium4/sse2/gmp-mparam.h    |   10 +-
 mpn/x86_64/core2/gmp-mparam.h         |    6 +-
 mpn/x86_64/corei/gmp-mparam.h         |    1 +
 mpn/x86_64/gmp-mparam.h               |   10 +-
 mpn/x86_64/pentium4/gmp-mparam.h      |    4 +-
 mpz/powm.c                            |   19 +-
 tests/mpn/t-toom33.c                  |    2 +-
 tests/mpn/t-toom42.c                  |    2 -
 tests/mpn/t-toom43.c                  |    2 -
 tests/mpn/t-toom44.c                  |    2 +-
 tests/mpn/toom-shared.h               |    4 +-
 tune/common.c                         |   12 +
 tune/speed.c                          |    2 +
 tune/speed.h                          |   72 ++++++
 tune/tuneup.c                         |   34 ++-
 29 files changed, 637 insertions(+), 110 deletions(-)

diffs (truncated from 1178 to 300 lines):

diff -r 26537180dd3b -r 47b27ccf9a34 ChangeLog

--- a/ChangeLog	Sat Dec 12 12:54:36 2009 +0100
+++ b/ChangeLog	Mon Dec 14 17:31:58 2009 +0100
@@ -1,7 +1,74 @@
+2009-12-14  Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+	* mpn/generic/invert.c: Improved comments.
+	(mpn_bc_invertapp): Conditionally re-enable mpn_dcpi1_divappr_q.
+
+2009-12-14  Niels Möller  <nisse at lysator.liu.se>
+
+	* gmp-impl.h (udiv_qr_3by2): Fix typo in argument list.
+
+2009-12-13  Niels Möller  <nisse at lysator.liu.se>
+
+	* gmp-impl.h (udiv_qr_3by2): New macro.
+	* mpn/generic/sbpi1_div_qr.c (mpn_sbpi1_div_qr): Use udiv_qr_3by2.
+
+2009-12-13  Torbjorn Granlund  <tege at gmplib.org>
+
+	* mpn/generic/dcpi1_divappr_q.c (mpn_dcpi1_divappr_q): Avoid a buffer
+	overrun.
+
+	* mpn/generic/mul_fft.c (mpn_mul_fft_full): Handle carry-out from 2nd
+	mpn_mul_fft, add an ASSERT for the 1sd mpn_mul_fft.  Replace some
+	comments on cc's range with ASSERTs.
+
+	* mpn/generic/gcdext.c (compute_v): Normalize tp[] after mpn_mul.
+
+	* mpz/powm.c: Rework buffer handling.
+
+2009-12-13  Niels Möller  <nisse at lysator.liu.se>
+
+	* tests/mpn/toom-shared.h (main): Use refmpn_mul_basecase to check
+	results (slow!). Iteration counts of all toom tests reduced
+	considerably.
+
+2009-12-13  Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+	* mpn/generic/invert.c (mpn_invertapp): Split in _bc and _ni.
+	(mpn_bc_invertapp): New function, the basecase.
+	(mpn_ni_invertapp): New function, Newton iteration.
+	(mpn_invert): Use mpn_ni_invertapp.
+	* tune/tuneup.c (tune_invert): Min for INV_APPR_THRESHOLD.
+	(tune_invertappr): Min for INV_NEWTON_THRESHOLD.
+
+	* tune/speed.h (SPEED_ROUTINE_MPN_NI_INVERTAPPR): New macro.
+	(speed_mpn_ni_invertappr): New function.
+	* tune/common.c (speed_mpn_ni_invertappr): New function.
+	* tune/speed.c (routine): Add speed_mpn_ni_invertappr.
+
+	* tune/tuneup.c (tune_invertappr): Use speed_mpn_ni_invertappr to
+	tune INV_MULMOD_BNM1_THRESHOLD.
+
 2009-12-12  Torbjorn Granlund  <tege at gmplib.org>
 
 	* mpn/generic/mu_bdiv_qr.c (mpn_mu_bdiv_qr_itch): Rewrite.
 
+2009-12-12  Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+	* tests/mpn/t-mulmod_bnm1.c (main): Disable B^n+1 stressing test
+	for odd sizes.
+
+	* mpn/generic/invert.c: Complete rewrite. Uses Newton iterations.
+	* gmp-impl.h (mpn_invertappr): Added prototype and name-mangling.
+	(mpn_invertappr_itch): Added prototype and name-mangling.
+	(INV_APPR_THRESHOLD): Support for a new tunable const.
+	* tune/speed.h (SPEED_ROUTINE_MPN_INVERTAPPR): New macro.
+	(speed_mpn_invertappr): New function.
+	* tune/common.c (speed_mpn_invertappr): New function.
+	* tune/speed.c (routine): Add speed_mpn_invertappr.
+	* tune/tuneup.c (tune_invertappr): New function: was tune_invert.
+	(tune_invert): Now tune only INV_APPR_THRESHOLD.
+	(all): Enable call to tune_invert and tune_invertappr.
+
 2009-12-11  Torbjorn Granlund  <tege at gmplib.org>
 
 	* mpn/generic/binvert.c: Use mpn_mulmod_bnm1 instead of FFT wrapping.
@@ -34,7 +101,7 @@
 	Interface change, since the new code doesn't take a scratch
 	argument.
 
-	* tests/mpn/t-mulmod_bnm1.c (main): Ensure thatn an >= bn. Lowered
+	* tests/mpn/t-mulmod_bnm1.c (main): Ensure that an >= bn. Lowered
 	MIN_N to 1. Various fixes to handle n == 1 properly.
 
 	* mpn/generic/mulmod_bnm1.c (mpn_mulmod_bnm1): Small interface
diff -r 26537180dd3b -r 47b27ccf9a34 gmp-impl.h
--- a/gmp-impl.h	Sat Dec 12 12:54:36 2009 +0100
+++ b/gmp-impl.h	Mon Dec 14 17:31:58 2009 +0100
@@ -1172,6 +1172,11 @@
 #define   mpn_invert_itch __MPN(invert_itch)
 __GMP_DECLSPEC mp_size_t mpn_invert_itch __GMP_PROTO ((mp_size_t));
 
+#define   mpn_invertappr __MPN(invertappr)
+__GMP_DECLSPEC mp_limb_t mpn_invertappr __GMP_PROTO ((mp_ptr, mp_srcptr, mp_size_t, mp_ptr));
+#define   mpn_invertappr_itch __MPN(invertappr_itch)
+__GMP_DECLSPEC mp_size_t mpn_invertappr_itch __GMP_PROTO ((mp_size_t));
+
 #define   mpn_binvert __MPN(binvert)
 __GMP_DECLSPEC void      mpn_binvert __GMP_PROTO ((mp_ptr, mp_srcptr, mp_size_t, mp_ptr));
 #define   mpn_binvert_itch __MPN(binvert_itch)
@@ -2656,6 +2661,39 @@
     (r) = _r;								\
   } while (0)
 
+/* Compute quotient the quotient and remainder for n / d. Requires d
+   >= B^2 / 2 and n < d B. di is the inverse
+
+     floor ((B^3 - 1) / (d0 + d1 B)) - B.
+*/
+#define udiv_qr_3by2(q, r1, r0, n2, n1, n0, d1, d0, dinv)		\
+  do {									\
+    mp_limb_t _q1, _q0, _r1, _r0, _t1, _t0, _mask;			\
+    umul_ppmm (_q1, _q0, (n2), (dinv));					\
+    add_ssaaaa (_q1, _q0, _q1, _q0, (n2), (n1));			\
+									\
+    /* Compute the two most significant limbs of n - q'd */		\
+    _r1 = (n1) - _q1 * (d1);						\
+    sub_ddmmss (_r1, _r0, _r1, (n0), (d1), (d0));			\
+    umul_ppmm (_t1, _t0, _q1, (d0));					\
+    sub_ddmmss (_r1, _r0, _r1, _r0, _t1, _t0);				\
+    _q1++;								\
+									\
+    /* Conditionally adjust q and the remainders */			\
+    _mask = - (mp_limb_t) (_r1 >= _q0);					\
+    _q1 += _mask;							\
+    add_ssaaaa (_r1, _r0, _r1, _r0, _mask & (d1), _mask & (d0));	\
+    if (UNLIKELY (_r1 >= (d1)))						\
+      {									\
+	if (_r1 > (d1) || _r0 >= (d0))					\
+	  {								\
+	    _q1++;							\
+	    sub_ddmmss (_r1, _r0, _r1, _r0, (d1), (d0));		\
+	  }								\
+      }									\
+    (q) = _q1; (r1) = _r1; (r0) = _r0;					\
+  } while (0)
+
 #ifndef mpn_preinv_divrem_1  /* if not done with cpuvec in a fat binary */
 #define   mpn_preinv_divrem_1 __MPN(preinv_divrem_1)
 __GMP_DECLSPEC mp_limb_t mpn_preinv_divrem_1 __GMP_PROTO ((mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_limb_t, mp_limb_t, int));
@@ -4174,6 +4212,10 @@
 #define INV_NEWTON_THRESHOLD         inv_newton_threshold
 extern mp_size_t                     inv_newton_threshold;
 
+#undef  INV_APPR_THRESHOLD
+#define INV_APPR_THRESHOLD           inv_appr_threshold
+extern mp_size_t                     inv_appr_threshold;
+
 #undef  BINV_MULMOD_BNM1_THRESHOLD
 #define BINV_MULMOD_BNM1_THRESHOLD   binv_mulmod_bnm1_threshold
 extern mp_size_t                     binv_mulmod_bnm1_threshold;
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/alpha/ev6/gmp-mparam.h
--- a/mpn/alpha/ev6/gmp-mparam.h	Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/alpha/ev6/gmp-mparam.h	Mon Dec 14 17:31:58 2009 +0100
@@ -54,10 +54,11 @@
 #define DC_DIVAPPR_Q_THRESHOLD          396
 #define DC_BDIV_QR_THRESHOLD            110
 #define DC_BDIV_Q_THRESHOLD             315
-#define BINV_NEWTON_THRESHOLD           889
+#define BINV_NEWTON_THRESHOLD          1160
+#define BINV_MULMOD_BNM1_THRESHOLD        0  /* always when newton */
 #define REDC_1_TO_REDC_N_THRESHOLD      110
 
-#define MATRIX22_STRASSEN_THRESHOLD      21
+#define MATRIX22_STRASSEN_THRESHOLD      16
 #define HGCD_THRESHOLD                  276
 #define GCD_DC_THRESHOLD               1197
 #define GCDEXT_DC_THRESHOLD             799
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/arm/gmp-mparam.h
--- a/mpn/arm/gmp-mparam.h	Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/arm/gmp-mparam.h	Mon Dec 14 17:31:58 2009 +0100
@@ -57,8 +57,8 @@
 
 #define MATRIX22_STRASSEN_THRESHOLD      19
 #define HGCD_THRESHOLD                  110
-#define GCD_DC_THRESHOLD                562
-#define GCDEXT_DC_THRESHOLD             345
+#define GCD_DC_THRESHOLD                680
+#define GCDEXT_DC_THRESHOLD             521
 #define JACOBI_BASE_METHOD                2
 
 #define DIVREM_1_NORM_THRESHOLD           0  /* preinv always */
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/dcpi1_divappr_q.c
--- a/mpn/generic/dcpi1_divappr_q.c	Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/dcpi1_divappr_q.c	Mon Dec 14 17:31:58 2009 +0100
@@ -162,9 +162,11 @@
 	qh = mpn_sbpi1_divappr_q (qp, np - dn, nn, dp - dn, dn, dinv->inv32);
       else
 	{
-	  /* Put quotient in tp, use qp as temporary, since qp lacks a limb.  */
-	  qh = mpn_dcpi1_divappr_q_n (tp, np - qn - 2, dp - (qn + 1), qn + 1, dinv, qp);
-	  MPN_COPY (qp, tp + 1, qn);
+	  /* It is tempting to use qp for recursive scratch and put quotient in
+	     tp, but the recursive scratch needs one limb too many.  */
+	  mp_ptr qp2 = TMP_SALLOC_LIMBS (qn + 1);
+	  qh = mpn_dcpi1_divappr_q_n (qp2, np - qn - 2, dp - (qn + 1), qn + 1, dinv, tp);
+	  MPN_COPY (qp, qp2 + 1, qn);
 	}
     }
 
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/gcdext.c
--- a/mpn/generic/gcdext.c	Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/gcdext.c	Mon Dec 14 17:31:58 2009 +0100
@@ -118,6 +118,7 @@
     mpn_mul (tp, up, size, ap, an);
 
   size += an;
+  size -= tp[size - 1] == 0;
 
   ASSERT (gn <= size);
 
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/invert.c
--- a/mpn/generic/invert.c	Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/invert.c	Mon Dec 14 17:31:58 2009 +0100
@@ -1,6 +1,13 @@
-/* Compute {up,n}^(-1).
+/* Compute {up,n}^(-1). Inversion using ApproximateReciprocal
+   algorithm, returning either the correct result, or one less.
 
-Copyright (C) 2007 Free Software Foundation, Inc.
+   Contributed to the GNU project by Marco Bodrato.
+
+   THE FUNCTIONS IN THIS FILE ARE INTERNAL WITH MUTABLE INTERFACES.  IT IS ONLY
+   SAFE TO REACH THEM THROUGH DOCUMENTED INTERFACES.  IN FACT, IT IS ALMOST
+   GUARANTEED THAT THEY WILL CHANGE OR DISAPPEAR IN A FUTURE GMP RELEASE.
+
+Copyright (C) 2007, 2009 Free Software Foundation, Inc.
 
 This file is part of the GNU MP Library.
 
@@ -17,44 +24,362 @@
 You should have received a copy of the GNU Lesser General Public License
 along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.  */
 
+#define DEBUG 0
 #include <stdlib.h>
 #include "gmp.h"
 #include "gmp-impl.h"
+#include "longlong.h"
 
-/* Formulas:
-	z = 2z-(zz)d
-	z = 2z-(zd)z
-	z = z(2-zd)
-	z = z-z*(zd-1)
-	z = z+z*(1-zd)
+#if DEBUG
+#include <stdio.h>
+int count = 0, cc = 0 ;
+#endif
+
+#ifndef INV_MULMOD_BNM1_THRESHOLD
+#define INV_MULMOD_BNM1_THRESHOLD (5*MULMOD_BNM1_THRESHOLD)
+#endif
+
+#ifndef INV_APPR_THRESHOLD
+#define INV_APPR_THRESHOLD (INV_NEWTON_THRESHOLD)
+#endif
+
+/* FIXME: The iterative version splits the operand in two slighty
+   unbalanced parts, the use of log_2 (or counting the bits)
+   underestimate the maximum number of iterations.
 */
 
+/* This is intended for constant THRESHOLDs only, where the compiler
+   can completely fold the result.  */
+#define LOG2C(n) \
+ (((n) >=    0x1) + ((n) >=    0x2) + ((n) >=    0x4) + ((n) >=    0x8) + \
+  ((n) >=   0x10) + ((n) >=   0x20) + ((n) >=   0x40) + ((n) >=   0x80) + \
+  ((n) >=  0x100) + ((n) >=  0x200) + ((n) >=  0x400) + ((n) >=  0x800) + \
+  ((n) >= 0x1000) + ((n) >= 0x2000) + ((n) >= 0x4000) + ((n) >= 0x8000))
+
+#if TUNE_PROGRAM_BUILD
+#define NPOWS \
+ ((sizeof(mp_size_t) > 6 ? 48 : 8*sizeof(mp_size_t)))
+#define MAYBE_dcpi1_divappr   1
+#else
+#define NPOWS \
+ ((sizeof(mp_size_t) > 6 ? 48 : 8*sizeof(mp_size_t)) - LOG2C (INV_NEWTON_THRESHOLD))
+#define MAYBE_dcpi1_divappr \
+  (INV_NEWTON_THRESHOLD < 2 * DC_DIVAPPR_Q_THRESHOLD)
+#endif
+
+mp_size_t
+mpn_invertappr_itch (mp_size_t n)
+{
+  return 3 * n + 2;
+}
+
+/*
+ All the three functions mpn{,_bc,_ni}_invertappr (ip, dp, n, scratch),
+ take the strictly normalised value {dp,n} (i.e. most significant bit
+ must be set) as an input, and compute {ip,n}: the approximate
+ reciprocal of {dp,n}.
+
+ Let e = mpn*_invertappr (ip, dp, n, scratch) be the returned value; the
+ following conditions are satisfied by the output:
+   0 <= e <= 1;