[Gmp-commit] /home/hgfiles/gmp: 12 new changesets
mercurial at gmplib.org
mercurial at gmplib.org
Mon Dec 14 17:32:11 CET 2009
details: /home/hgfiles/gmp/rev/5d53bb4ba26f
changeset: 13063:5d53bb4ba26f
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sat Dec 12 21:59:46 2009 +0100
description:
Rewrite of mpn_invert, use Newton iterations and mulmod_bnm1.
details: /home/hgfiles/gmp/rev/2af3577bf135
changeset: 13064:2af3577bf135
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 09:38:43 2009 +0100
description:
invert: moved ALLOCs, and enabled tuning.
details: /home/hgfiles/gmp/rev/1e3222290b38
changeset: 13065:1e3222290b38
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 11:00:54 2009 +0100
description:
invertappr: tuning.
details: /home/hgfiles/gmp/rev/ed0154450fb7
changeset: 13066:ed0154450fb7
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 11:07:25 2009 +0100
description:
Tune also INV_APPR_THRESHOLD.
details: /home/hgfiles/gmp/rev/423ca5861db9
changeset: 13067:423ca5861db9
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 13:30:45 2009 +0100
description:
Changelog.
details: /home/hgfiles/gmp/rev/525726d74f1a
changeset: 13068:525726d74f1a
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 21:43:45 2009 +0100
description:
Split mpn_invertappr in _bc and _ni.
details: /home/hgfiles/gmp/rev/14ab3bd845c3
changeset: 13069:14ab3bd845c3
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Sun Dec 13 22:09:44 2009 +0100
description:
Correct tuning of mpn_*_invertappr
details: /home/hgfiles/gmp/rev/cc1c3e1b2418
changeset: 13070:cc1c3e1b2418
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Mon Dec 14 08:31:59 2009 +0100
description:
Merging offical repo with new invert code.
details: /home/hgfiles/gmp/rev/6ad12e816dd7
changeset: 13071:6ad12e816dd7
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Mon Dec 14 09:19:45 2009 +0100
description:
Re-enable usage of mpn_dcpi1_divappr_q in mpn_invaertappr.
details: /home/hgfiles/gmp/rev/8276e9432ce2
changeset: 13072:8276e9432ce2
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Mon Dec 14 11:03:17 2009 +0100
description:
Comment update on invertappr.
details: /home/hgfiles/gmp/rev/607231d2bd2b
changeset: 13073:607231d2bd2b
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Mon Dec 14 15:44:47 2009 +0100
description:
Possibly exclude using dcpi1_divappr in invertappr, threshold based.
details: /home/hgfiles/gmp/rev/47b27ccf9a34
changeset: 13074:47b27ccf9a34
user: Marco Bodrato <bodrato at mail.dm.unipi.it>
date: Mon Dec 14 17:31:58 2009 +0100
description:
Trivial merge.
diffstat:
ChangeLog | 69 ++++++-
gmp-impl.h | 42 +++
mpn/alpha/ev6/gmp-mparam.h | 5 +-
mpn/arm/gmp-mparam.h | 4 +-
mpn/generic/dcpi1_divappr_q.c | 8 +-
mpn/generic/gcdext.c | 1 +
mpn/generic/invert.c | 375 +++++++++++++++++++++++++++++++--
mpn/generic/mul_fft.c | 13 +-
mpn/generic/sbpi1_div_qr.c | 29 +--
mpn/powerpc32/gmp-mparam.h | 2 +-
mpn/powerpc64/mode64/p4/gmp-mparam.h | 2 +-
mpn/sparc64/gmp-mparam.h | 4 +-
mpn/sparc64/ultrasparc34/gmp-mparam.h | 5 +-
mpn/x86/k7/gmp-mparam.h | 6 +-
mpn/x86/pentium4/sse2/gmp-mparam.h | 10 +-
mpn/x86_64/core2/gmp-mparam.h | 6 +-
mpn/x86_64/corei/gmp-mparam.h | 1 +
mpn/x86_64/gmp-mparam.h | 10 +-
mpn/x86_64/pentium4/gmp-mparam.h | 4 +-
mpz/powm.c | 19 +-
tests/mpn/t-toom33.c | 2 +-
tests/mpn/t-toom42.c | 2 -
tests/mpn/t-toom43.c | 2 -
tests/mpn/t-toom44.c | 2 +-
tests/mpn/toom-shared.h | 4 +-
tune/common.c | 12 +
tune/speed.c | 2 +
tune/speed.h | 72 ++++++
tune/tuneup.c | 34 ++-
29 files changed, 637 insertions(+), 110 deletions(-)
diffs (truncated from 1178 to 300 lines):
diff -r 26537180dd3b -r 47b27ccf9a34 ChangeLog
--- a/ChangeLog Sat Dec 12 12:54:36 2009 +0100
+++ b/ChangeLog Mon Dec 14 17:31:58 2009 +0100
@@ -1,7 +1,74 @@
+2009-12-14 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+ * mpn/generic/invert.c: Improved comments.
+ (mpn_bc_invertapp): Conditionally re-enable mpn_dcpi1_divappr_q.
+
+2009-12-14 Niels Möller <nisse at lysator.liu.se>
+
+ * gmp-impl.h (udiv_qr_3by2): Fix typo in argument list.
+
+2009-12-13 Niels Möller <nisse at lysator.liu.se>
+
+ * gmp-impl.h (udiv_qr_3by2): New macro.
+ * mpn/generic/sbpi1_div_qr.c (mpn_sbpi1_div_qr): Use udiv_qr_3by2.
+
+2009-12-13 Torbjorn Granlund <tege at gmplib.org>
+
+ * mpn/generic/dcpi1_divappr_q.c (mpn_dcpi1_divappr_q): Avoid a buffer
+ overrun.
+
+ * mpn/generic/mul_fft.c (mpn_mul_fft_full): Handle carry-out from 2nd
+ mpn_mul_fft, add an ASSERT for the 1sd mpn_mul_fft. Replace some
+ comments on cc's range with ASSERTs.
+
+ * mpn/generic/gcdext.c (compute_v): Normalize tp[] after mpn_mul.
+
+ * mpz/powm.c: Rework buffer handling.
+
+2009-12-13 Niels Möller <nisse at lysator.liu.se>
+
+ * tests/mpn/toom-shared.h (main): Use refmpn_mul_basecase to check
+ results (slow!). Iteration counts of all toom tests reduced
+ considerably.
+
+2009-12-13 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+ * mpn/generic/invert.c (mpn_invertapp): Split in _bc and _ni.
+ (mpn_bc_invertapp): New function, the basecase.
+ (mpn_ni_invertapp): New function, Newton iteration.
+ (mpn_invert): Use mpn_ni_invertapp.
+ * tune/tuneup.c (tune_invert): Min for INV_APPR_THRESHOLD.
+ (tune_invertappr): Min for INV_NEWTON_THRESHOLD.
+
+ * tune/speed.h (SPEED_ROUTINE_MPN_NI_INVERTAPPR): New macro.
+ (speed_mpn_ni_invertappr): New function.
+ * tune/common.c (speed_mpn_ni_invertappr): New function.
+ * tune/speed.c (routine): Add speed_mpn_ni_invertappr.
+
+ * tune/tuneup.c (tune_invertappr): Use speed_mpn_ni_invertappr to
+ tune INV_MULMOD_BNM1_THRESHOLD.
+
2009-12-12 Torbjorn Granlund <tege at gmplib.org>
* mpn/generic/mu_bdiv_qr.c (mpn_mu_bdiv_qr_itch): Rewrite.
+2009-12-12 Marco Bodrato <bodrato at mail.dm.unipi.it>
+
+ * tests/mpn/t-mulmod_bnm1.c (main): Disable B^n+1 stressing test
+ for odd sizes.
+
+ * mpn/generic/invert.c: Complete rewrite. Uses Newton iterations.
+ * gmp-impl.h (mpn_invertappr): Added prototype and name-mangling.
+ (mpn_invertappr_itch): Added prototype and name-mangling.
+ (INV_APPR_THRESHOLD): Support for a new tunable const.
+ * tune/speed.h (SPEED_ROUTINE_MPN_INVERTAPPR): New macro.
+ (speed_mpn_invertappr): New function.
+ * tune/common.c (speed_mpn_invertappr): New function.
+ * tune/speed.c (routine): Add speed_mpn_invertappr.
+ * tune/tuneup.c (tune_invertappr): New function: was tune_invert.
+ (tune_invert): Now tune only INV_APPR_THRESHOLD.
+ (all): Enable call to tune_invert and tune_invertappr.
+
2009-12-11 Torbjorn Granlund <tege at gmplib.org>
* mpn/generic/binvert.c: Use mpn_mulmod_bnm1 instead of FFT wrapping.
@@ -34,7 +101,7 @@
Interface change, since the new code doesn't take a scratch
argument.
- * tests/mpn/t-mulmod_bnm1.c (main): Ensure thatn an >= bn. Lowered
+ * tests/mpn/t-mulmod_bnm1.c (main): Ensure that an >= bn. Lowered
MIN_N to 1. Various fixes to handle n == 1 properly.
* mpn/generic/mulmod_bnm1.c (mpn_mulmod_bnm1): Small interface
diff -r 26537180dd3b -r 47b27ccf9a34 gmp-impl.h
--- a/gmp-impl.h Sat Dec 12 12:54:36 2009 +0100
+++ b/gmp-impl.h Mon Dec 14 17:31:58 2009 +0100
@@ -1172,6 +1172,11 @@
#define mpn_invert_itch __MPN(invert_itch)
__GMP_DECLSPEC mp_size_t mpn_invert_itch __GMP_PROTO ((mp_size_t));
+#define mpn_invertappr __MPN(invertappr)
+__GMP_DECLSPEC mp_limb_t mpn_invertappr __GMP_PROTO ((mp_ptr, mp_srcptr, mp_size_t, mp_ptr));
+#define mpn_invertappr_itch __MPN(invertappr_itch)
+__GMP_DECLSPEC mp_size_t mpn_invertappr_itch __GMP_PROTO ((mp_size_t));
+
#define mpn_binvert __MPN(binvert)
__GMP_DECLSPEC void mpn_binvert __GMP_PROTO ((mp_ptr, mp_srcptr, mp_size_t, mp_ptr));
#define mpn_binvert_itch __MPN(binvert_itch)
@@ -2656,6 +2661,39 @@
(r) = _r; \
} while (0)
+/* Compute quotient the quotient and remainder for n / d. Requires d
+ >= B^2 / 2 and n < d B. di is the inverse
+
+ floor ((B^3 - 1) / (d0 + d1 B)) - B.
+*/
+#define udiv_qr_3by2(q, r1, r0, n2, n1, n0, d1, d0, dinv) \
+ do { \
+ mp_limb_t _q1, _q0, _r1, _r0, _t1, _t0, _mask; \
+ umul_ppmm (_q1, _q0, (n2), (dinv)); \
+ add_ssaaaa (_q1, _q0, _q1, _q0, (n2), (n1)); \
+ \
+ /* Compute the two most significant limbs of n - q'd */ \
+ _r1 = (n1) - _q1 * (d1); \
+ sub_ddmmss (_r1, _r0, _r1, (n0), (d1), (d0)); \
+ umul_ppmm (_t1, _t0, _q1, (d0)); \
+ sub_ddmmss (_r1, _r0, _r1, _r0, _t1, _t0); \
+ _q1++; \
+ \
+ /* Conditionally adjust q and the remainders */ \
+ _mask = - (mp_limb_t) (_r1 >= _q0); \
+ _q1 += _mask; \
+ add_ssaaaa (_r1, _r0, _r1, _r0, _mask & (d1), _mask & (d0)); \
+ if (UNLIKELY (_r1 >= (d1))) \
+ { \
+ if (_r1 > (d1) || _r0 >= (d0)) \
+ { \
+ _q1++; \
+ sub_ddmmss (_r1, _r0, _r1, _r0, (d1), (d0)); \
+ } \
+ } \
+ (q) = _q1; (r1) = _r1; (r0) = _r0; \
+ } while (0)
+
#ifndef mpn_preinv_divrem_1 /* if not done with cpuvec in a fat binary */
#define mpn_preinv_divrem_1 __MPN(preinv_divrem_1)
__GMP_DECLSPEC mp_limb_t mpn_preinv_divrem_1 __GMP_PROTO ((mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_limb_t, mp_limb_t, int));
@@ -4174,6 +4212,10 @@
#define INV_NEWTON_THRESHOLD inv_newton_threshold
extern mp_size_t inv_newton_threshold;
+#undef INV_APPR_THRESHOLD
+#define INV_APPR_THRESHOLD inv_appr_threshold
+extern mp_size_t inv_appr_threshold;
+
#undef BINV_MULMOD_BNM1_THRESHOLD
#define BINV_MULMOD_BNM1_THRESHOLD binv_mulmod_bnm1_threshold
extern mp_size_t binv_mulmod_bnm1_threshold;
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/alpha/ev6/gmp-mparam.h
--- a/mpn/alpha/ev6/gmp-mparam.h Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/alpha/ev6/gmp-mparam.h Mon Dec 14 17:31:58 2009 +0100
@@ -54,10 +54,11 @@
#define DC_DIVAPPR_Q_THRESHOLD 396
#define DC_BDIV_QR_THRESHOLD 110
#define DC_BDIV_Q_THRESHOLD 315
-#define BINV_NEWTON_THRESHOLD 889
+#define BINV_NEWTON_THRESHOLD 1160
+#define BINV_MULMOD_BNM1_THRESHOLD 0 /* always when newton */
#define REDC_1_TO_REDC_N_THRESHOLD 110
-#define MATRIX22_STRASSEN_THRESHOLD 21
+#define MATRIX22_STRASSEN_THRESHOLD 16
#define HGCD_THRESHOLD 276
#define GCD_DC_THRESHOLD 1197
#define GCDEXT_DC_THRESHOLD 799
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/arm/gmp-mparam.h
--- a/mpn/arm/gmp-mparam.h Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/arm/gmp-mparam.h Mon Dec 14 17:31:58 2009 +0100
@@ -57,8 +57,8 @@
#define MATRIX22_STRASSEN_THRESHOLD 19
#define HGCD_THRESHOLD 110
-#define GCD_DC_THRESHOLD 562
-#define GCDEXT_DC_THRESHOLD 345
+#define GCD_DC_THRESHOLD 680
+#define GCDEXT_DC_THRESHOLD 521
#define JACOBI_BASE_METHOD 2
#define DIVREM_1_NORM_THRESHOLD 0 /* preinv always */
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/dcpi1_divappr_q.c
--- a/mpn/generic/dcpi1_divappr_q.c Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/dcpi1_divappr_q.c Mon Dec 14 17:31:58 2009 +0100
@@ -162,9 +162,11 @@
qh = mpn_sbpi1_divappr_q (qp, np - dn, nn, dp - dn, dn, dinv->inv32);
else
{
- /* Put quotient in tp, use qp as temporary, since qp lacks a limb. */
- qh = mpn_dcpi1_divappr_q_n (tp, np - qn - 2, dp - (qn + 1), qn + 1, dinv, qp);
- MPN_COPY (qp, tp + 1, qn);
+ /* It is tempting to use qp for recursive scratch and put quotient in
+ tp, but the recursive scratch needs one limb too many. */
+ mp_ptr qp2 = TMP_SALLOC_LIMBS (qn + 1);
+ qh = mpn_dcpi1_divappr_q_n (qp2, np - qn - 2, dp - (qn + 1), qn + 1, dinv, tp);
+ MPN_COPY (qp, qp2 + 1, qn);
}
}
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/gcdext.c
--- a/mpn/generic/gcdext.c Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/gcdext.c Mon Dec 14 17:31:58 2009 +0100
@@ -118,6 +118,7 @@
mpn_mul (tp, up, size, ap, an);
size += an;
+ size -= tp[size - 1] == 0;
ASSERT (gn <= size);
diff -r 26537180dd3b -r 47b27ccf9a34 mpn/generic/invert.c
--- a/mpn/generic/invert.c Sat Dec 12 12:54:36 2009 +0100
+++ b/mpn/generic/invert.c Mon Dec 14 17:31:58 2009 +0100
@@ -1,6 +1,13 @@
-/* Compute {up,n}^(-1).
+/* Compute {up,n}^(-1). Inversion using ApproximateReciprocal
+ algorithm, returning either the correct result, or one less.
-Copyright (C) 2007 Free Software Foundation, Inc.
+ Contributed to the GNU project by Marco Bodrato.
+
+ THE FUNCTIONS IN THIS FILE ARE INTERNAL WITH MUTABLE INTERFACES. IT IS ONLY
+ SAFE TO REACH THEM THROUGH DOCUMENTED INTERFACES. IN FACT, IT IS ALMOST
+ GUARANTEED THAT THEY WILL CHANGE OR DISAPPEAR IN A FUTURE GMP RELEASE.
+
+Copyright (C) 2007, 2009 Free Software Foundation, Inc.
This file is part of the GNU MP Library.
@@ -17,44 +24,362 @@
You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library. If not, see http://www.gnu.org/licenses/. */
+#define DEBUG 0
#include <stdlib.h>
#include "gmp.h"
#include "gmp-impl.h"
+#include "longlong.h"
-/* Formulas:
- z = 2z-(zz)d
- z = 2z-(zd)z
- z = z(2-zd)
- z = z-z*(zd-1)
- z = z+z*(1-zd)
+#if DEBUG
+#include <stdio.h>
+int count = 0, cc = 0 ;
+#endif
+
+#ifndef INV_MULMOD_BNM1_THRESHOLD
+#define INV_MULMOD_BNM1_THRESHOLD (5*MULMOD_BNM1_THRESHOLD)
+#endif
+
+#ifndef INV_APPR_THRESHOLD
+#define INV_APPR_THRESHOLD (INV_NEWTON_THRESHOLD)
+#endif
+
+/* FIXME: The iterative version splits the operand in two slighty
+ unbalanced parts, the use of log_2 (or counting the bits)
+ underestimate the maximum number of iterations.
*/
+/* This is intended for constant THRESHOLDs only, where the compiler
+ can completely fold the result. */
+#define LOG2C(n) \
+ (((n) >= 0x1) + ((n) >= 0x2) + ((n) >= 0x4) + ((n) >= 0x8) + \
+ ((n) >= 0x10) + ((n) >= 0x20) + ((n) >= 0x40) + ((n) >= 0x80) + \
+ ((n) >= 0x100) + ((n) >= 0x200) + ((n) >= 0x400) + ((n) >= 0x800) + \
+ ((n) >= 0x1000) + ((n) >= 0x2000) + ((n) >= 0x4000) + ((n) >= 0x8000))
+
+#if TUNE_PROGRAM_BUILD
+#define NPOWS \
+ ((sizeof(mp_size_t) > 6 ? 48 : 8*sizeof(mp_size_t)))
+#define MAYBE_dcpi1_divappr 1
+#else
+#define NPOWS \
+ ((sizeof(mp_size_t) > 6 ? 48 : 8*sizeof(mp_size_t)) - LOG2C (INV_NEWTON_THRESHOLD))
+#define MAYBE_dcpi1_divappr \
+ (INV_NEWTON_THRESHOLD < 2 * DC_DIVAPPR_Q_THRESHOLD)
+#endif
+
+mp_size_t
+mpn_invertappr_itch (mp_size_t n)
+{
+ return 3 * n + 2;
+}
+
+/*
+ All the three functions mpn{,_bc,_ni}_invertappr (ip, dp, n, scratch),
+ take the strictly normalised value {dp,n} (i.e. most significant bit
+ must be set) as an input, and compute {ip,n}: the approximate
+ reciprocal of {dp,n}.
+
+ Let e = mpn*_invertappr (ip, dp, n, scratch) be the returned value; the
+ following conditions are satisfied by the output:
+ 0 <= e <= 1;
More information about the gmp-commit
mailing list