[Gmp-commit] /home/hgfiles/gmp: 13 new changesets

mercurial at gmplib.org
Sat Mar 20 00:29:33 CET 2010


details:   /home/hgfiles/gmp/rev/d6373b065f70
changeset: 13501:d6373b065f70
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Thu Mar 18 00:02:49 2010 +0100
description:
Retune.

details:   /home/hgfiles/gmp/rev/6faa971be2d1
changeset: 13502:6faa971be2d1
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Thu Mar 18 00:12:03 2010 +0100
description:
Misc cleanups.  Add/update cycle tables.

details:   /home/hgfiles/gmp/rev/a4e154ac6c47
changeset: 13503:a4e154ac6c47
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Thu Mar 18 00:12:56 2010 +0100
description:
Tune for slightly better speed.

details:   /home/hgfiles/gmp/rev/0f4753766fdb
changeset: 13504:0f4753766fdb
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 18:47:18 2010 +0100
description:
Replace ppc64 mpn_addlsh1_n and mpn_sublsh1_n code.

details:   /home/hgfiles/gmp/rev/54345d8ce9aa
changeset: 13505:54345d8ce9aa
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 18:49:34 2010 +0100
description:
Change include file order.

details:   /home/hgfiles/gmp/rev/57050b6fef2f
changeset: 13506:57050b6fef2f
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 18:53:20 2010 +0100
description:
Define gcc_32_cflags_maybe, ar_32_flags and nm_32_flags.

details:   /home/hgfiles/gmp/rev/4bcee6a45442
changeset: 13507:4bcee6a45442
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 20:56:47 2010 +0100
description:
Test mpn_sublsh2_n.

details:   /home/hgfiles/gmp/rev/14748e42ea57
changeset: 13508:14748e42ea57
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 21:27:57 2010 +0100
description:
Call mpn_sublsh2_n and mpn_sublsh_n with correct args.

details:   /home/hgfiles/gmp/rev/bd713ddbd7b7
changeset: 13509:bd713ddbd7b7
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 21:30:18 2010 +0100
description:
Bring header comments up-to-date.

details:   /home/hgfiles/gmp/rev/ba68d5809ad5
changeset: 13510:ba68d5809ad5
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 21:35:25 2010 +0100
description:
Major overhaul of x86_64 divrem_1.

details:   /home/hgfiles/gmp/rev/60a81dc0df32
changeset: 13511:60a81dc0df32
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 21:36:18 2010 +0100
description:
Add some comments.

details:   /home/hgfiles/gmp/rev/0d11cbcb731f
changeset: 13512:0d11cbcb731f
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Fri Mar 19 21:39:20 2010 +0100
description:
Add special Nano mpn_divexact_1.
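The Nano mpn_divexact_1 is new assembly, and its code is not shown in this digest. The sketch below models the usual exact-division technique in C: multiply by the inverse of d modulo the limb base instead of dividing. It assumes 32-bit limbs and an odd divisor. The names binvert32 and divexact_1_sketch are hypothetical and do not reproduce GMP's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb32;   /* assumption: 32-bit limbs for the sketch */

/* Inverse of an odd d modulo 2^32 by Newton iteration.  d*d == 1 (mod 8)
   for any odd d, so d itself is correct to 3 bits; each step doubles that. */
static limb32 binvert32(limb32 d)
{
    assert(d & 1);
    limb32 inv = d;
    for (int i = 0; i < 4; i++)          /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        inv *= 2 - d * inv;
    return inv;
}

/* rp[] = up[] / d, valid only when d is odd and divides {up,n} exactly.
   No division instruction is used: each quotient limb is x * inv mod 2^32. */
static void divexact_1_sketch(limb32 *rp, const limb32 *up, size_t n, limb32 d)
{
    limb32 inv = binvert32(d);
    limb32 c = 0;                        /* amount owed by the next limb */
    for (size_t i = 0; i < n; i++) {
        limb32 s = up[i];
        limb32 b = s < c;                /* borrow from subtracting c */
        limb32 x = s - c;
        limb32 q = x * inv;              /* q * d == x (mod 2^32) */
        rp[i] = q;
        c = (limb32)(((uint64_t)q * d) >> 32) + b;
    }
}
```

Each step leaves q*d equal to the current limb plus a carry into the next limb. Because d is odd and therefore invertible modulo 2^(32n), the limbs produced are exactly the quotient whenever the division has no remainder.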

details:   /home/hgfiles/gmp/rev/28125fe48a91
changeset: 13513:28125fe48a91
user:      Torbjorn Granlund <tege at gmplib.org>
date:      Sat Mar 20 00:29:17 2010 +0100
description:
More work on ppc64 add+lsh functions.

diffstat:

 ChangeLog                           |   48 +++++++++
 configure.in                        |   14 +-
 mpn/generic/divrem_2.c              |   19 +--
 mpn/generic/toom_interpolate_6pts.c |    4 +-
 mpn/powerpc64/mode64/addlsh1_n.asm  |   82 ----------------
 mpn/powerpc64/mode64/aorslsh1_n.asm |   52 ++++++++++
 mpn/powerpc64/mode64/aorslsh2_n.asm |   52 ++++++++++
 mpn/powerpc64/mode64/aorslshC_n.asm |  161 ++++++++++++++++++++++++++++++++
 mpn/powerpc64/mode64/sublsh1_n.asm  |   83 ----------------
 mpn/x86/k6/gmp-mparam.h             |    4 +-
 mpn/x86/pentium4/sse2/add_n.asm     |   50 ++++-----
 mpn/x86/pentium4/sse2/addlsh1_n.asm |   41 ++++----
 mpn/x86/pentium4/sse2/sub_n.asm     |   49 ++++-----
 mpn/x86_64/copyd.asm                |    8 +-
 mpn/x86_64/copyi.asm                |   12 +-
 mpn/x86_64/core2/divrem_1.asm       |  110 +++-------------------
 mpn/x86_64/dive_1.asm               |    2 +-
 mpn/x86_64/divrem_1.asm             |   90 ++++++++----------
 mpn/x86_64/mod_1_4.asm              |   20 ++--
 mpn/x86_64/nano/dive_1.asm          |  154 +++++++++++++++++++++++++++++++
 mpn/x86_64/rsh1aors_n.asm           |    2 +-
 tests/devel/try.c                   |  176 ++++++++++++-----------------------
 tests/mpn/t-invert.c                |    6 +-
 tests/mpn/t-mullo.c                 |    6 +-
 tests/mpn/t-mulmod_bnm1.c           |    6 +-
 tests/mpn/t-sqrmod_bnm1.c           |    6 +-
 tests/mpn/toom-shared.h             |    6 +-
 tests/refmpn.c                      |    5 +
 tests/tests.h                       |    7 +-
 29 files changed, 714 insertions(+), 561 deletions(-)

diffs (truncated from 1872 to 300 lines):

diff -r ad57ab3094a5 -r 28125fe48a91 ChangeLog
--- a/ChangeLog	Tue Mar 16 23:38:05 2010 +0100
+++ b/ChangeLog	Sat Mar 20 00:29:17 2010 +0100
@@ -1,3 +1,51 @@
+2010-03-20  Torbjorn Granlund  <tege at gmplib.org>
+
+	* mpn/powerpc64/mode64/aorslshC_n.asm: New file, generalised from
+	last iteration of aorslsh1_n.asm.
+	* mpn/powerpc64/mode64/aorslsh1_n.asm: Use aorslshC_n.asm.
+	* mpn/powerpc64/mode64/aorslsh2_n.asm: New file, use aorslshC_n.asm.
+
+2010-03-19  Torbjorn Granlund  <tege at gmplib.org>
+
+	* mpn/x86_64/nano/dive_1.asm: New file.
+
+	* mpn/x86_64/divrem_1.asm: Avoid shld since it is slow on several CPU
+	types.  Unconditionally provide code for normalised and unnormalised
+	divisors.  Cleanup labels.
+
+	* mpn/x86_64/core2/divrem_1.asm: Remove special code for normalised
+	divisors.  Cleanup labels.
+
+	* mpn/generic/toom_interpolate_6pts.c: Call mpn_sublsh2_n and
+	mpn_sublsh_n with correct args.
+
+	* tests/devel/try.c: Use enum for TYPE_*.
+
+	* tests/devel/try.c: Test mpn_sublsh2_n.
+	* tests/refmpn.c (refmpn_sublsh2_n): New function.
+	* tests/tests.h (refmpn_sublsh2_n): Declare.
+
+	* mpn/powerpc64/mode64/aorslsh1_n.asm: New file, with faster
+	mpn_addlsh1_n and mpn_sublsh1_n.
+	* mpn/powerpc64/mode64/addlsh1_n.asm: Delete.
+	* mpn/powerpc64/mode64/sublsh1_n.asm: Delete.
+
+2010-03-18  Torbjorn Granlund  <tege at gmplib.org>
+
+	* configure.in (*-*-aix): Define gcc_32_cflags_maybe, ar_32_flags and
+	nm_32_flags.
+
+	* mpn/x86/pentium4/sse2/addlsh1_n.asm: Tune for slightly better speed.
+	Misc cleanups.  Add cycle table.
+
+	* mpn/x86_64/copyi.asm: Update cycle table.
+	* mpn/x86_64/copyd.asm: Likewise.
+	* mpn/x86_64/rsh1aors_n.asm: Likewise.
+	* mpn/x86_64/dive_1.asm: Likewise.
+
+	* mpn/x86/pentium4/sse2/add_n.asm: Misc cleanups.  Add cycle table.
+	* mpn/x86/pentium4/sse2/sub_n.asm: Likewise.
+
 2010-03-16  Torbjorn Granlund  <tege at gmplib.org>
 
 	* mpn/x86_64/divrem_1.asm: Use mpn_invert_limb instead of div insn.
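The 2010-03-16 entry replaces the div instruction with mpn_invert_limb, which is a precomputed reciprocal of the divisor. The C sketch below shows one standard way to divide with such a reciprocal: the 2-by-1 scheme described by Möller and Granlund. It assumes 32-bit limbs so that double-limb products fit in uint64_t, and it handles only normalised divisors. Whether divrem_1.asm follows exactly these steps is an assumption, and invert_limb32, div_2by1_preinv32 and divrem_1_norm32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb32;   /* assumption: 32-bit limbs, B = 2^32 */

/* Reciprocal of a normalised d (top bit set): v = floor((B^2 - 1) / d) - B. */
static limb32 invert_limb32(limb32 d)
{
    assert(d >> 31);
    return (limb32)(UINT64_MAX / d - ((uint64_t)1 << 32));
}

/* Divide <u1,u0> by d using v = invert_limb32(d); requires u1 < d.
   Returns the quotient and stores the remainder in *rp. */
static limb32 div_2by1_preinv32(limb32 *rp, limb32 u1, limb32 u0,
                                limb32 d, limb32 v)
{
    assert(u1 < d);
    uint64_t p = (uint64_t)v * u1 + (((uint64_t)u1 << 32) | u0);
    limb32 q1 = (limb32)(p >> 32) + 1;   /* candidate quotient, mod B */
    limb32 q0 = (limb32)p;
    limb32 r = u0 - q1 * d;              /* candidate remainder, mod B */
    if (r > q0) {                        /* candidate was one too large */
        q1--;
        r += d;
    }
    if (r >= d) {                        /* rare: candidate one too small */
        q1++;
        r -= d;
    }
    *rp = r;
    return q1;
}

/* {qp,n} = {up,n} / d for normalised d; returns the remainder. */
static limb32 divrem_1_norm32(limb32 *qp, const limb32 *up, size_t n, limb32 d)
{
    limb32 v = invert_limb32(d), r = 0;
    for (size_t i = n; i-- > 0; )
        qp[i] = div_2by1_preinv32(&r, r, up[i], d, v);
    return r;
}
```

The divisor-dependent work is done once in invert_limb32. Each limb after that costs one multiply plus a few adjustments, which is the reason for avoiding a hardware divide inside the loop.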
diff -r ad57ab3094a5 -r 28125fe48a91 configure.in
--- a/configure.in	Tue Mar 16 23:38:05 2010 +0100
+++ b/configure.in	Sat Mar 20 00:29:17 2010 +0100
@@ -422,7 +422,7 @@
     AC_DEFINE(HAVE_HOST_CPU_FAMILY_alpha)
     case $host_cpu in
       alphaev5* | alphapca5*)
-      	path="alpha/ev5 alpha" ;;
+	path="alpha/ev5 alpha" ;;
       alphaev67 | alphaev68 | alphaev7*)
         path="alpha/ev67 alpha/ev6 alpha" ;;
       alphaev6)
@@ -937,9 +937,13 @@
 
     case $host in
       *-*-aix*)
-        cclist="gcc xlc cc"
-        xlc_cflags="-O2 -qmaxmem=20000"
-        xlc_cflags_optlist="arch"
+	cclist="gcc xlc cc"
+	gcc_32_cflags_maybe="-maix32"
+	xlc_cflags="-O2 -qmaxmem=20000"
+	xlc_cflags_optlist="arch"
+	xlc_32_cflags_maybe="-q32"
+	ar_32_flags="-X32"
+	nm_32_flags="-X32"
 
         # xlc (what version?) knows -qarch=ppc, ppcgr, 601, 602, 603, 604,
         # 403, rs64a
@@ -2519,7 +2523,7 @@
   toom6h_mul toom6_sqr toom8h_mul toom8_sqr				   \
   toom_couple_handling							   \
   toom2_sqr toom3_sqr toom4_sqr						   \
-  toom_eval_dgr3_pm1 toom_eval_dgr3_pm2 				   \
+  toom_eval_dgr3_pm1 toom_eval_dgr3_pm2					   \
   toom_eval_pm1 toom_eval_pm2 toom_eval_pm2exp toom_eval_pm2rexp	   \
   toom_interpolate_5pts toom_interpolate_6pts toom_interpolate_7pts	   \
   toom_interpolate_8pts toom_interpolate_12pts toom_interpolate_16pts	   \
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/generic/divrem_2.c
--- a/mpn/generic/divrem_2.c	Tue Mar 16 23:38:05 2010 +0100
+++ b/mpn/generic/divrem_2.c	Sat Mar 20 00:29:17 2010 +0100
@@ -43,21 +43,18 @@
 #endif
 
 
-/* Divide num (NP/NSIZE) by den (DP/2) and write
-   the NSIZE-2 least significant quotient limbs at QP
-   and the 2 long remainder at NP.  If QEXTRA_LIMBS is
-   non-zero, generate that many fraction bits and append them after the
-   other quotient limbs.
-   Return the most significant limb of the quotient, this is always 0 or 1.
+/* Divide num (NP/NN) by den (DP/2) and write the NN-2 least significant
+   quotient limbs at QP and the 2-limb remainder at NP.  If qxn is non-zero,
+   generate that many fraction bits and append them after the other quotient
+   limbs.  Return the most significant limb of the quotient; this is always 0
+   or 1.
 
    Preconditions:
-   0. NSIZE >= 2.
    1. The most significant bit of the divisor must be set.
    2. QP must either not overlap with the input operands at all, or
-      QP + 2 >= NP must hold true.  (This means that it's
-      possible to put the quotient in the high part of NUM, right after the
-      remainder in NUM.
-   3. NSIZE >= 2, even if QEXTRA_LIMBS is non-zero.  */
+      QP + 2 >= NP must hold true.  (This means that it's possible to put
+      the quotient in the high part of NUM, right after the remainder in NUM.)
+   3. NN >= 2, even if qxn is non-zero.  */
 
 mp_limb_t
 mpn_divrem_2 (mp_ptr qp, mp_size_t qxn,
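The rewritten comment spells out the mpn_divrem_2 contract. The small reference model below exercises that contract, including the in-place case QP = NP + 2. It assumes 16-bit limbs so that every intermediate value fits in uint64_t, handles only qxn = 0, and uses plain division rather than GMP's algorithm. ref_divrem_2_16 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t limb16;   /* assumption: toy 16-bit limbs */

/* Divide {np,nn} by the normalised 2-limb divisor {dp,2}, qxn = 0 only.
   Writes nn-2 quotient limbs at qp, leaves the 2-limb remainder in
   np[0..1], and returns the top quotient limb, which is 0 or 1. */
static limb16 ref_divrem_2_16(limb16 *qp, limb16 *np, size_t nn,
                              const limb16 *dp)
{
    assert(nn >= 2);
    uint32_t d = ((uint32_t)dp[1] << 16) | dp[0];
    assert(d >> 31);                       /* divisor's top bit must be set */

    uint32_t r = ((uint32_t)np[nn - 1] << 16) | np[nn - 2];
    limb16 qh = 0;
    if (r >= d) {                          /* at most once since d >= 2^31 */
        qh = 1;
        r -= d;
    }
    /* Walk down; each np[i] is read before qp[i] (possibly np[i+2]) is
       written, so QP = NP + 2 works in place. */
    for (size_t i = nn - 2; i-- > 0; ) {
        uint64_t t = ((uint64_t)r << 16) | np[i];   /* < d * 2^16 */
        qp[i] = (limb16)(t / d);
        r = (uint32_t)(t % d);
    }
    np[0] = (limb16)r;
    np[1] = (limb16)(r >> 16);
    return qh;
}
```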
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/generic/toom_interpolate_6pts.c
--- a/mpn/generic/toom_interpolate_6pts.c	Tue Mar 16 23:38:05 2010 +0100
+++ b/mpn/generic/toom_interpolate_6pts.c	Sat Mar 20 00:29:17 2010 +0100
@@ -169,9 +169,9 @@
   /* W2 -= W0<<2 */
 #if HAVE_NATIVE_mpn_sublsh_n || HAVE_NATIVE_mpn_sublsh2_n
 #if HAVE_NATIVE_mpn_sublsh2_n
-  cy = mpn_sublsh2_n(w2, w0, w0n);
+  cy = mpn_sublsh2_n(w2, w2, w0, w0n);
 #else
-  cy = mpn_sublsh_n(w2, w0, w0n, 2);
+  cy = mpn_sublsh_n(w2, w2, w0, w0n, 2);
 #endif
 #else
   /* {W4,2*n+1} is now free and can be overwritten. */
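The toom_interpolate_6pts.c fix adds the missing minuend operand. The functions take a destination, a minuend and a subtrahend, so the in-place update W2 -= W0<<2 passes w2 twice. The C model below shows that operation. It assumes 32-bit limbs, and the name sublsh_n_model and its return convention (the bits shifted out of the top vp limb plus the final borrow) are assumptions rather than code copied from GMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb32;   /* assumption: 32-bit limbs */

/* rp[] = up[] - (vp[] << k) for 1 <= k <= 31.  rp may equal up.
   Returns what must still be subtracted from the next higher limb. */
static limb32 sublsh_n_model(limb32 *rp, const limb32 *up, const limb32 *vp,
                             size_t n, unsigned k)
{
    assert(k >= 1 && k < 32);
    limb32 hi = 0;                       /* bits shifted out of vp[i-1] */
    limb32 borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb32 s = (vp[i] << k) | hi;
        hi = vp[i] >> (32 - k);
        limb32 u = up[i];                /* read before rp[i] is written */
        rp[i] = u - s - borrow;
        borrow = (u < s) || (u - s < borrow);
    }
    return hi + borrow;
}
```

With this signature, the corrected call corresponds to sublsh_n_model(w2, w2, w0, w0n, 2). The pre-fix call shape, with w0 in the minuend slot and w0n in the subtrahend-pointer slot, could not even type-check against it.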
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/addlsh1_n.asm
--- a/mpn/powerpc64/mode64/addlsh1_n.asm	Tue Mar 16 23:38:05 2010 +0100
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,82 +0,0 @@
-dnl  PowerPC-64 mpn_addlsh1_n -- rp[] = up[] + (vp[] << 1)
-
-dnl  Copyright 2003, 2005 Free Software Foundation, Inc.
-
-dnl  This file is part of the GNU MP Library.
-
-dnl  The GNU MP Library is free software; you can redistribute it and/or modify
-dnl  it under the terms of the GNU Lesser General Public License as published
-dnl  by the Free Software Foundation; either version 3 of the License, or (at
-dnl  your option) any later version.
-
-dnl  The GNU MP Library is distributed in the hope that it will be useful, but
-dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
-dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
-dnl  License for more details.
-
-dnl  You should have received a copy of the GNU Lesser General Public License
-dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.
-
-include(`../config.m4')
-
-C		cycles/limb
-C POWER3/PPC630:     2		(1.5 c/l should be possible)
-C POWER4/PPC970:     4		(2.0 c/l should be possible)
-
-C INPUT PARAMETERS
-C rp	r3
-C up	r4
-C vp	r5
-C n	r6
-
-define(`rp',`r3')
-define(`up',`r4')
-define(`vp',`r5')
-
-define(`s0',`r6')
-define(`s1',`r7')
-define(`u0',`r8')
-define(`v0',`r10')
-define(`v1',`r11')
-
-ASM_START()
-PROLOGUE(mpn_addlsh1_n)
-	mtctr	r6		C copy n in ctr
-	addic	r31, r31, 0	C clear cy
-
-	ld	v0, 0(vp)	C load v limb
-	ld	u0, 0(up)	C load u limb
-	addi	up, up, -8	C update up
-	addi	rp, rp, -8	C update rp
-	sldi	s1, v0, 1
-	bdz	L(end)		C If done, skip loop
-
-L(oop):	ld	v1, 8(vp)	C load v limb
-	adde	s1, s1, u0	C add limbs with cy, set cy
-	std	s1, 8(rp)	C store result limb
-	srdi	s0, v0, 63	C shift down previous v limb
-	ldu	u0, 16(up)	C load u limb and update up
-	rldimi	s0, v1, 1, 0	C left shift v limb and merge with prev v limb
-
-	bdz	L(exit)		C decrement ctr and exit if done
-
-	ldu	v0, 16(vp)	C load v limb and update vp
-	adde	s0, s0, u0	C add limbs with cy, set cy
-	stdu	s0, 16(rp)	C store result limb and update rp
-	srdi	s1, v1, 63	C shift down previous v limb
-	ld	u0, 8(up)	C load u limb
-	rldimi	s1, v0, 1, 0	C left shift v limb and merge with prev v limb
-
-	bdnz	L(oop)		C decrement ctr and loop back
-
-L(end):	adde	r7, s1, u0
-	std	r7, 8(rp)	C store last result limb
-	srdi	r3, v0, 63
-	addze	r3, r3
-	blr
-L(exit):	adde	r7, s0, u0
-	std	r7, 16(rp)	C store last result limb
-	srdi	r3, v1, 63
-	addze	r3, r3
-	blr
-EPILOGUE()
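For reference, the operation the deleted file implemented, rp[] = up[] + (vp[] << 1), can be modelled in C as below. The sketch assumes 64-bit limbs and follows the structure of the assembly. Each shifted vp limb takes in the top bit of the previous one (srdi/rldimi). The return value is that final top bit plus the carry (srdi/addze), so it is 0, 1 or 2. addlsh1_n_model is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb64;   /* assumption: 64-bit limbs as on ppc64 */

/* rp[] = up[] + (vp[] << 1); returns 0, 1 or 2. */
static limb64 addlsh1_n_model(limb64 *rp, const limb64 *up, const limb64 *vp,
                              size_t n)
{
    assert(n >= 1);
    limb64 prev = 0;                        /* previous vp limb */
    limb64 cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb64 s = (vp[i] << 1) | (prev >> 63);  /* like sldi + rldimi */
        prev = vp[i];
        limb64 t = up[i] + s;
        limb64 c1 = t < s;                  /* carry from up[i] + s */
        limb64 r = t + cy;
        cy = c1 | (r < cy);                 /* at most one can be set */
        rp[i] = r;
    }
    return (prev >> 63) + cy;               /* like srdi + addze */
}
```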
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/aorslsh1_n.asm
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/powerpc64/mode64/aorslsh1_n.asm	Sat Mar 20 00:29:17 2010 +0100
@@ -0,0 +1,52 @@
+dnl  PowerPC-64 mpn_addlsh1_n and mpn_sublsh1_n.
+
+dnl  Copyright 2003, 2005, 2009, 2010 Free Software Foundation, Inc.
+
+dnl  This file is part of the GNU MP Library.
+
+dnl  The GNU MP Library is free software; you can redistribute it and/or modify
+dnl  it under the terms of the GNU Lesser General Public License as published
+dnl  by the Free Software Foundation; either version 2.1 of the License, or (at
+dnl  your option) any later version.
+
+dnl  The GNU MP Library is distributed in the hope that it will be useful, but
+dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+dnl  License for more details.
+
+dnl  You should have received a copy of the GNU Lesser General Public License
+dnl  along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
+dnl  the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
+dnl  MA 02111-1307, USA.
+
+include(`../config.m4')
+
+C		cycles/limb
+C POWER3/PPC630:     1.75	(1.5 c/l should be possible)
+C POWER4/PPC970:     2		(2.0 c/l should be possible)
+C POWER5:	     ?
+
+
+define(LSH,		1)
+define(RSH,		63)
+
+ifdef(`OPERATION_addlsh1_n',`
+  define(ADDSUBC,	addc)
+  define(ADDSUBE,      	adde)
+  define(INITCY,      	`addic $1, r1, 0')
+  define(RETVAL,      	`addze  r3, $1')
+  define(func, mpn_addlsh1_n)
+')
+ifdef(`OPERATION_sublsh1_n',`
+  define(ADDSUBC,	subfc)
+  define(ADDSUBE,      	subfe)
+  define(INITCY,      	`addic $1, r1, -1')
+  define(RETVAL,      	`subfze  r3, $1
+			neg	r3, r3')
+  define(func, mpn_sublsh1_n)
+')
+
+
+MULFUNC_PROLOGUE(mpn_addlsh1_n mpn_sublsh1_n)
+
+include_mpn(`powerpc64/mode64/aorslshC_n.asm')
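aorslsh1_n.asm sets only LSH, RSH and the add or subtract instruction names, then includes the shared aorslshC_n.asm loop. The C preprocessor sketch below illustrates the same idea of one body generating several functions. It is an analogy that assumes 64-bit limbs, not a translation of the m4 code, and the generated tmpl_* names are hypothetical. As in the m4 defines, the right shift is 64 - LSH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb64;   /* assumption: 64-bit limbs */

/* name = function to generate, LSH = shift count, ADD = 1 for add, 0 for
   subtract.  Returns shifted-out bits plus the final carry or borrow. */
#define DEFINE_AORSLSH(name, LSH, ADD) \
static limb64 name(limb64 *rp, const limb64 *up, const limb64 *vp, size_t n) \
{ \
    assert(n >= 1); \
    limb64 hi = 0, cy = 0; \
    for (size_t i = 0; i < n; i++) { \
        limb64 s = (vp[i] << (LSH)) | hi; \
        hi = vp[i] >> (64 - (LSH)); \
        limb64 u = up[i]; \
        if (ADD) { \
            limb64 t = u + s, r = t + cy; \
            cy = (t < s) | (r < cy); \
            rp[i] = r; \
        } else { \
            limb64 t = u - s, r = t - cy; \
            cy = (u < s) | (t < cy); \
            rp[i] = r; \
        } \
    } \
    return hi + cy; \
}

DEFINE_AORSLSH(tmpl_addlsh1_n, 1, 1)
DEFINE_AORSLSH(tmpl_sublsh1_n, 1, 0)
DEFINE_AORSLSH(tmpl_addlsh2_n, 2, 1)
DEFINE_AORSLSH(tmpl_sublsh2_n, 2, 0)
```

As with the m4 files, supporting another shift count or operation needs only one more instantiation line. The loop itself is written once.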
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/aorslsh2_n.asm
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/powerpc64/mode64/aorslsh2_n.asm	Sat Mar 20 00:29:17 2010 +0100
@@ -0,0 +1,52 @@
+dnl  PowerPC-64 mpn_addlsh2_n and mpn_sublsh2_n.
+
+dnl  Copyright 2003, 2005, 2009, 2010 Free Software Foundation, Inc.
+
+dnl  This file is part of the GNU MP Library.
+
+dnl  The GNU MP Library is free software; you can redistribute it and/or modify
+dnl  it under the terms of the GNU Lesser General Public License as published
+dnl  by the Free Software Foundation; either version 2.1 of the License, or (at
+dnl  your option) any later version.
+
+dnl  The GNU MP Library is distributed in the hope that it will be useful, but
+dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY

