[Gmp-commit] /home/hgfiles/gmp: 13 new changesets
mercurial at gmplib.org
Sat Mar 20 00:29:33 CET 2010
details: /home/hgfiles/gmp/rev/d6373b065f70
changeset: 13501:d6373b065f70
user: Torbjorn Granlund <tege at gmplib.org>
date: Thu Mar 18 00:02:49 2010 +0100
description:
Retune.
details: /home/hgfiles/gmp/rev/6faa971be2d1
changeset: 13502:6faa971be2d1
user: Torbjorn Granlund <tege at gmplib.org>
date: Thu Mar 18 00:12:03 2010 +0100
description:
Misc cleanups. Add/update cycle tables.
details: /home/hgfiles/gmp/rev/a4e154ac6c47
changeset: 13503:a4e154ac6c47
user: Torbjorn Granlund <tege at gmplib.org>
date: Thu Mar 18 00:12:56 2010 +0100
description:
Tune for slightly better speed.
details: /home/hgfiles/gmp/rev/0f4753766fdb
changeset: 13504:0f4753766fdb
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 18:47:18 2010 +0100
description:
Replace ppc64 mpn_addlsh1_n and mpn_sublsh1_n code.
details: /home/hgfiles/gmp/rev/54345d8ce9aa
changeset: 13505:54345d8ce9aa
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 18:49:34 2010 +0100
description:
Change include file order.
details: /home/hgfiles/gmp/rev/57050b6fef2f
changeset: 13506:57050b6fef2f
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 18:53:20 2010 +0100
description:
Define gcc_32_cflags_maybe, ar_32_flags and nm_32_flags.
details: /home/hgfiles/gmp/rev/4bcee6a45442
changeset: 13507:4bcee6a45442
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 20:56:47 2010 +0100
description:
Test mpn_sublsh2_n.
details: /home/hgfiles/gmp/rev/14748e42ea57
changeset: 13508:14748e42ea57
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 21:27:57 2010 +0100
description:
Call mpn_sublsh2_n and mpn_sublsh_n with correct args.
details: /home/hgfiles/gmp/rev/bd713ddbd7b7
changeset: 13509:bd713ddbd7b7
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 21:30:18 2010 +0100
description:
Bring header comments up-to-date.
details: /home/hgfiles/gmp/rev/ba68d5809ad5
changeset: 13510:ba68d5809ad5
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 21:35:25 2010 +0100
description:
Major overhaul of x86_64 divrem_1.
details: /home/hgfiles/gmp/rev/60a81dc0df32
changeset: 13511:60a81dc0df32
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 21:36:18 2010 +0100
description:
Add some comments.
details: /home/hgfiles/gmp/rev/0d11cbcb731f
changeset: 13512:0d11cbcb731f
user: Torbjorn Granlund <tege at gmplib.org>
date: Fri Mar 19 21:39:20 2010 +0100
description:
Add special Nano mpn_divexact_1.
details: /home/hgfiles/gmp/rev/28125fe48a91
changeset: 13513:28125fe48a91
user: Torbjorn Granlund <tege at gmplib.org>
date: Sat Mar 20 00:29:17 2010 +0100
description:
More work on ppc64 add+lsh functions.
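
Note on changeset 13508: the corrected calls pass the destination and the minuend separately, as in mpn_sublsh_n(w2, w2, w0, w0n, 2), so that W2 -= W0<<2 is computed in place. The following is a minimal reference sketch of that operation in portable C, not the GMP code. It assumes 64-bit limbs, and the names limb_t and ref_sublsh_n are hypothetical, chosen only for illustration. The return value is assumed to combine the bits shifted out of the top limb with the final borrow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: one limb is 64 bits */

/* Hypothetical reference routine: {rp,n} = {up,n} - ({vp,n} << s).
   Returns the bits shifted out of the top limb plus the final borrow.
   rp may be the same pointer as up (in-place operation).  */
limb_t
ref_sublsh_n (limb_t *rp, const limb_t *up, const limb_t *vp,
              size_t n, unsigned s)
{
  limb_t shifted_out = 0;   /* high bits of the previous v limb */
  limb_t borrow = 0;        /* borrow from the previous limb, 0 or 1 */
  size_t i;

  assert (s >= 1 && s <= 63);

  for (i = 0; i < n; i++)
    {
      limb_t u = up[i];
      limb_t v = vp[i];
      limb_t sv = (v << s) | shifted_out;   /* shifted v limb */
      limb_t d = u - sv;
      limb_t b1 = u < sv;                   /* borrow from u - sv */
      limb_t r = d - borrow;
      limb_t b2 = d < borrow;               /* borrow from d - borrow */

      shifted_out = v >> (64 - s);
      rp[i] = r;
      borrow = b1 | b2;                     /* at most one can be set */
    }
  return shifted_out + borrow;
}
```

With this sketch, the in-place form from toom_interpolate_6pts.c corresponds to ref_sublsh_n (w2, w2, w0, w0n, 2). Both source limbs are read before rp[i] is written, which is what makes the overlap rp == up safe here.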
diffstat:
ChangeLog | 48 +++++++++
configure.in | 14 +-
mpn/generic/divrem_2.c | 19 +--
mpn/generic/toom_interpolate_6pts.c | 4 +-
mpn/powerpc64/mode64/addlsh1_n.asm | 82 ----------------
mpn/powerpc64/mode64/aorslsh1_n.asm | 52 ++++++++++
mpn/powerpc64/mode64/aorslsh2_n.asm | 52 ++++++++++
mpn/powerpc64/mode64/aorslshC_n.asm | 161 ++++++++++++++++++++++++++++++++
mpn/powerpc64/mode64/sublsh1_n.asm | 83 ----------------
mpn/x86/k6/gmp-mparam.h | 4 +-
mpn/x86/pentium4/sse2/add_n.asm | 50 ++++-----
mpn/x86/pentium4/sse2/addlsh1_n.asm | 41 ++++----
mpn/x86/pentium4/sse2/sub_n.asm | 49 ++++-----
mpn/x86_64/copyd.asm | 8 +-
mpn/x86_64/copyi.asm | 12 +-
mpn/x86_64/core2/divrem_1.asm | 110 +++-------------------
mpn/x86_64/dive_1.asm | 2 +-
mpn/x86_64/divrem_1.asm | 90 ++++++++----------
mpn/x86_64/mod_1_4.asm | 20 ++--
mpn/x86_64/nano/dive_1.asm | 154 +++++++++++++++++++++++++++++++
mpn/x86_64/rsh1aors_n.asm | 2 +-
tests/devel/try.c | 176 ++++++++++++-----------------------
tests/mpn/t-invert.c | 6 +-
tests/mpn/t-mullo.c | 6 +-
tests/mpn/t-mulmod_bnm1.c | 6 +-
tests/mpn/t-sqrmod_bnm1.c | 6 +-
tests/mpn/toom-shared.h | 6 +-
tests/refmpn.c | 5 +
tests/tests.h | 7 +-
29 files changed, 714 insertions(+), 561 deletions(-)
diffs (truncated from 1872 to 300 lines):
diff -r ad57ab3094a5 -r 28125fe48a91 ChangeLog
--- a/ChangeLog Tue Mar 16 23:38:05 2010 +0100
+++ b/ChangeLog Sat Mar 20 00:29:17 2010 +0100
@@ -1,3 +1,51 @@
+2010-03-20 Torbjorn Granlund <tege at gmplib.org>
+
+ * mpn/powerpc64/mode64/aorslshC_n.asm: New file, generalised from
+ last iteration of aorslsh1_n.asm.
+ * mpn/powerpc64/mode64/aorslsh1_n.asm: Use aorslshC_n.asm.
+ * mpn/powerpc64/mode64/aorslsh2_n.asm: New file, use aorslshC_n.asm.
+
+2010-03-19 Torbjorn Granlund <tege at gmplib.org>
+
+ * mpn/x86_64/nano/dive_1.asm: New file.
+
+ * mpn/x86_64/divrem_1.asm: Avoid shld since it is slow on several CPU
+ types. Unconditionally provide code for normalised and unnormalised
+ divisors. Cleanup labels.
+
+ * mpn/x86_64/core2/divrem_1.asm: Remove special code for normalised
+ divisors. Cleanup labels.
+
+ * mpn/generic/toom_interpolate_6pts.c: Call mpn_sublsh2_n and
+ mpn_sublsh_n with correct args.
+
+ * tests/devel/try.c: Use enum for TYPE_*.
+
+ * tests/devel/try.c: Test mpn_sublsh2_n.
+ * tests/refmpn.c (refmpn_sublsh2_n): New function.
+ * tests/tests.h (refmpn_sublsh2_n): Declare.
+
+ * mpn/powerpc64/mode64/aorslsh1_n.asm: New file, with faster
+ mpn_addlsh1_n and mpn_sublsh1_n.
+ * mpn/powerpc64/mode64/addlsh1_n.asm: Delete.
+ * mpn/powerpc64/mode64/sublsh1_n.asm: Delete.
+
+2010-03-18 Torbjorn Granlund <tege at gmplib.org>
+
+ * configure.in (*-*-aix): Define gcc_32_cflags_maybe, ar_32_flags and
+ nm_32_flags.
+
+ * mpn/x86/pentium4/sse2/addlsh1_n.asm: Tune for slightly better speed.
+ Misc cleanups. Add cycle table.
+
+ * mpn/x86_64/copyi.asm: Update cycle table.
+ * mpn/x86_64/copyd.asm: Likewise.
+ * mpn/x86_64/rsh1aors_n.asm: Likewise.
+ * mpn/x86_64/dive_1.asm: Likewise.
+
+ * mpn/x86/pentium4/sse2/add_n.asm: Misc cleanups. Add cycle table.
+ * mpn/x86/pentium4/sse2/sub_n.asm: Likewise.
+
2010-03-16 Torbjorn Granlund <tege at gmplib.org>
* mpn/x86_64/divrem_1.asm: Use mpn_invert_limb instead of div insn.
diff -r ad57ab3094a5 -r 28125fe48a91 configure.in
--- a/configure.in Tue Mar 16 23:38:05 2010 +0100
+++ b/configure.in Sat Mar 20 00:29:17 2010 +0100
@@ -422,7 +422,7 @@
AC_DEFINE(HAVE_HOST_CPU_FAMILY_alpha)
case $host_cpu in
alphaev5* | alphapca5*)
- path="alpha/ev5 alpha" ;;
+ path="alpha/ev5 alpha" ;;
alphaev67 | alphaev68 | alphaev7*)
path="alpha/ev67 alpha/ev6 alpha" ;;
alphaev6)
@@ -937,9 +937,13 @@
case $host in
*-*-aix*)
- cclist="gcc xlc cc"
- xlc_cflags="-O2 -qmaxmem=20000"
- xlc_cflags_optlist="arch"
+ cclist="gcc xlc cc"
+ gcc_32_cflags_maybe="-maix32"
+ xlc_cflags="-O2 -qmaxmem=20000"
+ xlc_cflags_optlist="arch"
+ xlc_32_cflags_maybe="-q32"
+ ar_32_flags="-X32"
+ nm_32_flags="-X32"
# xlc (what version?) knows -qarch=ppc, ppcgr, 601, 602, 603, 604,
# 403, rs64a
@@ -2519,7 +2523,7 @@
toom6h_mul toom6_sqr toom8h_mul toom8_sqr \
toom_couple_handling \
toom2_sqr toom3_sqr toom4_sqr \
- toom_eval_dgr3_pm1 toom_eval_dgr3_pm2 \
+ toom_eval_dgr3_pm1 toom_eval_dgr3_pm2 \
toom_eval_pm1 toom_eval_pm2 toom_eval_pm2exp toom_eval_pm2rexp \
toom_interpolate_5pts toom_interpolate_6pts toom_interpolate_7pts \
toom_interpolate_8pts toom_interpolate_12pts toom_interpolate_16pts \
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/generic/divrem_2.c
--- a/mpn/generic/divrem_2.c Tue Mar 16 23:38:05 2010 +0100
+++ b/mpn/generic/divrem_2.c Sat Mar 20 00:29:17 2010 +0100
@@ -43,21 +43,18 @@
#endif
-/* Divide num (NP/NSIZE) by den (DP/2) and write
- the NSIZE-2 least significant quotient limbs at QP
- and the 2 long remainder at NP. If QEXTRA_LIMBS is
- non-zero, generate that many fraction bits and append them after the
- other quotient limbs.
- Return the most significant limb of the quotient, this is always 0 or 1.
+/* Divide num (NP/NN) by den (DP/2) and write the NN-2 least significant
+ quotient limbs at QP and the 2 long remainder at NP. If qxn is non-zero,
+ generate that many fraction bits and append them after the other quotient
+ limbs. Return the most significant limb of the quotient, this is always 0
+ or 1.
Preconditions:
- 0. NSIZE >= 2.
1. The most significant bit of the divisor must be set.
2. QP must either not overlap with the input operands at all, or
- QP + 2 >= NP must hold true. (This means that it's
- possible to put the quotient in the high part of NUM, right after the
- remainder in NUM.
- 3. NSIZE >= 2, even if QEXTRA_LIMBS is non-zero. */
+ QP + 2 >= NP must hold true. (This means that it's possible to put
+ the quotient in the high part of NUM, right after the remainder in NUM.)
+ 3. NN >= 2, even if qxn is non-zero. */
mp_limb_t
mpn_divrem_2 (mp_ptr qp, mp_size_t qxn,
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/generic/toom_interpolate_6pts.c
--- a/mpn/generic/toom_interpolate_6pts.c Tue Mar 16 23:38:05 2010 +0100
+++ b/mpn/generic/toom_interpolate_6pts.c Sat Mar 20 00:29:17 2010 +0100
@@ -169,9 +169,9 @@
/* W2 -= W0<<2 */
#if HAVE_NATIVE_mpn_sublsh_n || HAVE_NATIVE_mpn_sublsh2_n
#if HAVE_NATIVE_mpn_sublsh2_n
- cy = mpn_sublsh2_n(w2, w0, w0n);
+ cy = mpn_sublsh2_n(w2, w2, w0, w0n);
#else
- cy = mpn_sublsh_n(w2, w0, w0n, 2);
+ cy = mpn_sublsh_n(w2, w2, w0, w0n, 2);
#endif
#else
/* {W4,2*n+1} is now free and can be overwritten. */
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/addlsh1_n.asm
--- a/mpn/powerpc64/mode64/addlsh1_n.asm Tue Mar 16 23:38:05 2010 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,82 +0,0 @@
-dnl PowerPC-64 mpn_addlsh1_n -- rp[] = up[] + (vp[] << 1)
-
-dnl Copyright 2003, 2005 Free Software Foundation, Inc.
-
-dnl This file is part of the GNU MP Library.
-
-dnl The GNU MP Library is free software; you can redistribute it and/or modify
-dnl it under the terms of the GNU Lesser General Public License as published
-dnl by the Free Software Foundation; either version 3 of the License, or (at
-dnl your option) any later version.
-
-dnl The GNU MP Library is distributed in the hope that it will be useful, but
-dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
-dnl or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
-dnl License for more details.
-
-dnl You should have received a copy of the GNU Lesser General Public License
-dnl along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
-
-include(`../config.m4')
-
-C cycles/limb
-C POWER3/PPC630: 2 (1.5 c/l should be possible)
-C POWER4/PPC970: 4 (2.0 c/l should be possible)
-
-C INPUT PARAMETERS
-C rp r3
-C up r4
-C vp r5
-C n r6
-
-define(`rp',`r3')
-define(`up',`r4')
-define(`vp',`r5')
-
-define(`s0',`r6')
-define(`s1',`r7')
-define(`u0',`r8')
-define(`v0',`r10')
-define(`v1',`r11')
-
-ASM_START()
-PROLOGUE(mpn_addlsh1_n)
- mtctr r6 C copy n in ctr
- addic r31, r31, 0 C clear cy
-
- ld v0, 0(vp) C load v limb
- ld u0, 0(up) C load u limb
- addi up, up, -8 C update up
- addi rp, rp, -8 C update rp
- sldi s1, v0, 1
- bdz L(end) C If done, skip loop
-
-L(oop): ld v1, 8(vp) C load v limb
- adde s1, s1, u0 C add limbs with cy, set cy
- std s1, 8(rp) C store result limb
- srdi s0, v0, 63 C shift down previous v limb
- ldu u0, 16(up) C load u limb and update up
- rldimi s0, v1, 1, 0 C left shift v limb and merge with prev v limb
-
- bdz L(exit) C decrement ctr and exit if done
-
- ldu v0, 16(vp) C load v limb and update vp
- adde s0, s0, u0 C add limbs with cy, set cy
- stdu s0, 16(rp) C store result limb and update rp
- srdi s1, v1, 63 C shift down previous v limb
- ld u0, 8(up) C load u limb
- rldimi s1, v0, 1, 0 C left shift v limb and merge with prev v limb
-
- bdnz L(oop) C decrement ctr and loop back
-
-L(end): adde r7, s1, u0
- std r7, 8(rp) C store last result limb
- srdi r3, v0, 63
- addze r3, r3
- blr
-L(exit): adde r7, s0, u0
- std r7, 16(rp) C store last result limb
- srdi r3, v1, 63
- addze r3, r3
- blr
-EPILOGUE()
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/aorslsh1_n.asm
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/powerpc64/mode64/aorslsh1_n.asm Sat Mar 20 00:29:17 2010 +0100
@@ -0,0 +1,52 @@
+dnl PowerPC-64 mpn_addlsh1_n and mpn_sublsh1_n.
+
+dnl Copyright 2003, 2005, 2009, 2010 Free Software Foundation, Inc.
+
+dnl This file is part of the GNU MP Library.
+
+dnl The GNU MP Library is free software; you can redistribute it and/or modify
+dnl it under the terms of the GNU Lesser General Public License as published
+dnl by the Free Software Foundation; either version 2.1 of the License, or (at
+dnl your option) any later version.
+
+dnl The GNU MP Library is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+dnl or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+dnl License for more details.
+
+dnl You should have received a copy of the GNU Lesser General Public License
+dnl along with the GNU MP Library; see the file COPYING.LIB. If not, write to
+dnl the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
+dnl MA 02111-1307, USA.
+
+include(`../config.m4')
+
+C cycles/limb
+C POWER3/PPC630: 1.75 (1.5 c/l should be possible)
+C POWER4/PPC970: 2 (2.0 c/l should be possible)
+C POWER5: ?
+
+
+define(LSH, 1)
+define(RSH, 63)
+
+ifdef(`OPERATION_addlsh1_n',`
+ define(ADDSUBC, addc)
+ define(ADDSUBE, adde)
+ define(INITCY, `addic $1, r1, 0')
+ define(RETVAL, `addze r3, $1')
+ define(func, mpn_addlsh1_n)
+')
+ifdef(`OPERATION_sublsh1_n',`
+ define(ADDSUBC, subfc)
+ define(ADDSUBE, subfe)
+ define(INITCY, `addic $1, r1, -1')
+ define(RETVAL, `subfze r3, $1
+ neg r3, r3')
+ define(func, mpn_sublsh1_n)
+')
+
+
+MULFUNC_PROLOGUE(mpn_addlsh1_n mpn_sublsh1_n)
+
+include_mpn(`powerpc64/mode64/aorslshC_n.asm')
diff -r ad57ab3094a5 -r 28125fe48a91 mpn/powerpc64/mode64/aorslsh2_n.asm
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/powerpc64/mode64/aorslsh2_n.asm Sat Mar 20 00:29:17 2010 +0100
@@ -0,0 +1,52 @@
+dnl PowerPC-64 mpn_addlsh2_n and mpn_sublsh2_n.
+
+dnl Copyright 2003, 2005, 2009, 2010 Free Software Foundation, Inc.
+
+dnl This file is part of the GNU MP Library.
+
+dnl The GNU MP Library is free software; you can redistribute it and/or modify
+dnl it under the terms of the GNU Lesser General Public License as published
+dnl by the Free Software Foundation; either version 2.1 of the License, or (at
+dnl your option) any later version.
+
+dnl The GNU MP Library is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY