[Gmp-commit] /var/hg/gmp: 2 new changesets

mercurial at gmplib.org
Sun Aug 13 17:27:18 CEST 2023


details:   /var/hg/gmp/rev/87c1b0eb021d
changeset: 18428:87c1b0eb021d
user:      Torbjorn Granlund <tg at gmplib.org>
date:      Sun Aug 13 17:26:30 2023 +0200
description:
Don't use nngrk insn for z13.

details:   /var/hg/gmp/rev/b7477feae73c
changeset: 18429:b7477feae73c
user:      Torbjorn Granlund <tg at gmplib.org>
date:      Sun Aug 13 17:27:16 2023 +0200
description:
More and better s390 READMEs.

diffstat:

 mpn/s390_64/README      |   8 +++--
 mpn/s390_64/z13/README  |  64 +++++++++++++++++++++++++++++++++++++++++++++++++
 mpn/s390_64/z13/com.asm |   4 +-
 mpn/s390_64/z15/README  |  33 +++++++++++++++++++++++++
 4 files changed, 104 insertions(+), 5 deletions(-)

diffs (137 lines):

diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/README
--- a/mpn/s390_64/README	Sun Aug 13 00:52:13 2023 +0200
+++ b/mpn/s390_64/README	Sun Aug 13 17:27:16 2023 +0200
@@ -28,9 +28,11 @@
 
 
 
-There are 5 generations of 64-bit s390 processors, z900, z990, z9,
-z10, and z196.  The current GMP code was optimised for the two oldest,
-z900 and z990.
+There are many generations of 64-bit s390 processors, z900, z990, z9, z10,
+z196, z12, z13, z14, z15, and z16.  The current GMP code was originally
+optimised for the two oldest, z900 and z990.  Better code for z13 and later
+can be found in aptly named subdirectories.  The status comments below are for
+the original code.
 
 
 mpn_copyi
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z13/README
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/s390_64/z13/README	Sun Aug 13 17:27:16 2023 +0200
@@ -0,0 +1,64 @@
+Copyright 2023 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+  * the GNU Lesser General Public License as published by the Free
+    Software Foundation; either version 3 of the License, or (at your
+    option) any later version.
+
+or
+
+  * the GNU General Public License as published by the Free Software
+    Foundation; either version 2 of the License, or (at your option) any
+    later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library.  If not,
+see https://www.gnu.org/licenses/.
+
+
+
+The code in this directory makes use of vector instructions added with z13.
+These vector instructions are well-designed and naturally much more modern than
+the legacy of S/390 instructions.  From GMP's perspective, the full set of
+128-bit addition and subtraction, with register-based carry in and out, are
+is very useful.  Unfortunately, the multiply support is unexpectedly limited,
+forcing GMP to use the legacy mlg/mlgr instructions, juggling results
+between plain and vector registers.
+
+
+Torbjörn has faster mul_2 and addmul_2, running at close to 2 cycles/limb on
+z15.  That's not a whole lot faster than mul_1 and addmul_1, and as
+sqr_basecase and mul_basecase are based on the _1 variants, we have not
+committed the faster _2 code.
+
+Here is how a new mul_basecase should be organised:
+
+  1. If the rp pointer is 128-bit aligned, start with mul_2 to keep alignment.
+     Else start with mul_1.  Now rp will be 128-bit aligned.
+
+  2. Loop over addmul_2.  Probably don't expand it into 4 variants (addmul_2 is
+     4-way unrolled) as that practice pays off less with the fewer outer loop
+     iterations which are the result of using addmul_2.  Instead, do the n mod
+     4 handling before each run.
+
+  3. If there is now anything to do, finish off with an addmul_1.
+
+This means that we will sometimes both do a mul_1 first and an addmul_1 last,
+even if bn mod 2 == 0.  It is expected that that will be beneficial,
+considering the alignment penalty for the 128-bit operations and the fact that
+the _2 functions are not dramatically faster than the _1 functions.
+
+A new sqr_basecase should use addmul_2 too.  Here, we might get significant
+improvements as the branch predictor performs abysmally given the structure of
+sqr_basecase; an addmul_2 based variant cuts the number of branches in half.
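
Editorial sketch, not part of the changeset: the three-step mul_basecase plan in
the z13 README above could be modelled in portable C roughly as follows.  The
names limb_t, ref_mul_1, ref_addmul_1, ref_mul_2, ref_addmul_2 and
sketch_mul_basecase are hypothetical stand-ins, not GMP's internal API, and the
_2 helpers are simply built from two _1 passes, so this only illustrates the
control flow and the alignment decision, not the performance.  It assumes a
compiler with unsigned __int128 (GCC or Clang), un >= 1, vn >= 1, and that rp
does not overlap the inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb */

/* {rp,n} = {up,n} * v; returns the carry-out limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)up[i] * v + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* {rp,n+1} = low limbs of {up,n} * {vp,2}; returns the top limb. */
static limb_t ref_mul_2(limb_t *rp, const limb_t *up, size_t n, const limb_t *vp)
{
    rp[n] = ref_mul_1(rp, up, n, vp[0]);
    return ref_addmul_1(rp + 1, up, n, vp[1]);
}

/* Adds {up,n} * {vp,2} to {rp,n}, writes n+1 limbs; returns the top limb. */
static limb_t ref_addmul_2(limb_t *rp, const limb_t *up, size_t n, const limb_t *vp)
{
    rp[n] = ref_addmul_1(rp, up, n, vp[0]);
    return ref_addmul_1(rp + 1, up, n, vp[1]);
}

/* {rp,un+vn} = {up,un} * {vp,vn}, organised as in steps 1 to 3 of the README. */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);

    /* Step 1: choose the first row so that rp is 128-bit aligned afterwards. */
    if (((uintptr_t)rp & 15) == 0 && vn >= 2) {
        rp[un + 1] = ref_mul_2(rp, up, un, vp);
        rp += 2; vp += 2; vn -= 2;
    } else {
        rp[un] = ref_mul_1(rp, up, un, vp[0]);
        rp += 1; vp += 1; vn -= 1;
    }

    /* Step 2: loop over addmul_2, two multiplier limbs per pass. */
    while (vn >= 2) {
        rp[un + 1] = ref_addmul_2(rp, up, un, vp);
        rp += 2; vp += 2; vn -= 2;
    }

    /* Step 3: finish off with an addmul_1 if one limb is left. */
    if (vn == 1)
        rp[un] = ref_addmul_1(rp, up, un, vp[0]);
}
```

With an unaligned rp and an even vn, the sketch runs mul_1 first and addmul_1
last, which is the case the README describes as expected to be beneficial.
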
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z13/com.asm
--- a/mpn/s390_64/z13/com.asm	Sun Aug 13 00:52:13 2023 +0200
+++ b/mpn/s390_64/z13/com.asm	Sun Aug 13 17:27:16 2023 +0200
@@ -51,8 +51,8 @@
 
 	tmll	n, 1
 	je	L(xx0)
-L(xx1):	lg	%r5, 0(ap)
-	nngrk	%r5, %r5, %r5
+L(xx1):	lghi	%r5, -1
+	xg	%r5, 0(ap)
 	stg	%r5, 0(rp)
 	la	ap, 8(ap)
 	la	rp, 8(rp)
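
Editorial sketch, not part of the changeset: the com.asm hunk above replaces a
NAND of a register with itself (nngrk, which the commit message says should not
be used for z13) by loading all ones with lghi and XORing the source limb into
it with xg.  Both compute the one's complement of the limb, which the following
C model tries to make explicit.  The function names are hypothetical, and the
loop is a plain reference, not a translation of the unrolled assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models nngrk r,r,r: NAND of a value with itself. */
static uint64_t com_limb_nand(uint64_t a)
{
    return ~(a & a);
}

/* Models lghi r,-1 followed by xg r,0(ap): all ones XOR the source limb. */
static uint64_t com_limb_xor(uint64_t a)
{
    uint64_t r = (uint64_t)-1;   /* lghi sign-extends -1 to 64 bits */
    r ^= a;                      /* xg XORs the 64-bit memory operand */
    return r;
}

/* Reference complement of n limbs: rp[i] = ~ap[i]. */
static void sketch_com(uint64_t *rp, const uint64_t *ap, size_t n)
{
    assert(n >= 1);
    for (size_t i = 0; i < n; i++)
        rp[i] = com_limb_xor(ap[i]);
}
```
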
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z15/README
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/s390_64/z15/README	Sun Aug 13 17:27:16 2023 +0200
@@ -0,0 +1,33 @@
+Copyright 2023 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+  * the GNU Lesser General Public License as published by the Free
+    Software Foundation; either version 3 of the License, or (at your
+    option) any later version.
+
+or
+
+  * the GNU General Public License as published by the Free Software
+    Foundation; either version 2 of the License, or (at your option) any
+    later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library.  If not,
+see https://www.gnu.org/licenses/.
+
+
+
+The code in this directory makes use of z15 features, mainly vler/vster.
+Porting it to z13/z14 would require vler and vster to be replaced by a vl+vpdi
+pair and vpdi+vst, respectively.
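
Editorial sketch, not part of the changeset: the z15 README above says vler and
vster could be replaced by vl+vpdi and vpdi+vst.  Assuming doubleword elements,
an element-reversed load is the same as a plain load followed by a swap of the
two 64-bit halves, and likewise for the store.  The C model below uses
hypothetical names (vec128, model_vl and so on) and only checks that
equivalence; it does not model the instructions' encodings or operands.

```c
#include <assert.h>
#include <stdint.h>

/* d[0] is the leftmost doubleword element of a 128-bit vector register. */
typedef struct { uint64_t d[2]; } vec128;

/* Plain load and store (vl / vst): memory order is kept. */
static vec128 model_vl(const uint64_t *p)  { vec128 v = {{p[0], p[1]}}; return v; }
static void model_vst(uint64_t *p, vec128 v) { p[0] = v.d[0]; p[1] = v.d[1]; }

/* Swap of the two doublewords, the role vpdi plays in the README's pairs. */
static vec128 model_swap_dw(vec128 v) { vec128 r = {{v.d[1], v.d[0]}}; return r; }

/* Element-reversed load and store (vler / vster), doubleword elements assumed. */
static vec128 model_vler(const uint64_t *p)  { vec128 v = {{p[1], p[0]}}; return v; }
static void model_vster(uint64_t *p, vec128 v) { p[0] = v.d[1]; p[1] = v.d[0]; }

/* Checks that vler == vl+swap and vster == swap+vst for one 16-byte operand. */
static void check_equivalence(const uint64_t *p)
{
    vec128 a = model_vler(p);
    vec128 b = model_swap_dw(model_vl(p));
    assert(a.d[0] == b.d[0] && a.d[1] == b.d[1]);

    uint64_t m1[2], m2[2];
    model_vster(m1, a);
    model_vst(m2, model_swap_dw(a));
    assert(m1[0] == m2[0] && m1[1] == m2[1]);
}
```
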

