[Gmp-commit] /var/hg/gmp: 2 new changesets
mercurial at gmplib.org
Sun Aug 13 17:27:18 CEST 2023
details: /var/hg/gmp/rev/87c1b0eb021d
changeset: 18428:87c1b0eb021d
user: Torbjorn Granlund <tg at gmplib.org>
date: Sun Aug 13 17:26:30 2023 +0200
description:
Don't use nngrk insn for z13.
details: /var/hg/gmp/rev/b7477feae73c
changeset: 18429:b7477feae73c
user: Torbjorn Granlund <tg at gmplib.org>
date: Sun Aug 13 17:27:16 2023 +0200
description:
More and better s390 READMEs.
diffstat:
mpn/s390_64/README | 8 +++--
mpn/s390_64/z13/README | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
mpn/s390_64/z13/com.asm | 4 +-
mpn/s390_64/z15/README | 33 +++++++++++++++++++++++++
4 files changed, 104 insertions(+), 5 deletions(-)
diffs (137 lines):
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/README
--- a/mpn/s390_64/README Sun Aug 13 00:52:13 2023 +0200
+++ b/mpn/s390_64/README Sun Aug 13 17:27:16 2023 +0200
@@ -28,9 +28,11 @@
-There are 5 generations of 64-bit s390 processors, z900, z990, z9,
-z10, and z196. The current GMP code was optimised for the two oldest,
-z900 and z990.
+There are many generations of 64-bit s390 processors, z900, z990, z9, z10,
+z196, z12, z13, z14, z15, and z16. The current GMP code was originally
+optimised for the two oldest, z900 and z990. Better code for z13 and later
+can be found in aptly named subdirectories. The status comments below are for
+the original code.
mpn_copyi
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z13/README
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/s390_64/z13/README Sun Aug 13 17:27:16 2023 +0200
@@ -0,0 +1,64 @@
+Copyright 2023 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/.
+
+
+
+The code in this directory makes use of vector instructions added with z13.
+These vector instructions are well-designed and naturally much more modern than
+the legacy of S/390 instructions. From GMP's perspective, the full set of
+128-bit addition and subtraction, with register-based carry in and out, are
+very useful. Unfortunately, the multiply support is unexpectedly limited,
+forcing GMP to use the legacy mlg/mlgr instructions, juggling results
+between plain and vector registers.
+
+
+Torbjörn has faster mul_2 and addmul_2, running at close to 2 cycles/limb on
+z15. That's not a whole lot faster than mul_1 and addmul_1, and as
+sqr_basecase and mul_basecase are based on the _1 variants, we have not
+committed the faster _2 code.
+
+Here is how a new mul_basecase should be organised:
+
+ 1. If the rp pointer is 128-bit aligned, start with mul_2 to keep alignment.
+ Else start with mul_1. Now rp will be 128-bit aligned.
+
+ 2. Loop over addmul_2. Probably don't expand it into 4 variants (addmul_2 is
+ 4-way unrolled) as that practice pays off less with the fewer outer loop
+ iterations which are the result of using addmul_2. Instead, do the n mod
+ 4 handling before each run.
+
+ 3. If there is anything left to do, finish off with an addmul_1.
+
+This means that we will sometimes both do a mul_1 first and an addmul_1 last,
+even if bn mod 2 == 0. It is expected that that will be beneficial,
+considering the alignment penalty for the 128-bit operations and the fact that
+the _2 functions are not dramatically faster than the _1 functions.
+
+A new sqr_basecase should use addmul_2 too. Here, we might get significant
+improvements as the branch predictor performs abysmally given the structure of
+sqr_basecase; an addmul_2 based variant cuts the number of branches in half.
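Editorial sketch (not part of the changeset): the three-step mul_basecase organisation described in the z13 README above can be modelled in portable C as follows. All names here (limb_t, ref_mul_1, ref_addmul_1, ref_mul_2, ref_addmul_2, sketch_mul_basecase) are hypothetical, and the two-limb helpers are simply built from the one-limb ones, so this shows only the control flow and the alignment decision, not the vector code or its speed. It assumes 64-bit limbs, un >= 1, vn >= 1, and a compiler with unsigned __int128 (GCC or Clang). The fallback to mul_1 when rp is aligned but vn is 1 is an assumption; the README does not spell out that case. The "n mod 4" handling the README mentions lives inside the unrolled assembly loop and is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;      /* GCC/Clang extension */

/* rp[0..n-1] = {up,n} * v; returns the carry-out limb. */
limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* rp[0..n-1] += {up,n} * v; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + rp[i] + cy;  /* cannot overflow 128 bits */
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 64);
    }
    return cy;
}

/* Stand-in for a native mul_2: rp[0..un+1] = {up,un} * {vp,2}. */
void ref_mul_2(limb_t *rp, const limb_t *up, size_t un, const limb_t *vp)
{
    rp[un] = ref_mul_1(rp, up, un, vp[0]);
    rp[un + 1] = ref_addmul_1(rp + 1, up, un, vp[1]);
}

/* Stand-in for a native addmul_2: adds {up,un} * {vp,2} to the accumulated
   limbs rp[0..un-1] and writes the two new top limbs rp[un], rp[un+1]. */
void ref_addmul_2(limb_t *rp, const limb_t *up, size_t un, const limb_t *vp)
{
    rp[un] = ref_addmul_1(rp, up, un, vp[0]);
    rp[un + 1] = ref_addmul_1(rp + 1, up, un, vp[1]);
}

/* {rp,un+vn} = {up,un} * {vp,vn}, organised as in the README's three steps. */
void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                         const limb_t *vp, size_t vn)
{
    /* Step 1: choose the first row so that rp is 16-byte aligned afterwards. */
    if (((uintptr_t)rp & 15) == 0 && vn >= 2) {
        ref_mul_2(rp, up, un, vp);
        rp += 2; vp += 2; vn -= 2;
    } else {
        rp[un] = ref_mul_1(rp, up, un, vp[0]);
        rp += 1; vp += 1; vn -= 1;
    }

    /* Step 2: consume two multiplier limbs per outer iteration. */
    while (vn >= 2) {
        ref_addmul_2(rp, up, un, vp);
        rp += 2; vp += 2; vn -= 2;
    }

    /* Step 3: one multiplier limb may remain. */
    if (vn != 0)
        rp[un] = ref_addmul_1(rp, up, un, vp[0]);
}
```

With an unaligned rp and an even vn, the sketch runs mul_1 first and addmul_1 last, which is the case the README calls out as expected to be beneficial despite bn mod 2 == 0.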
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z13/com.asm
--- a/mpn/s390_64/z13/com.asm Sun Aug 13 00:52:13 2023 +0200
+++ b/mpn/s390_64/z13/com.asm Sun Aug 13 17:27:16 2023 +0200
@@ -51,8 +51,8 @@
tmll n, 1
je L(xx0)
-L(xx1): lg %r5, 0(ap)
- nngrk %r5, %r5, %r5
+L(xx1): lghi %r5, -1
+ xg %r5, 0(ap)
stg %r5, 0(rp)
la ap, 8(ap)
la rp, 8(rp)
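Editorial sketch (not part of the changeset): the com.asm hunk above swaps one way of forming a one's complement for another. The removed nngrk %r5, %r5, %r5 is a NAND of the loaded limb with itself; the replacement loads -1 with lghi and XORs the limb in from memory with xg. The commit message suggests that nngrk is not available on z13. The C below, using the hypothetical names not_via_nand and not_via_xor, models both sequences on a 64-bit limb to show that they yield the same result.

```c
#include <stdint.h>

/* Model of the removed sequence: lg %r5, 0(ap) ; nngrk %r5, %r5, %r5.
   NAND of a value with itself is its bitwise complement. */
uint64_t not_via_nand(uint64_t limb)
{
    return ~(limb & limb);
}

/* Model of the new sequence: lghi %r5, -1 ; xg %r5, 0(ap).
   XOR with an all-ones word flips every bit. */
uint64_t not_via_xor(uint64_t limb)
{
    uint64_t r5 = (uint64_t)-1;   /* lghi %r5, -1 */
    r5 ^= limb;                   /* xg   %r5, 0(ap) */
    return r5;
}
```

Both sequences take two instructions, so the replacement should cost nothing in instruction count, and it folds the memory load into the xg.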
diff -r e6a74b5299fe -r b7477feae73c mpn/s390_64/z15/README
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mpn/s390_64/z15/README Sun Aug 13 17:27:16 2023 +0200
@@ -0,0 +1,33 @@
+Copyright 2023 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/.
+
+
+
+The code in this directory makes use of z15 features, mainly vler/vster.
+Porting it to z13/z14 would require vler and vster to be replaced by a vl+vpdi
+pair and vpdi+vst, respectively.