[RFC] Add fat binary support for s390x
Marius Hillenbrand
mhillen at linux.ibm.com
Mon Sep 20 08:46:09 UTC 2021
Hello,
This is a (preliminary) implementation of "fat" binary support on s390x.
I'd appreciate your feedback on the approach.
When compiling GMP yourself, configure picks the best implementation for
your target system. In contrast, distributions build binaries to be
compatible with a broad range of systems. On s390x, many of them today
target z13 as the baseline (e.g., Ubuntu and RHEL). However, using two
instructions that are new on z15 could deliver a performance gain with a
tiny code change. Fat binary support could enable that gain at run time
while remaining compatible with older machines (even pre-z13 systems
without vector extensions). Of course,
it also makes adopting future optimizations much easier.
My code extends the existing framework for fat binary support with an
implementation for s390x. The function mpn_cpuvec_init detects the
supported CPU vector extensions via getauxval(AT_HWCAP) and sets the
cpuvec to point to the best implementation for the given CPU features.
There are three feature levels: (1) "base", scalar code in mpn/s390_64,
(2) z13, using vector extensions for add/carry chain (from my earlier
patch series), (3) z15, reusing code from z13 while using the new
vlerg/vsterg instead of separate vector load/store and reversing limbs.
On s390x, you can query CPU features using the STFL(E) instructions and
via getauxval(AT_HWCAP). For vector extensions, the HWCAP flags imply
that the OS supports context-switching the vector registers, in addition
to the bare HW support. Thus, my implementation uses AT_HWCAP.
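
A minimal standalone sketch of this selection logic, outside of GMP. All
demo_* names are hypothetical and not part of the patch. The fallback bit
values (2048 and 32768) are assumed to match the Linux definitions of the
s390 vector and vector-enhancements-2 HWCAP bits; they are only used when
the system headers do not provide the macros.

```c
/* Including a libc header first makes __GLIBC__ visible on glibc systems. */
#include <assert.h>

/* Same guard as in the patch: getauxval() needs glibc 2.16 or newer. */
#if defined(__GLIBC__)
# include <features.h>
# if __GLIBC_PREREQ(2, 16)
#  include <sys/auxv.h>
#  define DEMO_HAVE_GETAUXVAL 1
# endif
#endif

/* Assumed fallback values, used only if the headers lack the macros. */
#ifndef HWCAP_S390_VX
# define HWCAP_S390_VX 2048
#endif
#ifndef HWCAP_S390_VXRS_EXT2
# define HWCAP_S390_VXRS_EXT2 32768
#endif

/* Hypothetical names for the three feature levels. */
enum demo_level { DEMO_BASE, DEMO_Z13, DEMO_Z15 };

/* Map HWCAP bits to a feature level.  The z15 level is only chosen when
   the basic vector facility is reported as well. */
enum demo_level
demo_select_level (unsigned long hwcap)
{
  if (!(hwcap & HWCAP_S390_VX))
    return DEMO_BASE;
  if (hwcap & HWCAP_S390_VXRS_EXT2)
    return DEMO_Z15;
  return DEMO_Z13;
}

/* Read the HWCAP bits.  Returns 0 (base level) when getauxval() is
   unavailable or when not running on s390x, where the bits would have a
   different meaning. */
unsigned long
demo_query_hwcap (void)
{
#if defined(DEMO_HAVE_GETAUXVAL) && defined(__s390x__)
  return getauxval (AT_HWCAP);
#else
  return 0;
#endif
}
```

Calling demo_select_level (demo_query_hwcap ()) yields DEMO_BASE on any
host where the query returns 0, which mirrors the fallback to generic code
in the patch.
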
Each "fat" function has a common entry point that dispatches to a
specific implementation via struct cpuvec_t. Initially, the cpuvec
points to stubs that call the initializer function and then defer to the
actual implementation. The entry points and initialization stubs are
generic C code, created with macros (also in fat.c). The resulting
assembly is analogous to the x86 code.
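
The dispatch scheme can be modeled in a few lines of plain C. This is only
a sketch under simplifying assumptions: it uses toy 32-bit limbs, a single
function slot, and hypothetical demo_* names. The real code generates the
stubs with macros and installs CPU-specific implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit limbs so that the 64-bit product below cannot overflow. */
typedef uint32_t demo_limb_t;
typedef demo_limb_t (*demo_mul_1_fn) (demo_limb_t *rp, const demo_limb_t *up,
                                      size_t n, demo_limb_t v);

struct demo_cpuvec_t { demo_mul_1_fn mul_1; };

int demo_init_calls = 0;  /* counts how often the initializer ran */

demo_limb_t demo_mul_1_init (demo_limb_t *rp, const demo_limb_t *up,
                             size_t n, demo_limb_t v);

/* Initially, the table points at the init stub. */
struct demo_cpuvec_t demo_vec = { demo_mul_1_init };

/* Stand-in for a CPU-specific implementation:
   rp[] = up[] * v, returns the carry-out limb. */
demo_limb_t
demo_mul_1_generic (demo_limb_t *rp, const demo_limb_t *up, size_t n,
                    demo_limb_t v)
{
  uint64_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t p = (uint64_t) up[i] * v + carry;
      rp[i] = (demo_limb_t) p;
      carry = p >> 32;
    }
  return (demo_limb_t) carry;
}

/* Stand-in for the initializer: install the chosen implementation. */
void
demo_cpuvec_init (void)
{
  demo_init_calls++;
  demo_vec.mul_1 = demo_mul_1_generic;
  assert (demo_vec.mul_1 != demo_mul_1_init);
}

/* Init stub: run the initializer, then defer to the installed function. */
demo_limb_t
demo_mul_1_init (demo_limb_t *rp, const demo_limb_t *up, size_t n,
                 demo_limb_t v)
{
  demo_cpuvec_init ();
  return demo_vec.mul_1 (rp, up, n, v);
}

/* Common entry point: always an indirect call through the table. */
demo_limb_t
demo_mul_1 (demo_limb_t *rp, const demo_limb_t *up, size_t n, demo_limb_t v)
{
  return demo_vec.mul_1 (rp, up, n, v);
}
```

The first call to demo_mul_1 goes through the stub and triggers the
initializer; every later call reaches the installed implementation
directly, at the cost of one indirect call.
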
To compile the library with different architecture target levels,
enabling fat binary support sets -march=z196 as compatibility baseline
and uses C function attributes to set the -march target to z13 and z15
for the code in mpn/s390_64/z13/ and z15/, respectively.
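
A reduced sketch of the per-function target selection, assuming GCC on
s390x. DEMO_TARGET and the demo_sum_* functions are hypothetical; on any
other host or compiler the macro expands to nothing, so the file still
builds with the default target.

```c
#include <assert.h>
#include <stddef.h>

/* Apply a per-function target only where it is assumed to be supported. */
#if defined(__s390x__) && defined(__GNUC__) && !defined(__clang__)
# define DEMO_TARGET(str) __attribute__ ((target (str)))
#else
# define DEMO_TARGET(str) /* no-op on other hosts and compilers */
#endif

/* Compiled for z13 even if the rest of the file uses -march=z196. */
DEMO_TARGET ("arch=z13")
unsigned long
demo_sum_z13 (const unsigned long *p, size_t n)
{
  unsigned long sum = 0;
  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    sum += p[i];
  return sum;
}

/* Same source, compiled for z15 (arch13, as selected in the patch). */
DEMO_TARGET ("arch=arch13")
unsigned long
demo_sum_z15 (const unsigned long *p, size_t n)
{
  unsigned long sum = 0;
  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    sum += p[i];
  return sum;
}
```

Callers must only reach such functions after a run-time feature check.
The compiler may also decline to inline a helper into a caller that was
compiled for a different target, which is presumably why the patch tags
the static inline helpers in common-vec.h with the same attribute.
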
As an alternative, the build system could be modified so that the
sources from different "fat" directories are compiled with different
CFLAGS. However, that would touch a lot of the common build framework
for just a single architecture. It would probably be worthwhile if other
architectures could benefit as well. Another option would be lowering the
C source to assembly (selecting targets with .machine directives).
What do you think about the approach, in particular about how it compiles
for different target levels?
Marius
---
configure.ac | 15 ++++
gmp-impl.h | 10 +++
mpn/s390_64/fat/fat.c | 140 +++++++++++++++++++++++++++++++++
mpn/s390_64/z13/addmul_1.c | 7 +-
mpn/s390_64/z13/aormul_2.c | 8 +-
mpn/s390_64/z13/common-vec.h | 21 ++++-
mpn/s390_64/z13/mul_basecase.c | 6 +-
mpn/s390_64/z15/addmul_1.c | 35 +++++++++
mpn/s390_64/z15/mul_1.c | 35 +++++++++
mpn/s390_64/z15/mul_basecase.c | 35 +++++++++
10 files changed, 307 insertions(+), 5 deletions(-)
create mode 100644 mpn/s390_64/fat/fat.c
create mode 100644 mpn/s390_64/z15/addmul_1.c
create mode 100644 mpn/s390_64/z15/mul_1.c
create mode 100644 mpn/s390_64/z15/mul_basecase.c
diff --git a/configure.ac b/configure.ac
index be67915e2..4d5ba6343 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2354,6 +2354,21 @@ case $host in
BMOD_1_TO_MOD_1_THRESHOLD"
fi
;;
+
+ S390X_PATTERN)
+ if test $enable_fat = yes; then
+ abilist="64"
+ abi=64
+ path_64="s390_64 s390_64/fat"
+
+ extra_functions_64="$extra_functions_64 fat"
+ fat_path="s390_64 s390_64/z13 s390_64/z15"
+ fat_functions="addmul_1 mul_1 mul_basecase"
+ gcc_64_cflags_arch="-march=z196"
+
+ fi
+ ;;
esac
diff --git a/gmp-impl.h b/gmp-impl.h
index 66ffbd6cb..52f294fb7 100644
--- a/gmp-impl.h
+++ b/gmp-impl.h
@@ -4699,6 +4699,16 @@ __GMP_DECLSPEC extern struct cpuvec_t __gmpn_cpuvec;
__GMP_DECLSPEC extern int __gmpn_cpuvec_initialized;
#endif /* x86 fat binary */
+#if WANT_FAT_BINARY && HAVE_HOST_CPU_s390_zarch
+struct cpuvec_t {
+ DECL_addmul_1 ((*addmul_1));
+ DECL_mul_1 ((*mul_1));
+ DECL_mul_basecase ((*mul_basecase));
+};
+__GMP_DECLSPEC extern struct cpuvec_t __gmpn_cpuvec;
+__GMP_DECLSPEC extern int __gmpn_cpuvec_initialized;
+#endif /* s390x fat binary */
+
__GMP_DECLSPEC void __gmpn_cpuvec_init (void);
/* Get a threshold "field" from __gmpn_cpuvec, running __gmpn_cpuvec_init()
diff --git a/mpn/s390_64/fat/fat.c b/mpn/s390_64/fat/fat.c
new file mode 100644
index 000000000..ae5252bbe
--- /dev/null
+++ b/mpn/s390_64/fat/fat.c
@@ -0,0 +1,140 @@
+/* s390x fat binary initializers.
+
+ THE FUNCTIONS AND VARIABLES IN THIS FILE ARE FOR INTERNAL USE ONLY.
+ THEY'RE ALMOST CERTAIN TO BE SUBJECT TO INCOMPATIBLE CHANGES OR DISAPPEAR
+ COMPLETELY IN FUTURE GNU MP RELEASES.
+
+Copyright 2021 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/. */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "gmp-impl.h"
+
+/* Only use getauxval() with glibc 2.16 or newer */
+#ifdef __GLIBC__
+# include <features.h>
+# if __GLIBC_PREREQ(2, 16)
+# include <sys/auxv.h>
+# define HAVE_GETAUXVAL 1
+# endif
+#endif
+
+#define TRACE(X)
+
+/*
+ * TODO:
+ * - add thresholds for mul
+ */
+
+struct cpuvec_t __gmpn_cpuvec = {
+ __MPN(addmul_1_init),
+ __MPN(mul_1_init),
+ __MPN(mul_basecase_init)
+};
+
+int __gmpn_cpuvec_initialized = 0;
+
+void
+__gmpn_cpuvec_init (void)
+{
+ unsigned long hwcap;
+ struct cpuvec_t decided_cpuvec;
+
+ TRACE (printf ("__gmpn_cpuvec_init:\n"));
+
+ /* set up function pointers in decided_cpuvec to generic variants: */
+ CPUVEC_SETUP_s390_64;
+
+#ifdef HAVE_GETAUXVAL
+ hwcap = getauxval (AT_HWCAP);
+#else
+ TRACE (printf ("getauxval unavailable, falling back to generic code.\n"));
+ hwcap = 0;
+#endif
+
+ if (hwcap & HWCAP_S390_VX)
+ {
+ TRACE (printf ("enabling code for z13\n"));
+ CPUVEC_SETUP_z13;
+
+ if (hwcap & HWCAP_S390_VXRS_EXT2)
+ {
+ TRACE (printf ("enabling code for z15\n"));
+ CPUVEC_SETUP_z15;
+ }
+ }
+
+ ASSERT_CPUVEC (decided_cpuvec);
+ CPUVEC_INSTALL (decided_cpuvec);
+
+ /* indicate that threshold fields are ready */
+ *((volatile int *)&__gmpn_cpuvec_initialized) = 1;
+}
+
+/*
+ * Generate dispatch entry points and initialization functions for all "fat"
+ * functions.
+ */
+
+#define DEF_mul_basecase(NAME) \
+ void __MPN (NAME) (mp_ptr rp, mp_srcptr up, mp_size_t un, mp_srcptr vp, \
+ mp_size_t vn)
+#define PARAMLIST_mul_basecase rp, up, un, vp, vn
+
+#define DEF_addmul_1(NAME) \
+ mp_limb_t __MPN (NAME) (mp_ptr rp, mp_srcptr up, mp_size_t n, mp_limb_t v0)
+#define PARAMLIST_addmul_1 rp, up, n, v0
+
+#define DEF_mul_1(NAME) DEF_addmul_1 (NAME)
+#define PARAMLIST_mul_1 PARAMLIST_addmul_1
+
+#define CREATE_INIT_AND_ENTRY(NAME) \
+ DEF_##NAME (NAME##_init) \
+ { \
+ __gmpn_cpuvec_init (); \
+ return __gmpn_cpuvec.NAME (PARAMLIST_##NAME); \
+ } \
+ \
+ DEF_##NAME (NAME) { return __gmpn_cpuvec.NAME (PARAMLIST_##NAME); }
+
+#define CREATE_INIT_AND_ENTRY_NORET(NAME) \
+ DEF_##NAME (NAME##_init) \
+ { \
+ __gmpn_cpuvec_init (); \
+ __gmpn_cpuvec.NAME (PARAMLIST_##NAME); \
+ } \
+ \
+ DEF_##NAME (NAME) { __gmpn_cpuvec.NAME (PARAMLIST_##NAME); }
+
+CREATE_INIT_AND_ENTRY_NORET (mul_basecase)
+
+CREATE_INIT_AND_ENTRY (addmul_1)
+CREATE_INIT_AND_ENTRY (mul_1)
diff --git a/mpn/s390_64/z13/addmul_1.c b/mpn/s390_64/z13/addmul_1.c
index 022e5edcc..ef09078d0 100644
--- a/mpn/s390_64/z13/addmul_1.c
+++ b/mpn/s390_64/z13/addmul_1.c
@@ -29,6 +29,10 @@ You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/. */
+#ifndef MPN_S390_TARGET
+#define MPN_S390_TARGET "arch=z13"
+#endif
+
#include "gmp-impl.h"
#include "s390_64/z13/common-vec.h"
@@ -54,10 +58,11 @@ see https://www.gnu.org/licenses/. */
#ifdef DO_INLINE
static inline mp_limb_t
FUNCNAME (mp_ptr rp, mp_srcptr s1p, mp_size_t n, mp_limb_t s2limb)
- __attribute__ ((always_inline));
+ __attribute__ ((always_inline)) MPN_S390_FUNCTION_ATTRIBUTE_TARGET;
static inline
#endif
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
mp_limb_t
FUNCNAME (mp_ptr rp, mp_srcptr s1p, mp_size_t n, mp_limb_t s2limb)
{
diff --git a/mpn/s390_64/z13/aormul_2.c b/mpn/s390_64/z13/aormul_2.c
index 9a69fc38e..17de1f208 100644
--- a/mpn/s390_64/z13/aormul_2.c
+++ b/mpn/s390_64/z13/aormul_2.c
@@ -28,8 +28,11 @@ You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/. */
-#include "gmp-impl.h"
+#ifndef MPN_S390_TARGET
+#define MPN_S390_TARGET "arch=z13"
+#endif
+#include "gmp-impl.h"
#include "s390_64/z13/common-vec.h"
#undef FUNCNAME
@@ -57,10 +60,11 @@ see https://www.gnu.org/licenses/. */
#ifdef DO_INLINE
static inline mp_limb_t
FUNCNAME (mp_limb_t *rp, const mp_limb_t *up, mp_size_t n, const mp_limb_t *vp)
- __attribute__ ((always_inline));
+ __attribute__ ((always_inline)) MPN_S390_FUNCTION_ATTRIBUTE_TARGET;
static inline
#endif
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
mp_limb_t
FUNCNAME (mp_limb_t *rp, const mp_limb_t *up, mp_size_t n,
const mp_limb_t *vp)
diff --git a/mpn/s390_64/z13/common-vec.h b/mpn/s390_64/z13/common-vec.h
index a59e6eefe..1ddee98e2 100644
--- a/mpn/s390_64/z13/common-vec.h
+++ b/mpn/s390_64/z13/common-vec.h
@@ -34,6 +34,13 @@ see https://www.gnu.org/licenses/. */
#include <unistd.h>
#include <vecintrin.h>
+#if WANT_FAT_BINARY
+# define MPN_S390_FUNCTION_ATTRIBUTE_TARGET \
+ __attribute__ ((target (MPN_S390_TARGET)))
+#else
+# define MPN_S390_FUNCTION_ATTRIBUTE_TARGET
+#endif
+
/*
* Vector intrinsics use vector element types that kind-of make sense for the
* specific operation (e.g., vec_permi permutes doublewords). To use VRs
@@ -59,6 +66,7 @@ typedef union vec vec_t;
/*
* single-instruction combine of two GPRs into a VR
*/
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
static inline v2di
vec_load_2di_as_pair (unsigned long a, unsigned long b)
{
@@ -126,19 +134,24 @@ vec_load_2di_as_pair (unsigned long a, unsigned long b)
* Load a vector register from memory and swap the two 64-bit doubleword
* elements.
*/
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
static inline vec_t
vec_load_elements_reversed_idx (mp_limb_t const *base, ssize_t const index,
ssize_t const offset)
{
vec_t res;
+#ifdef USE_VLERG
+ res.dw = vec_reve(*(v2di *)((char *)base + index + offset));
+#else
char *ptr = (char *)base;
res.sw = *(v16qi *)(ptr + index + offset);
res.dw = vec_permi (res.dw, res.dw, 2);
-
+#endif
return res;
}
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
static inline vec_t
vec_load_elements_reversed (mp_limb_t const *base, ssize_t const offset)
{
@@ -149,16 +162,22 @@ vec_load_elements_reversed (mp_limb_t const *base, ssize_t const offset)
* Store a vector register to memory and swap the two 64-bit doubleword
* elements.
*/
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
static inline void
vec_store_elements_reversed_idx (mp_limb_t *base, ssize_t const index,
ssize_t const offset, vec_t vec)
{
+#ifdef USE_VLERG
+ *(v2di *)((char *)base + index + offset) = vec_reve(vec.dw);
+#else
char *ptr = (char *)base;
vec.dw = vec_permi (vec.dw, vec.dw, 2);
*(v16qi *)(ptr + index + offset) = vec.sw;
+#endif
}
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
static inline void
vec_store_elements_reversed (mp_limb_t *base, ssize_t const offset, vec_t vec)
{
diff --git a/mpn/s390_64/z13/mul_basecase.c b/mpn/s390_64/z13/mul_basecase.c
index f1b7160b3..1f7c6b8ce 100644
--- a/mpn/s390_64/z13/mul_basecase.c
+++ b/mpn/s390_64/z13/mul_basecase.c
@@ -32,7 +32,9 @@ You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library. If not,
see https://www.gnu.org/licenses/. */
-#include <stdlib.h>
+#ifndef MPN_S390_TARGET
+#define MPN_S390_TARGET "arch=z13"
+#endif
#include "gmp-impl.h"
@@ -48,6 +50,7 @@ see https://www.gnu.org/licenses/. */
*/
#define BRCTG
+
#include "s390_64/z13/common-vec.h"
#define OPERATION_mul_1
@@ -66,6 +69,7 @@ see https://www.gnu.org/licenses/. */
#include "s390_64/z13/aormul_2.c"
#undef OPERATION_addmul_2
+MPN_S390_FUNCTION_ATTRIBUTE_TARGET
void
mpn_mul_basecase (mp_ptr rp, mp_srcptr up, mp_size_t un, mp_srcptr vp,
mp_size_t vn)
diff --git a/mpn/s390_64/z15/addmul_1.c b/mpn/s390_64/z15/addmul_1.c
new file mode 100644
index 000000000..977e232b2
--- /dev/null
+++ b/mpn/s390_64/z15/addmul_1.c
@@ -0,0 +1,35 @@
+/* addmul_1 for IBM z15 or later
+
+Copyright 2021 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/. */
+
+/* select architecture level z15 */
+#define MPN_S390_TARGET "arch=arch13"
+#define USE_VLERG
+
+#include "s390_64/z13/addmul_1.c"
diff --git a/mpn/s390_64/z15/mul_1.c b/mpn/s390_64/z15/mul_1.c
new file mode 100644
index 000000000..977e232b2
--- /dev/null
+++ b/mpn/s390_64/z15/mul_1.c
@@ -0,0 +1,35 @@
+/* mul_1 for IBM z15 or later
+
+Copyright 2021 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/. */
+
+/* select architecture level z15 */
+#define MPN_S390_TARGET "arch=arch13"
+#define USE_VLERG
+
+#include "s390_64/z13/addmul_1.c"
diff --git a/mpn/s390_64/z15/mul_basecase.c b/mpn/s390_64/z15/mul_basecase.c
new file mode 100644
index 000000000..f7b70a72d
--- /dev/null
+++ b/mpn/s390_64/z15/mul_basecase.c
@@ -0,0 +1,35 @@
+/* mul_basecase for IBM z15 or later
+
+Copyright 2021 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of either:
+
+ * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+
+or
+
+ * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+or both in parallel, as here.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received copies of the GNU General Public License and the
+GNU Lesser General Public License along with the GNU MP Library. If not,
+see https://www.gnu.org/licenses/. */
+
+/* select architecture level z15 */
+#define MPN_S390_TARGET "arch=arch13"
+#define USE_VLERG
+
+#include "s390_64/z13/mul_basecase.c"
--
2.26.2