[PATCH] Add optimized addmul_1 and submul_1 for IBM z13
tg at gmplib.org
Wed Feb 17 13:26:55 UTC 2021
Thanks for contributing to GMP!

Marius Hillenbrand <mhillen at linux.ibm.com> writes:

> These patches add IBM z13 as a new s390_64 CPU level to mpn and add optimized
> versions of addmul_1 and submul_1 that exploit the SIMD extensions that were
> introduced with the IBM z13 generation. Both implementations share the same
> structure and use 128-bit add/subtract ops in vector registers with carry/borrow
> bits in registers.
>
> Tested with the regression test suite and stressed with tests/devel/anymul_1.c

Please use tests/devel/try too. It checks for access outside of defined
operands, which is a common bug in GMP asm.
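
For reference, a minimal portable C sketch of what addmul_1 and submul_1 compute, which could serve as a comparison oracle for an optimized version. The names ref_addmul_1, ref_submul_1 and limb_t are made up for this illustration (GMP's own entry points are mpn_addmul_1 and mpn_submul_1, operating on mp_limb_t). The code assumes 64-bit limbs and a compiler that provides unsigned __int128, such as GCC or Clang. It is not the z13 code from the patch and says nothing about its speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit limbs; limb_t is a stand-in for GMP's mp_limb_t. */
typedef uint64_t limb_t;

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^64-1)^2 + 2*(2^64-1) = 2^128-1, so this sum cannot overflow. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* rp[0..n-1] -= up[0..n-1] * v; returns the borrow-out limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* Product plus incoming borrow still fits in 128 bits. */
        unsigned __int128 p = (unsigned __int128)up[i] * v + borrow;
        limb_t lo = (limb_t)p;
        limb_t hi = (limb_t)(p >> 64);
        limb_t old = rp[i];
        rp[i] = old - lo;          /* wraps modulo 2^64 */
        borrow = hi + (old < lo);  /* hi == 2^64-1 implies lo == 0, so no overflow */
    }
    return borrow;
}
```

Comparing results against such a reference only checks values. It does not detect reads or writes outside the operands, which is the reason for running tests/devel/try as well.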

> I got started with addmul_1 since it felt challenging and instructive enough,
> yet will look into the other ops, as well.

We are careful about not adding asm code which does not give real,
adequate speedup. It has happened in the past that seemingly useful new
instructions do not easily lead to improved performance.

A 2nd goal is to try to approach the theoretical maximum performance,
e.g., to saturate the multiply unit or some other unavoidable part of a

Therefore, I have two questions:

1. What is the measured speed difference of the existing code and the

2. Is the measured performance close to what you would hope for, given
   some hard pipeline limits? If not, would you be willing to try to

As a rough measure, if the code is within 20% of theoretical maximum for
the target CPU pipeline, we're happy. If not, more unrolling, better
scheduling, a different instruction choice, might be tried. Code
complexity is also an issue, for sure. But addmul_1 is extremely
important for GMP's performance (in particular in the absence of special
mul_basecase and sqr_basecase) so complexity there is particularly
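
The 20% rule of thumb above can be written down as a small check. This sketch assumes speed is expressed in cycles per limb (lower is better) and reads "within 20%" as "at most 1.2 times the pipeline bound"; that reading, the function name, and the figures in the note below are assumptions for illustration, not measurements.

```c
#include <stdbool.h>

/* measured_cpl: measured cycles per limb of the loop under test.
   bound_cpl:    best cycles per limb the pipeline could sustain, e.g. as
                 limited by multiplier throughput (an assumed input).
   slack:        allowed fraction above the bound, 0.20 for "within 20%". */
static bool within_target(double measured_cpl, double bound_cpl, double slack)
{
    return measured_cpl <= bound_cpl * (1.0 + slack);
}
```

With a hypothetical bound of 2.0 cycles per limb, a measured 2.3 would pass and 2.5 would not.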

Your contribution is significant enough to need copyright paperwork from
you and IBM. IBM is FOSS-friendly, so I don't expect any problems, but
it might take a while.

Please encrypt, key id 0xC8601622