[PATCH v2] Add addmul_1, addmul_2, and mul_basecase for IBM z13 and later

Marius Hillenbrand mhillen at linux.ibm.com
Tue Nov 9 15:38:22 CET 2021


I would like to "ping" my patch series for s390x.

Any thoughts or comments?

Would you prefer keeping addmul_1 and addmul_2 as assembly?


On 8/5/21 09:03, Marius Hillenbrand wrote:
> Hi,
> Changes from v1:
>   - add tuneup results from z13
>   - fix mul_basecase to use #included and inlined addmul_2
> Based on your feedback on my previous patches, I rewrote addmul_1/mul_1
> and added implementations for addmul_2/mul_2 and mul_basecase. They are
> still based on multiplying 64x64->128 in gpr pairs and accumulating
> 128-bit-wise in vector registers.
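
For readers skimming the thread, the 64x64->128 multiply-and-accumulate
step can be sketched in portable C. This is only an illustration, not the
code from the patch: it assumes the GCC/Clang unsigned __int128 extension
in place of gpr pairs and vector registers, and the names limb_t, dlimb_t
and ref_addmul_1 are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* hypothetical stand-in for mp_limb_t */
typedef unsigned __int128 dlimb_t;  /* GCC/Clang extension, assumed available */

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n > 0);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 64x64->128 product plus two 64-bit addends cannot overflow:
           (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
        dlimb_t acc = (dlimb_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)acc;            /* low 64 bits stay in this limb */
        carry = (limb_t)(acc >> 64);    /* high 64 bits move to the next limb */
    }
    return carry;
}
```

The bound in the comment is what allows each step to be one 128-bit
accumulation with no separate carry flag.
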
> The code passes "make check", of course, and I have run "try" for ~72
> hours for each of the functions (on top of countless iterations of the
> relevant individual test cases in tests/devel).
> GMPbench.base.multiply improves by about 50% on z15, and the overall
> GMPbench score improves by ~35%. The patches do not yet include new
> tuneup parameters.
> All the implementations are in C with enough inline assembly to result
> in decent code. mul_basecase #includes and inlines the (add)mul
> functions to avoid calls and unnecessary branches.
> All the (add)mul_1/2 functions are 4x unrolled for the first operand
> (i.e., 4 mults per iteration in addmul_1, 8 mults in addmul_2).
> mul_basecase is structured so that it branches on (un % 4) only once,
> on entry, to select the correct loop prologue, so the addmul bodies do
> not need that branch.
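
A rough sketch of that structure in portable C, under stated assumptions:
unsigned __int128 replaces the inline assembly, every row uses an
addmul_1-style step (the result is zeroed first, which the real code
presumably avoids), and all names (MAC_STEP, row_addmul_1, all_rows,
sketch_mul_basecase) are hypothetical. Whether the per-row switch really
disappears depends on the compiler inlining the helper with a constant
residue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;  /* GCC/Clang extension, assumed available */

/* One multiply-accumulate step; uses rp, up, v, carry from the caller. */
#define MAC_STEP(i) do {                                        \
        dlimb_t acc_ = (dlimb_t)up[(i)] * v + rp[(i)] + carry;  \
        rp[(i)] = (limb_t)acc_;                                 \
        carry = (limb_t)(acc_ >> 64);                           \
    } while (0)

/* rp[0..un-1] += up[0..un-1] * v, returning the carry limb.
   head must equal un % 4: that many limbs go through the prologue,
   the rest through the 4x unrolled loop. */
static inline limb_t row_addmul_1(limb_t *rp, const limb_t *up, size_t un,
                                  limb_t v, unsigned head)
{
    limb_t carry = 0;
    size_t i = 0;
    switch (head) {             /* constant once inlined into all_rows */
    case 3: MAC_STEP(i); i++;   /* fall through */
    case 2: MAC_STEP(i); i++;   /* fall through */
    case 1: MAC_STEP(i); i++;   /* fall through */
    default: break;
    }
    for (; i < un; i += 4) {    /* un - head is a multiple of 4 */
        MAC_STEP(i);
        MAC_STEP(i + 1);
        MAC_STEP(i + 2);
        MAC_STEP(i + 3);
    }
    return carry;
}

static inline void all_rows(limb_t *rp, const limb_t *up, size_t un,
                            const limb_t *vp, size_t vn, unsigned head)
{
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = row_addmul_1(rp + j, up, un, vp[j], head);
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1] */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    switch (un % 4) {           /* the only branch on the residue */
    case 0: all_rows(rp, up, un, vp, vn, 0); break;
    case 1: all_rows(rp, up, un, vp, vn, 1); break;
    case 2: all_rows(rp, up, un, vp, vn, 2); break;
    default: all_rows(rp, up, un, vp, vn, 3); break;
    }
}
```
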
> The accumulation structure in addmul_2 may be a little unexpected. The
> idea is to prefer 128-bit adds without carry to full adds with carry-in
> and carry-out wherever possible, because the latter require two
> instructions for each sum and are subject to instruction grouping
> limitations. The resulting code performs better than strictly using adds
> with carry-in/out for the moderate number of limbs that are relevant for
> mul_basecase.
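
To illustrate the carry-free idea (this is not the accumulation scheme
from the patch, only a portable sketch of why such adds are possible):
each result column is a sum of a handful of 64-bit halves, and a few
64-bit values added in a 128-bit accumulator can never overflow, so no
carry-out has to be tracked. The interface below (add into n limbs, store
limb n, return limb n+1) and the name sketch_addmul_2 are assumptions for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;  /* GCC/Clang extension, assumed available */

/* Adds up[0..n-1] * {vp[0], vp[1]} to rp[0..n-1], stores the next limb
   in rp[n], and returns the most significant limb. */
static limb_t sketch_addmul_2(limb_t *rp, const limb_t *up, size_t n,
                              const limb_t *vp)
{
    assert(n > 0);
    const limb_t v0 = vp[0], v1 = vp[1];
    limb_t hi0 = 0;       /* high half of up[i-1] * v0 */
    limb_t lo1 = 0;       /* low half of up[i-1] * v1 */
    limb_t hi1 = 0;       /* high half of up[i-2] * v1 */
    limb_t hi1_next = 0;  /* high half of up[i-1] * v1 */
    limb_t carry = 0;     /* spill from the previous column, at most 5 */

    for (size_t i = 0; i < n; i++) {
        dlimb_t p0 = (dlimb_t)up[i] * v0;
        dlimb_t p1 = (dlimb_t)up[i] * v1;
        /* Six 64-bit values summed in 128 bits: no carry-out possible. */
        dlimb_t col = (dlimb_t)rp[i] + (limb_t)p0 + hi0 + lo1 + hi1 + carry;
        rp[i] = (limb_t)col;
        carry = (limb_t)(col >> 64);
        hi0 = (limb_t)(p0 >> 64);
        lo1 = (limb_t)p1;
        hi1 = hi1_next;
        hi1_next = (limb_t)(p1 >> 64);
    }
    /* Column n collects the leftover halves. */
    dlimb_t col = (dlimb_t)hi0 + lo1 + hi1 + carry;
    rp[n] = (limb_t)col;
    /* Column n+1; the sum fits in one limb because the full result
       fits in n+2 limbs. */
    return hi1_next + (limb_t)(col >> 64);
}
```
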
> Regards,
> Marius
> _______________________________________________
> gmp-devel mailing list
> gmp-devel at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-devel

Marius Hillenbrand
Linux on Z development
IBM Deutschland Research & Development GmbH
Vors. des Aufsichtsrats: Gregor Pillen / Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht
Stuttgart, HRB 243294
