[PATCH] Add optimized addmul_1 and submul_1 for IBM z13
Torbjörn Granlund
tg at gmplib.org
Mon Mar 1 23:30:38 UTC 2021
Torbjörn Granlund <tg at gmplib.org> writes:
> I played a bit with an addmul_1 of my own, with some ideas from your
> code. I don't plan to do more work on this. Does this perform well on
> hardware?
I now realise that the instruction sequence of my example is essentially
the same as in your code, except with more unrolling. That's a bit
surprising, as I wrote it by understanding the instruction set, starting
at the vac* insn which your code made me look at. I did not actually
look at your code while writing my code.
--
Torbjörn
Please encrypt, key id 0xC8601622