[PATCH] Improve System z support and add some tuning
tg at gmplib.org
Tue Oct 4 00:53:32 CEST 2011
Awaiting paperwork, I committed basic s390x support to the mainline
repository. I could do this work after I got an emulator up and running
on a system here.
2011-10-03 Torbjorn Granlund <tege at gmplib.org>
* configure.in: Support s390x.
* longlong.h: Add support for 64-bit s390x.
* mpn/s390_64/add_n.asm: New file.
* mpn/s390_64/sub_n.asm: New file.
* mpn/s390_64/mul_1.asm: New file.
* mpn/s390_64/addmul_1.asm: New file.
* mpn/s390_64/bdiv_dbm1c.asm: New file.
* mpn/s390_64/gmp-mparam.h: New file, taken from x86_64.
* mpn/s390_32: Directory renamed from mpn/s390.
* mpn/s390_32/gmp-mparam.h: New file, taken from x86_64.
* mpn/s390_32/esame/add_n.asm: New file.
* mpn/s390_32/esame/sub_n.asm: New file.
* mpn/s390_32/esame/mul_1.asm: New file.
* mpn/s390_32/esame/addmul_1.asm: New file.
* mpn/s390_32/esame/bdiv_dbm1c.asm: New file.
The added assembly code is rudimentary. It is pointless to write
something more sophisticated without real hardware. It should
nevertheless give a substantial speedup.
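For readers who do not know the mpn layer, here is a rough C sketch of
what an addmul_1 loop computes with 64-bit limbs. It is not the
committed assembly. The name ref_addmul_1 is made up for illustration,
and the code assumes a compiler with unsigned __int128 (GCC or Clang).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not part of GMP:
   {rp,n} += {up,n} * v, returning the carry-out limb. */
static uint64_t ref_addmul_1(uint64_t *rp, const uint64_t *up,
                             size_t n, uint64_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64->128 product plus two 64-bit addends cannot
           overflow 128 bits: (B-1)^2 + 2(B-1) = B^2 - 1. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (uint64_t)t;           /* low limb goes back to rp */
        carry = (uint64_t)(t >> 64);   /* high limb feeds the next step */
    }
    return carry;
}
```

An assembly version does the same work per limb with one widening
multiply and an add-with-carry chain.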
We should enhance this in several ways:
(1) Commit your CPU configuration code, including regenerated
(2) Write some more critical assembly routines, at least submul_1 and
invert_limb. (I assume the latter will beat division, and that in
fact division instructions should never be used, just like on
x86_64. But I don't know the ratio time(dlgr)/time(mlgr), which
basically is what determines this.)
(3) Improve the 32-bit inline assembly for processors that support
MLR/ALR/ALCR etc. I notice that you made an effort along these
lines, but I am not sure it was done right. At least, the gcc on my
Debian system does not define the predefined macro your code relies on.
(4) Should we perhaps use 64-bit limbs for the 31-bit ABI, when using a
64-bit processor? As far as I understand, this should work, and it
would run much faster. (This would be akin to the N32 MIPS ABI and
the HPPA 2.0N ABI.)
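To make item (2) concrete, here is a C sketch of dividing a two-limb
number by a normalized one-limb divisor using a precomputed inverse,
so that the inner step needs multiplications only. It is a sketch of
the general technique, not GMP's code. The names ref_invert_limb and
ref_div_preinv are made up, unsigned __int128 is assumed to be
available, d must have its top bit set, and u1 < d is required.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical helper: v = floor((B^2 - 1) / d) - B with B = 2^64.
   For normalized d the quotient lies in [B, 2B), so truncating to
   64 bits subtracts B. This one-time setup still uses a division. */
static uint64_t ref_invert_limb(uint64_t d)
{
    return (uint64_t)(~(u128)0 / d);
}

/* Hypothetical helper: divide (u1,u0) by d using the inverse v.
   Returns the quotient limb and stores the remainder in *rem. */
static uint64_t ref_div_preinv(uint64_t *rem, uint64_t u1, uint64_t u0,
                               uint64_t d, uint64_t v)
{
    /* Quotient estimate from one widening multiply. */
    u128 q = (u128)v * u1 + (((u128)u1 << 64) | u0);
    uint64_t q1 = (uint64_t)(q >> 64) + 1;
    uint64_t q0 = (uint64_t)q;

    /* Candidate remainder, computed modulo B. */
    uint64_t r = u0 - q1 * d;

    /* At most two adjustments are needed. */
    if (r > q0) { q1--; r += d; }
    if (r >= d) { q1++; r -= d; }

    *rem = r;
    return q1;
}
```

Whether this beats the hardware divide depends on the
time(dlgr)/time(mlgr) ratio mentioned in item (2).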
(I noticed that the gcc compiler port leaves a lot to be desired; it
generates poor code in ways that hurt GMP. Specifically, it generates
division instructions for X/C for constants C. By adding a umulditi3
pattern to gcc/config/s390.md, this would be fixed.)
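As an illustration of the X/C point, a compiler with a widening
unsigned multiply available can turn division by a constant into a
multiply and a shift. The sketch below does this by hand for C = 10.
The name div10 is made up, and unsigned __int128 is assumed to be
available.

```c
#include <stdint.h>

/* Hypothetical example: x / 10 without a division instruction.
   0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10); taking the 128-bit product
   and shifting right by 67 gives floor(x / 10) for every 64-bit x. */
static uint64_t div10(uint64_t x)
{
    unsigned __int128 p = (unsigned __int128)x * UINT64_C(0xCCCCCCCCCCCCCCCD);
    return (uint64_t)(p >> 67);
}
```

Without a 64x64->128 multiply pattern, the compiler has no way to
express the high half of that product and falls back to a divide.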