[PATCH] Improve System z support and add some tuning

Torbjorn Granlund tg at gmplib.org
Tue Oct 4 00:53:32 CEST 2011

While awaiting paperwork, I committed basic s390x support to the mainline
repository.  I was able to do this work once I got an emulator up and
running on a system here.

Change logs:

2011-10-03  Torbjorn Granlund  <tege at gmplib.org>

	* configure.in: Support s390x.

	* longlong.h: Add support for 64-bit s390x.

	* mpn/s390_64/add_n.asm: New file.
	* mpn/s390_64/sub_n.asm: New file.
	* mpn/s390_64/mul_1.asm: New file.
	* mpn/s390_64/addmul_1.asm: New file.
	* mpn/s390_64/bdiv_dbm1c.asm: New file.
	* mpn/s390_64/gmp-mparam.h: New file, taken from x86_64.

	* mpn/s390_32: Directory renamed from mpn/s390.
	* mpn/s390_32/gmp-mparam.h: New file, taken from x86_64.
	* mpn/s390_32/esame/add_n.asm: New file.
	* mpn/s390_32/esame/sub_n.asm: New file.
	* mpn/s390_32/esame/mul_1.asm: New file.
	* mpn/s390_32/esame/addmul_1.asm: New file.
	* mpn/s390_32/esame/bdiv_dbm1c.asm: New file.

The added assembly code is rudimentary.  It is pointless to write
something more sophisticated without real hardware.  It should
nevertheless give a substantial speedup.

We should enhance this in several ways:

(1) Commit your CPU configuration code, including regenerated
    gmp-mparam.h files.

(2) Write some more critical assembly routines, at least submul_1 and
    invert_limb.  (I assume the latter will beat division, that in fact
    division instructions should never ever be used, just like on
    x86_64.  But I don't know the quotient time(dlgr)/time(mlgr), which
    basically is what determines this.)

(3) Improve inline assembly 32-bit support for processors with support
    for MLR/ALR/ALCR etc.  I notice that you made an effort along these
    lines, but I am not sure it was done right.  At least, the gcc on my
    Debian system does not define the predef your code relies on.

(4) Should we perhaps use 64-bit limbs for the 31-bit ABI, when using a
    64-bit processor?  As far as I understand, this should work, and it
    would run much faster.  (This would be akin to the N32 MIPS ABI and
    the HPPA 2.0N ABI.)
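
For item (2), the idea of replacing a divide instruction with a
multiplication by a precomputed inverse can be sketched in C roughly as
follows.  This is only an illustration, assuming a compiler that provides
unsigned __int128 (GCC or Clang).  The names sketch_invert_limb and
sketch_div_2by1 are hypothetical; this is not the GMP code.  The inverse
is computed here with a plain 128-bit division for clarity, which a real
invert_limb routine would of course avoid.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* assumption: GCC/Clang extension */

/* Reciprocal of a normalized divisor d (most significant bit set):
   v = floor((B^2 - 1) / d) - B, with B = 2^64.
   Computed with a 128-bit division purely for illustration. */
static uint64_t sketch_invert_limb(uint64_t d)
{
    u128 num = ((u128)(~d) << 64) | UINT64_MAX;   /* B^2 - 1 - d*B */
    return (uint64_t)(num / d);
}

/* Divide the two-limb value nh*B + nl by d using the precomputed
   inverse v.  Requires nh < d and d normalized.  Returns the quotient
   and stores the remainder in *r.  No divide instruction is needed. */
static uint64_t sketch_div_2by1(uint64_t nh, uint64_t nl,
                                uint64_t d, uint64_t v, uint64_t *r)
{
    /* Candidate quotient: high limb of nh*v + (nh+1)*B + nl (mod B^2). */
    u128 t = (u128)nh * v + (((u128)(nh + 1) << 64) | nl);
    uint64_t qh = (uint64_t)(t >> 64);
    uint64_t ql = (uint64_t)t;
    uint64_t rem = nl - qh * d;   /* candidate remainder, mod B */

    if (rem > ql) { qh--; rem += d; }    /* first adjustment */
    if (rem >= d) { qh++; rem -= d; }    /* second, rare adjustment */
    *r = rem;
    return qh;
}
```

The per-division cost is then one widening multiply, one low multiply
and a few additions, so whether this wins depends on the
time(dlgr)/time(mlgr) quotient mentioned in (2).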

(I noticed that the gcc compiler port leaves a lot to be desired; it
generates poor code in ways that hurt GMP.  Specifically, it generates
division instructions for X/C for constants C.  Adding a umulditi3
pattern to gcc/config/s390.md would fix this.)
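
The X/C transformation mentioned above can be illustrated in C.  This is
a minimal sketch, assuming unsigned __int128 support and taking C = 10 as
an arbitrary example; the function name div10_by_mul is hypothetical.
The constant 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10), so taking the high 64
bits of the product and shifting right by 3 yields x / 10 for every
64-bit x.

```c
#include <stdint.h>

/* Unsigned division by the constant 10 using only a widening multiply
   and a shift, which is what a compiler can emit once it has a
   64x64->128 unsigned multiply pattern available. */
static uint64_t div10_by_mul(uint64_t x)
{
    const uint64_t magic = 0xCCCCCCCCCCCCCCCDULL;      /* ceil(2^67 / 10) */
    unsigned __int128 prod = (unsigned __int128)x * magic;
    return (uint64_t)(prod >> 64) >> 3;                /* total shift: 67 */
}
```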

