[PATCH 2/4] config.guess, configure.ac: Add detection of IBM z13

Tue Mar 9 22:06:25 UTC 2021

Marius Hillenbrand <mhillen at linux.ibm.com> writes:

  One minor proposal (patch to follow): Some versions of GCC only accept
  the -march=arch<nr> variant of the most recent CPU level they support
  (e.g., GCC-9 accepts -march=arch13 but not -march=z15; 13 as in the 13th
  edition of the Principles of Operation, which describes the ISA
  extensions; both parameters are equivalent in later versions of GCC).

Ah, OK.  Will apply!

  > The main change I made to your suggested change is that I added z14 and
  > z15 to the recognised cpu types.  I also made z13 a fallback for z14 (if
  > the latter is not understood by tools), and analogously made z14 and z13
  > fallbacks for z15.

  That absolutely makes sense. When I wrote my patches initially, it was
  not yet clear that it is worthwhile to differentiate.

It is not clear, but it does not hurt to make config.guess be accurate,
and then treat the CPUs the same way.  In my experience, people can get
confused when GMP claims they have CPU foo-k when they actually have
foo-(k+1).
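
As an illustration of the fallback arrangement discussed above (z15 falls
back to z14 and then z13, z14 falls back to z13), here is a minimal sketch.
It is not the logic in configure.ac.  The names FALLBACKS, ARCH_LEVEL,
march_candidates and pick_march are hypothetical, and trying the
-march=arch<nr> spelling before the machine name is my assumption, made
because some GCC versions only accept the former for their newest level.

```python
# Hypothetical sketch only; not taken from GMP's configure.ac.

# Detected CPU -> CPU names to try, most specific first.
FALLBACKS = {
    "z13": ["z13"],
    "z14": ["z14", "z13"],
    "z15": ["z15", "z14", "z13"],
}

# GCC's architecture-level spelling for each machine name.
ARCH_LEVEL = {"z13": "arch11", "z14": "arch12", "z15": "arch13"}


def march_candidates(cpu):
    """Return the -march flags to try for `cpu`, in order of preference."""
    flags = []
    for name in FALLBACKS[cpu]:
        # Assumption: try the arch<nr> spelling first, then the machine name.
        flags.append("-march=" + ARCH_LEVEL[name])
        flags.append("-march=" + name)
    return flags


def pick_march(cpu, accepts):
    """Return the first flag that `accepts(flag)` approves, or None."""
    for flag in march_candidates(cpu):
        if accepts(flag):
            return flag
    return None
```

With a compiler modelled as accepting arch13 but not z15 (the GCC-9 case
mentioned above), pick_march("z15", ...) settles on -march=arch13.  With
one that knows nothing newer than z13, it ends up at a z13 flag.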

I tried vlerg on the system here, and it works fine.  The timing
difference is very small, but then again I didn't try very hard.

I am not aware of any timing differences between z13, z14, z15 for the
L1 cache-hit cases.  Are there any?  And the only GMP-relevant ISA
difference of which I am aware is the presence of vlerg in z15.

How's it going with the various addmul_k variants?  My completely
non-scheduled addmul_2 seems to run 37% slower than the mlgr throughput.
That's not bad.  Some fiddling around with the schedule got me to just
25% slower.  That was with 2x unrolling.  I haven't tried anything
sophisticated.
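
To spell out how I read these percentages: addmul_k performs k 64x64-bit
multiplies per source limb, so a cycles-per-limb figure is divided by k
before it is compared with the rate at which mlgr can be issued.  A
minimal sketch of that arithmetic follows.  percent_slower is a
hypothetical helper, and the numbers in the usage note are placeholders,
not measurements from any machine.

```python
def percent_slower(cycles_per_limb, k, mlgr_cycles_per_mul):
    """How much slower than the multiplier bound an addmul_k loop runs.

    cycles_per_limb      measured cycles per source limb of the loop
    k                    multiplies per source limb (2 for addmul_2)
    mlgr_cycles_per_mul  cycles per mlgr when the multiplier is saturated
    """
    cycles_per_mul = cycles_per_limb / k
    return (cycles_per_mul / mlgr_cycles_per_mul - 1.0) * 100.0
```

For example, with a placeholder bound of 1.0 cycle per multiply, an
addmul_2 loop at 2.74 cycles/limb comes out 37% slower and one at 2.5
cycles/limb comes out 25% slower.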

How far is your best addmul_1 from mlgr's throughput?

I believe it to be possible to get pretty close to mlgr's throughput, if
not by any other means, then by going to addmul_k for k > 2.  I think 8-way
addmul_1 makes little sense, but I think 2-way or 4-way addmul_2, or
2-way addmul_3 or 2-way addmul_4 do make sense if they run close to
mlgr's throughput.
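
For anyone following along who has not looked at these loops, here is a
small reference model of what addmul_1 and addmul_2 compute, using 64-bit
limbs.  It is a sketch for checking results, not GMP code and not a model
of the assembly.  The names ref_addmul_1 and ref_addmul_2 are
hypothetical, and the convention that addmul_2 stores one extra limb at
rp[n] and returns the top limb is how I understand the internal
interface, so treat it as an assumption.

```python
B = 1 << 64      # limb base
MASK = B - 1


def ref_addmul_1(rp, up, v):
    """rp[0:n] += up[0:n] * v in place (n = len(up)); return the carry limb."""
    carry = 0
    for i, u in enumerate(up):
        t = u * v + rp[i] + carry        # always fits in two limbs
        rp[i] = t & MASK
        carry = t >> 64
    return carry


def ref_addmul_2(rp, up, v0, v1):
    """rp[0:n] += up[0:n] * (v1*B + v0) in one pass.

    Stores the next limb in rp[n] and returns the most significant limb.
    """
    n = len(up)
    c0 = c1 = 0                          # two-limb running carry
    for i in range(n):
        p0 = up[i] * v0                  # two multiplies per source limb
        p1 = up[i] * v1
        s = rp[i] + c0 + (p0 & MASK)     # rp[i] is read and written once
        rp[i] = s & MASK
        t = (s >> 64) + c1 + (p0 >> 64) + (p1 & MASK)
        c0 = t & MASK
        c1 = (p1 >> 64) + (t >> 64)      # cannot overflow a limb
    rp[n] = c0
    return c1
```

The point of the fused loop is that each rp limb is loaded and stored
once per two multiplies instead of once per multiply, which is where the
larger k helps.  Running ref_addmul_1 twice (the second time shifted by
one limb) should give the same limbs as one call to ref_addmul_2.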

-- 
Torbjörn
Please encrypt, key id 0xC8601622