Is asm code selection optimal?

Tue May 16 13:53:29 UTC 2017

For many CPUs, GMP selects a set of assembly files, which are then
used for important GMP inner loops.  Often, an implementation of a
function runs well on several CPUs, and we then avoid code duplication
by letting newer CPUs inherit code from (typically) older CPUs.

How close to optimal choices are we making now?  That's a question I
want answered.

Therefore the "tuneup" GMP test reporting category now contains tables
comparing all working assembly code for each CPU.

The "default" column has the timing of the currently configured code
for each function.  Each other column shows the timing of the assembly
file named in that column's header.

Sample result file:
  https://gmplib.org/devel/tm/gmp/tuneup/success/brazen.gmplib.org-stat:64.txt

See for example mpn_addlsh1_n for brazen (an AMD Zen).  The default code
starts at around 6.8, 4.4, 3.4 and ends at 1.64 cycles/limb.  But the
code written for Intel Atom runs at 8.8, 4.9, 3.2 to end at 1.4
cycles/limb.  The Atom code is faster except for 1-limb and 2-limb
operands, and should therefore be used.
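
The comparison above can be sketched in a few lines of Python.  This
is only an illustration, not part of the GMP tuneup tooling.  It
assumes that the first three quoted figures are for 1, 2 and 3 limbs,
and it uses the label "large" for the final figure, whose operand size
is not stated here.  The names ROWS and faster_sizes are made up for
this example.

```python
# Cycles/limb for mpn_addlsh1_n on brazen, as quoted in the text.
# Assumption: the first three rows are 1, 2 and 3 limbs; "large" is a
# placeholder label for the last row of the table.
ROWS = [
    # (operand size, default code, Intel Atom code)
    ("1", 6.8, 8.8),
    ("2", 4.4, 4.9),
    ("3", 3.4, 3.2),
    ("large", 1.64, 1.4),
]


def faster_sizes(rows):
    """Return the sizes at which the candidate beats the default."""
    return [size for size, default, candidate in rows
            if candidate < default]


def slower_sizes(rows):
    """Return the sizes at which the candidate loses to the default."""
    return [size for size, default, candidate in rows
            if candidate > default]


if __name__ == "__main__":
    print("Atom code faster at:", faster_sizes(ROWS))
    print("Atom code slower at:", slower_sizes(ROWS))
```

With these figures the candidate loses only at the two smallest
sizes, which is the basis for preferring it.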

-- 
Torbjörn
Please encrypt, key id 0xC8601622