Is asm code selection optimal?
tg at gmplib.org
Tue May 16 13:53:29 UTC 2017
For a lot of CPUs, GMP selects a set of assembly files, which are then
used for important GMP inner loops. Often, the implementation of a
function runs well on several CPUs, and then we avoid code duplication
by letting newer CPUs inherit code from (typically) older CPUs.
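
One way to picture this inheritance is an ordered search over per-CPU
directories, where the first directory that provides a function wins. The
sketch below is only an illustration under that assumption: the directory
names ("newcpu", "oldcpu", "generic"), the file sets, and the helper
select_files are all hypothetical and do not reflect GMP's actual configure
logic or source tree.

```python
def select_files(functions, search_path, available):
    """Pick, for each function, the first directory in search_path that has it.

    functions   -- function names to resolve (hypothetical)
    search_path -- directories ordered from most to least CPU-specific
    available   -- mapping of directory -> set of functions it implements
    """
    chosen = {}
    for func in functions:
        for directory in search_path:
            if func in available.get(directory, set()):
                chosen[func] = directory
                break  # first match wins; later directories are fallbacks
    return chosen


# Hypothetical layout: the newer CPU has its own mul_1 and inherits the rest.
available = {
    "newcpu": {"mul_1"},
    "oldcpu": {"mul_1", "addlsh1_n"},
    "generic": {"mul_1", "addlsh1_n", "add_n"},
}
search_path = ["newcpu", "oldcpu", "generic"]

selection = select_files(["mul_1", "addlsh1_n", "add_n"], search_path, available)
print(selection)
```

With a scheme like this, the inherited choice is only as good as the search
order, which is why measuring every working candidate per CPU is useful.
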
How close to optimal choices are we making now? That's a question I
Therefore the "tuneup" GMP test reporting category now contains tables
comparing all working assembly code for each CPU.
The "default" column has timing for the currently configured code for
each function. The other columns show timing for the file in the
Sample result file:
See for example mpn_addlsh1_n for brazen (an AMD Zen). The default code
starts at around 6.8, 4.4, 3.4 and ends at 1.64 cycles/limb. But the
code written for Intel Atom runs at 8.8, 4.9, 3.2 to end at 1.4
cycles/limb. This code is faster except for 1-limb and 2-limb operands, and
should therefore be used.
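
The comparison above can be sketched in a few lines. The timings are the
ones quoted in this message; treating the first three values as 1, 2 and 3
limbs is an assumption based on the "1 and 2 limb" remark, and "large" is
just a placeholder for the last table row, whose operand size is not stated
here. The helper names are made up for illustration.

```python
# Cycles/limb for mpn_addlsh1_n on brazen (AMD Zen), as quoted in the post.
# Keys 1, 2, 3 are assumed operand sizes in limbs; "large" is a placeholder
# for the final row, whose exact size is not given.
default = {1: 6.8, 2: 4.4, 3: 3.4, "large": 1.64}
atom = {1: 8.8, 2: 4.9, 3: 3.2, "large": 1.4}


def faster_sizes(candidate, baseline):
    """Return the operand sizes at which candidate needs fewer cycles/limb."""
    return [size for size in baseline if candidate[size] < baseline[size]]


def speedup(candidate, baseline, size):
    """Ratio baseline/candidate at one size; values above 1 favour candidate."""
    return baseline[size] / candidate[size]


wins = faster_sizes(atom, default)
print("Atom code wins at sizes:", wins)
print("Speedup at the largest size: %.2f" % speedup(atom, default, "large"))
```

The Atom code loses only at the two smallest sizes and is roughly 17%
faster per limb at the largest measured size.
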
Please encrypt, key id 0xC8601622