CPU dispatching in GMP is flawed

Mon Aug 9 11:20:48 CEST 2010

As you can see from my replies on gmp-bugs, we do not agree about much
here:

1. We have different ideas on how GMP's current CPU selection works (I
wrote much of it).

2. I claim that GMP's current loop selection (as per e.g. mpn/x86_64/CPU)
is critically important for GMP's performance.  Agner claims that one
should select code for each CPU based not on its pipeline, but on its
available instructions, and that anything else matters little for
performance.

We agree that if Intel released a Core i7 without SSE support, current
GMP would crash on that CPU (both as a result of compiler-generated code
and as a result of our assembly code selection).  We disagree about whether
this is a real problem.
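
The two selection strategies discussed above, and the hypothetical SSE-less
Core i7, can be sketched in C roughly as follows.  This is only an
illustration and not GMP's actual selection code: the struct, the enum, the
function names and the microarchitecture strings are all made up for the
example, and the claim that the pipeline-tuned loops need SSE2 is an
assumption of the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical CPU description; these names are illustrative, not GMP's. */
struct cpu_info {
    const char *uarch;   /* microarchitecture name, e.g. "core2", "corei" */
    bool has_sse2;       /* whether the CPU reports SSE2 support */
};

/* Hypothetical set of loop variants that could be selected. */
enum loop_kind { LOOP_GENERIC, LOOP_SSE2, LOOP_CORE2, LOOP_COREI };

/* Strategy A: choose the loop tuned for the CPU's pipeline. */
enum loop_kind select_by_pipeline(const struct cpu_info *cpu)
{
    if (strcmp(cpu->uarch, "corei") == 0) return LOOP_COREI;
    if (strcmp(cpu->uarch, "core2") == 0) return LOOP_CORE2;
    return LOOP_GENERIC;
}

/* Strategy B: choose the loop from the available instructions only. */
enum loop_kind select_by_features(const struct cpu_info *cpu)
{
    return cpu->has_sse2 ? LOOP_SSE2 : LOOP_GENERIC;
}

/* Assumption of this sketch: every non-generic loop uses SSE2. */
bool loop_needs_sse2(enum loop_kind k)
{
    return k != LOOP_GENERIC;
}

/* True if the selected loop would execute an instruction the CPU lacks. */
bool would_fault(const struct cpu_info *cpu, enum loop_kind k)
{
    return loop_needs_sse2(k) && !cpu->has_sse2;
}
```

With a CPU described as { "corei", false }, strategy A still picks the
corei loop and would_fault() reports true, which is the crash scenario both
sides agree on.  Strategy B falls back to the generic loop, but it also
returns the same loop for { "core2", true } and { "corei", true }, so any
pipeline-specific tuning is lost.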

Code selection in GMP is non-trivial and the current strategies can surely
be improved in various ways.  I am unfortunately convinced that Agner's
preferred strategy is a move in the wrong direction, since it would badly
hurt GMP's performance in most cases.

In recent years, we have tried to keep the selection code ahead of CPU
manufacturers' product releases.  The GMP 5.0 code for Intel's chips is
about 2 years ahead of product releases, as an example.  (The code for fat
binaries lags behind, and 32-bit configs for 64-bit processors are not
well maintained.)

-- 
Torbjörn