CPU dispatching in GMP is flawed

Mon Aug 9 09:50:31 CEST 2010

While doing research on some problems in Intel's math libraries, I 
discovered that GMP has a similar problem. It relates to CPU dispatching.

CPU dispatching is the process where you choose between different code 
paths depending on which CPU the code is running on. There can be 
different code paths for different instruction sets, such as SSE2, SSE3, 
SSE4, etc. Many Intel function libraries have a CPU dispatcher that 
chooses the generic path (the fallback path that uses only the old 80386 
instruction set) for any CPU that doesn't have the "GenuineIntel" vendor 
id, even if the CPU is compatible with a better code path. Any software 
that uses these Intel libraries has crippled performance on computers 
with AMD or VIA chips. There is a big legal battle over this. See my 
blog for details: http://www.agner.org/optimize/blog/read.php?i=49

This is not the place to discuss whether Intel's practice is permissible 
or not. This has been debated at length elsewhere (see the links from my 
blog). But I believe that a non-commercial library like GMP should treat 
all brands of CPUs equally.

GMP does its CPU dispatching in __gmpn_cpuvec_init in fat.c.
This CPU dispatcher checks for CPU vendor strings, family numbers and 
model numbers only. It doesn't check the feature bits that indicate 
which instruction set is supported. Unfortunately, it recognizes only 
Intel and AMD chips. My VIA chip gets the slow generic path, even though 
it supports the SSE4.1 instruction set. I can improve performance by 
manipulating the CPU so that it reports itself as an Intel or AMD chip.
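
To make clear what such a dispatcher looks at, here is a minimal sketch 
in C of how the vendor string, family and model are decoded from the 
CPUID registers. This is not the code from fat.c. The function names 
(decode_vendor, decode_family, decode_model, vendor_is_recognized) are 
hypothetical and chosen only for this illustration, and the extended 
model rule follows Intel's convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble the 12-character vendor string from CPUID leaf 0.
   The characters are stored in the order EBX, EDX, ECX, lowest byte
   first. 'out' must have room for 13 bytes. */
void decode_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out != NULL);
    for (int i = 0; i < 12; i++)
        out[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xff);
    out[12] = '\0';
}

/* Family number from CPUID leaf 1, EAX. The extended family field
   is added only when the base family is 15. */
unsigned decode_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Model number from CPUID leaf 1, EAX. The extended model field is
   used for base family 6 or 15 (Intel's convention, assumed here). */
unsigned decode_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    unsigned model = (eax >> 4) & 0xf;
    if (family == 0x6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Hypothetical vendor test of the kind criticized here: any vendor
   other than Intel or AMD is rejected, whatever the CPU can do. */
int vendor_is_recognized(const char *vendor)
{
    return strcmp(vendor, "GenuineIntel") == 0 ||
           strcmp(vendor, "AuthenticAMD") == 0;
}
```

A VIA chip reports the vendor string "CentaurHauls", so 
vendor_is_recognized returns 0 for it, and a dispatcher built on such a 
test never gets as far as asking which instruction sets are available.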

Since I regard this as a bug, I submitted a report to gmp-bugs, but 
Torbjorn Granlund disagrees with me and suggests that I discuss it here 
instead.

I believe that CPU dispatching should be based on the capabilities of 
the CPU, not on specific brands and model numbers. You should check the 
brand name and model numbers only in the case where you can get a 
significant gain in performance by making different code branches for 
different brands of CPU with the same instruction set.
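
As an illustration of what I mean, here is a minimal sketch in C of a 
dispatcher that looks only at the feature bits in CPUID leaf 1. It 
assumes GCC or Clang, whose <cpuid.h> header provides __get_cpuid. The 
names code_path, select_path, detect_path and check_examples are 
hypothetical and are not part of GMP. The sketch does not check for 
operating system support, which would be needed for newer instruction 
sets with wider registers.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang wrapper for the CPUID instruction */
#endif

/* Feature bits reported by CPUID leaf 1 in EDX and ECX. */
#define FEAT_EDX_SSE2   (1u << 26)
#define FEAT_ECX_SSE3   (1u << 0)
#define FEAT_ECX_SSSE3  (1u << 9)
#define FEAT_ECX_SSE41  (1u << 19)

typedef enum {
    PATH_GENERIC, PATH_SSE2, PATH_SSE3, PATH_SSSE3, PATH_SSE41
} code_path;

/* Choose the best code path from the feature bits alone. A level is
   accepted only if all lower levels are present too. Note that there
   is no vendor, family or model parameter. */
code_path select_path(uint32_t ecx, uint32_t edx)
{
    if (!(edx & FEAT_EDX_SSE2))  return PATH_GENERIC;
    if (!(ecx & FEAT_ECX_SSE3))  return PATH_SSE2;
    if (!(ecx & FEAT_ECX_SSSE3)) return PATH_SSE3;
    if (!(ecx & FEAT_ECX_SSE41)) return PATH_SSSE3;
    return PATH_SSE41;
}

/* Query the running CPU. Falls back to the generic path when CPUID
   leaf 1 is unavailable or the target is not x86. */
code_path detect_path(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return select_path(ecx, edx);
#endif
    return PATH_GENERIC;
}

/* Sanity checks with made-up register values. */
void check_examples(void)
{
    /* Any CPU that reports all four bits gets the SSE4.1 path. */
    assert(select_path(FEAT_ECX_SSE3 | FEAT_ECX_SSSE3 | FEAT_ECX_SSE41,
                       FEAT_EDX_SSE2) == PATH_SSE41);
    /* A CPU with no SSE2 bit gets the generic path. */
    assert(select_path(0, 0) == PATH_GENERIC);
}
```

With this approach a new or unknown CPU gets the best path it reports 
support for, and no update is needed when a new brand or model appears. 
A vendor or model check can still be added on top, but only for the 
cases where it gives a measurable gain.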

The problems with a CPU dispatcher based on specific models are:

   1. There is no logical strategy for how to handle future CPU models.
      Unknown CPU brands get the generic (= worst possible) code path.
      For known brands, it assumes that future models have higher family
      and model numbers and support the same or higher instruction sets.
      Unfortunately, there is no logical sequence in CPU family and
      model numbers, and some unknown models may be small low-power
      processors with fewer capabilities, so this assumption is not safe.

   2. New or unknown CPU vendors are not handled optimally. You are
      making it difficult for new CPU vendors (e.g. VIA) to enter the
      market when your software is unable to handle their chips properly.

   3. Systems that use virtualization, emulation or FPGA soft cores are
      gaining more use. You cannot make any assumptions about vendor
      strings and family numbers on such systems.

   4. You are making different code versions for different brands of
      processors with the same instruction set. The performance
      advantage you can gain by this is often minimal or nil.

   5. The system becomes big and unmanageable when the number of code
      branches keeps growing. Maintaining and testing such a system is
      so costly that it exhausts your manpower. It is
      difficult to weed out branches for obsolete processors.

   6. The code size may become so big that the performance is
      compromised by cache problems.

   7. The code needs to be updated every time there is a new processor
      on the market. A system of regular updating is costly and is
      likely to lag behind the needs of the end users.

   8. You should optimize for the future, not the present. Consider the
      time it takes to develop, test and publish the function library.
      Add to this the time it takes before the application programmer
      decides to get the new version of the library. Add to this the
      time it takes for the application programmer to develop a new
      version of the application software using the new version of the
      library. Add to this the time it takes to market the software. Add
      to this the time it takes before the average end user decides to
      upgrade the software to the newest version. All in all, it will
      typically take several years before the software containing your
      CPU dispatching is running on the end user's computer. Any
      specific CPU that you have optimized for is likely to be obsolete
      at that time.