CPU dispatching in GMP is flawed
agner at agner.org
Mon Aug 9 09:50:31 CEST 2010
While doing research on some problems in Intel's math libraries, I
discovered that GMP has a similar problem. It relates to CPU dispatching.
CPU dispatching is the process where you choose between different code
paths depending on which CPU the code is running on. There can be
different code paths for different instruction sets, such as SSE2, SSE3,
SSE4, etc. Many Intel function libraries have a CPU dispatcher that
chooses the generic path (the fallback path that uses only the old 80386
instruction set) for any CPU that doesn't have the "GenuineIntel" vendor
id, even if the CPU is compatible with a better code path. Any software
that uses these Intel libraries has crippled performance on computers
with AMD or VIA chips. There is a big legal battle over this. See my
blog for details: http://www.agner.org/optimize/blog/read.php?i=49
This is not the place to discuss whether Intel's practice is permissible
or not. This has been debated at length elsewhere (see the links from my
blog). But I believe that a non-commercial library like GMP should treat
all brands of CPUs equally.
GMP does its CPU dispatching in __gmpn_cpuvec_init in fat.c.
This CPU dispatcher checks for CPU vendor strings, family numbers and
model numbers only. It doesn't check the feature bits that indicate
which instruction set is supported. Unfortunately, it recognizes only
Intel and AMD chips. My VIA chip gets the slow generic path, even though
it supports the SSE4.1 instruction set. I can improve the performance
by manipulating the CPU so that it reports itself as an Intel or AMD chip.
Since I regard this as a bug, I submitted a report to gmp-bugs, but
Torbjorn Granlund disagrees with me and suggests that I discuss it here.
I believe that CPU dispatching should be based on the capabilities of
the CPU, not on specific brands and model numbers. You should check the
brand name and model numbers only in the case where you can get a
significant gain in performance by making different code branches for
different brands of CPU with the same instruction set.
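As a rough illustration of what dispatching on capabilities could look
like, here is a minimal sketch in C. It is not GMP's code. The names
isa_level, isa_level_from_features, detect_isa_level and select_path are
made up for this example. The bit positions are those of CPUID leaf 1 as
documented by Intel and AMD. The sketch assumes GCC or Clang for
<cpuid.h>, and it leaves out the check for operating system support.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction-set levels, ordered from least to most capable. */
typedef enum {
    ISA_GENERIC = 0,
    ISA_SSE2,
    ISA_SSE3,
    ISA_SSSE3,
    ISA_SSE41,
    ISA_SSE42
} isa_level;

/* Feature bits returned by CPUID leaf 1. */
#define CPUID1_EDX_SSE2   (1u << 26)
#define CPUID1_ECX_SSE3   (1u << 0)
#define CPUID1_ECX_SSSE3  (1u << 9)
#define CPUID1_ECX_SSE41  (1u << 19)
#define CPUID1_ECX_SSE42  (1u << 20)

/* Hypothetical helper: map the leaf-1 ECX/EDX values to the highest
   level for which all lower levels are also present. The vendor
   string, family and model numbers are never consulted. */
static isa_level isa_level_from_features(uint32_t ecx, uint32_t edx)
{
    if (!(edx & CPUID1_EDX_SSE2))  return ISA_GENERIC;
    if (!(ecx & CPUID1_ECX_SSE3))  return ISA_SSE2;
    if (!(ecx & CPUID1_ECX_SSSE3)) return ISA_SSE3;
    if (!(ecx & CPUID1_ECX_SSE41)) return ISA_SSSE3;
    if (!(ecx & CPUID1_ECX_SSE42)) return ISA_SSE41;
    return ISA_SSE42;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Query the running CPU. Falls back to the generic path if CPUID
   leaf 1 is not available. */
static isa_level detect_isa_level(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ISA_GENERIC;
    return isa_level_from_features(ecx, edx);
}
#else
/* Not an x86 build with <cpuid.h>: use the generic path. */
static isa_level detect_isa_level(void)
{
    return ISA_GENERIC;
}
#endif

/* Hypothetical table of code paths, indexed by isa_level. A real
   library would store function pointers here instead of names. */
static const char *const path_names[] = {
    "generic", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2"
};

static const char *select_path(isa_level level)
{
    assert((unsigned)level <= (unsigned)ISA_SSE42);
    return path_names[level];
}
```

With this scheme, a chip from an unknown vendor that sets the SSE4.1
bit gets ISA_SSE41, the same as an Intel or AMD chip would. A vendor or
model check could then be layered on top of select_path(detect_isa_level()),
but only for the cases where it gives a significant gain.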
The problems with a CPU dispatcher based on specific models are:
1. There is no logical strategy for how to handle future CPU models.
Unknown CPU brands get the generic (= worst possible) code path.
For known brands, it assumes that future models have higher family
and model numbers and support the same or higher instruction sets.
Unfortunately, there is no logical sequence in CPU family and
model numbers, and some unknown models may be small low-power
processors with fewer capabilities, so this assumption is not safe.
2. New or unknown CPU vendors are not handled optimally. You are
making it difficult for new CPU vendors (e.g. VIA) to enter the
market when your software is unable to handle their chips properly.
3. Systems that use virtualization, emulation or FPGA soft cores are
gaining more use. You cannot make any assumptions about vendor
strings and family numbers on such systems.
4. You are making different code versions for different brands of
processors with the same instruction set. The performance
advantage you can gain by this is often minimal or nil.
5. The system becomes big and unmanageable when the number of code
branches keeps growing. Maintaining and testing such a system is
so costly that it exhausts your manpower. It is
difficult to weed out branches for obsolete processors.
6. The code size may become so big that the performance is
compromised by cache problems.
7. The code needs to be updated every time there is a new processor
on the market. A system of regular updating is costly and is
likely to lag behind the needs of the end users.
8. You should optimize for the future, not the present. Consider the
time it takes to develop, test and publish the function library.
Add to this the time it takes before the application programmer
decides to get the new version of the library. Add to this the
time it takes for the application programmer to develop a new
version of the application software using the new version of the
library. Add to this the time it takes to market the software. Add
to this the time it takes before the average end user decides to
upgrade the software to the newest version. All in all, it will
typically take several years before the software containing your
CPU dispatching is running on the end user's computer. Any
specific CPU that you have optimized for is likely to be obsolete
at that time.