Runs generic code version on VIA processors
Agner Fog
agner at agner.org
Tue Jul 27 13:12:32 CEST 2010
GMP version 5.0.1, x86
The performance on VIA processors is poor because __gmpn_cpuvec_init in
fat.c chooses the generic version of all functions, while the MMX and
SSE2 versions are only activated on Intel and AMD processors.
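For reference, the identification data involved comes from the CPUID
instruction: leaf 0 returns the 12-character vendor string
("GenuineIntel", "AuthenticAMD", "CentaurHauls" on VIA/Centaur) in
EBX, EDX, ECX, and leaf 1 returns the family and model fields in EAX.
The sketch below is only an illustration, not code taken from fat.c;
it assumes GCC's <cpuid.h> on x86:

#include <stdio.h>
#include <string.h>
#include <cpuid.h>      /* __get_cpuid, provided by GCC on x86 */

int
main (void)
{
  unsigned eax, ebx, ecx, edx, family, model;
  char vendor[13];

  /* CPUID leaf 0: the vendor string is returned in EBX, EDX, ECX.  */
  __get_cpuid (0, &eax, &ebx, &ecx, &edx);
  memcpy (vendor + 0, &ebx, 4);
  memcpy (vendor + 4, &edx, 4);
  memcpy (vendor + 8, &ecx, 4);
  vendor[12] = '\0';

  /* CPUID leaf 1: family and model are packed into EAX.  The extended
     fields only apply when the base family is 6 or 15.  */
  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
  family = (eax >> 8) & 0xf;
  model  = (eax >> 4) & 0xf;
  if (family == 0xf)
    family += (eax >> 20) & 0xff;
  if (family == 0x6 || family >= 0xf)
    model += ((eax >> 16) & 0xf) << 4;

  printf ("vendor \"%s\", family %u, model %u\n", vendor, family, model);
  return 0;
}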
The problem is that the CPU dispatching is based on vendor strings and
CPU family and model numbers rather than on CPUID feature bits. This is
fundamentally wrong for several reasons:
* The __gmpn_cpuvec_init function assumes that any processor with
family and model numbers higher than those of the currently known
processors supports at least the same instruction sets as the known
ones. There is no guarantee that this will hold true in the future,
where low-power, light-weight processors are becoming more popular.
The only safe way to tell whether a CPU supports a particular
instruction set (e.g. SSE2) is to check the CPUID feature bits.
* You are making it difficult for new CPU vendors to enter the market
by putting them at a disadvantage: they get only the generic code
path.
* Systems that use virtualization, emulation, or FPGA softcores are
coming into wider use. You cannot make any assumptions about vendor
strings and family numbers on such systems.
* You are making different code versions for different brands of
processors with the same instruction set. The performance advantage you
can gain by this is minimal at best. The disadvantages are that the fat
binary becomes fatter and there are more versions to test and maintain.
* The code needs to be updated every time there is a new processor on
the market. Obviously, you don't have the resources for that. Much of
the source code is from 2005 or earlier.
* The time it takes from when you make a change in the source code
until the updated code makes its way through the application software
to the end user is at least one year, and more commonly two or more
years. The specific processor you are optimizing for is likely to be
obsolete by the time your code is running on the end user's computer.
I therefore propose that the CPU dispatching system (i.e.
__gmpn_cpuvec_init) test only the CPUID feature bits (MMX, SSE2,
SSE3, and so on) and not look at any vendor strings, family numbers,
or model numbers.
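As a rough sketch of what I mean (again assuming GCC's <cpuid.h>; the
routine names printed below are placeholders, not GMP's actual
function lists), a feature-bit-only selection could look like this:

#include <stdio.h>
#include <cpuid.h>      /* __get_cpuid, provided by GCC on x86 */

int
main (void)
{
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  int have_mmx, have_sse2, have_sse3;

  /* CPUID leaf 1: EDX and ECX hold the feature flags.  */
  __get_cpuid (1, &eax, &ebx, &ecx, &edx);

  have_mmx  = (edx >> 23) & 1;   /* CPUID.1:EDX bit 23 */
  have_sse2 = (edx >> 26) & 1;   /* CPUID.1:EDX bit 26 */
  have_sse3 = (ecx >> 0)  & 1;   /* CPUID.1:ECX bit 0  */

  /* Select a code path from the feature bits alone; the vendor string
     and the family/model numbers are never consulted.  */
  if (have_sse2 && have_sse3)
    puts ("install the SSE2/SSE3 routines");
  else if (have_sse2)
    puts ("install the SSE2 routines");
  else if (have_mmx)
    puts ("install the MMX routines");
  else
    puts ("install the generic routines");

  return 0;
}

With a selection like this, any processor that reports SSE2 in its
feature flags, whether it calls itself GenuineIntel, AuthenticAMD, or
CentaurHauls, automatically gets the SSE2 routines.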
You are saying at
http://gmplib.org/list-archives/gmp-announce/2010-January/000024.html
that there are VIA-specific optimizations in version 5.0.0. Can you
please tell me where they are? There does not seem to be any support
for them in __gmpn_cpuvec_init.
The background for this bug report needs explanation:
I am doing research on some improper behavior of Intel software that
cripples performance on non-Intel processors. See my blog for details:
http://www.agner.org/optimize/blog/read.php?i=49
I have made a software tool that can change the CPUID vendor string on
VIA processors (it is more difficult to do on Intel and AMD processors).
I found that Mathematica runs faster on a VIA processor when the vendor
string is changed to GenuineIntel or AuthenticAMD. This was due to two
function libraries used by Mathematica, namely Intel Math Kernel Library
(MKL) and GMP. I was surprised by this. It is difficult to blame Intel
for improper practices when GNU people are doing the same.
The Mathematica package includes GMP.dll. I don't know how to tell which
version it is.