Runs generic code version on VIA processors
Agner Fog
agner at agner.org
Tue Jul 27 13:12:32 CEST 2010
GMP version 5.0.1, x86
The performance on VIA processors is poor because __gmpn_cpuvec_init in
fat.c chooses the generic version of all functions, while the MMX and
SSE2 versions are only activated on Intel and AMD processors.
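For reference, the identification data involved comes from the CPUID
instruction: leaf 0 returns the 12-character vendor string
("GenuineIntel", "AuthenticAMD", "CentaurHauls" on VIA/Centaur) in
EBX, EDX, ECX, and leaf 1 returns the family and model fields in EAX.
The sketch below is only an illustration, not code taken from fat.c;
it assumes GCC's <cpuid.h> on x86:

#include <stdio.h>
#include <string.h>
#include <cpuid.h>      /* __get_cpuid, provided by GCC on x86 */

int
main (void)
{
  unsigned eax, ebx, ecx, edx, family, model;
  char vendor[13];

  /* CPUID leaf 0: the vendor string is returned in EBX, EDX, ECX.  */
  __get_cpuid (0, &eax, &ebx, &ecx, &edx);
  memcpy (vendor + 0, &ebx, 4);
  memcpy (vendor + 4, &edx, 4);
  memcpy (vendor + 8, &ecx, 4);
  vendor[12] = '\0';

  /* CPUID leaf 1: family and model are packed into EAX.  The extended
     fields only apply when the base family is 6 or 15.  */
  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
  family = (eax >> 8) & 0xf;
  model  = (eax >> 4) & 0xf;
  if (family == 0xf)
    family += (eax >> 20) & 0xff;
  if (family == 0x6 || family >= 0xf)
    model += ((eax >> 16) & 0xf) << 4;

  printf ("vendor \"%s\", family %u, model %u\n", vendor, family, model);
  return 0;
}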
The problem is that the CPU dispatching is based on vendor strings and
CPU family and model numbers rather than on CPUID feature bits. This is
fundamentally wrong for several reasons:
* The __gmpn_cpuvec_init function assumes that any processor with
family and model numbers higher than those of the currently known
processors supports at least the same instruction sets as the known
ones. There is no guarantee that this will hold true in the future,
where low-power, light-weight processors are becoming more popular.
The only safe way to tell whether a CPU supports a particular
instruction set (e.g. SSE2) is to check the CPUID feature bits.
* You are making it difficult for new CPU vendors to enter the market
by putting them at a disadvantage: they get only the generic code
path.
* Systems that use virtualization, emulation, or FPGA softcores are
coming into wider use. You cannot make any assumptions about vendor
strings and family numbers on such systems.
* You are making different code versions for different brands of
processors with the same instruction set. The performance advantage you
can gain by this is minimal at best. The disadvantages are that the fat
binary becomes fatter and there are more versions to test and maintain.
* The code needs to be updated every time there is a new processor on
the market. Obviously, you don't have the resources for that. Much of
the source code is from 2005 or earlier.
* The time it takes from when you make a change in the source code
until the updated code makes its way through the application software
to the end user is at least one year, and more commonly two or more
years. The specific processor you are optimizing for is likely to be
obsolete by the time your code is running on the end user's computer.
I therefore propose that the CPU dispatching system (i.e.
__gmpn_cpuvec_init) test only the CPUID feature bits (MMX, SSE2,
SSE3, and so on) and not look at any vendor strings, family numbers,
or model numbers.
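As a rough sketch of what I mean (again assuming GCC's <cpuid.h>; the
routine names printed below are placeholders, not GMP's actual
function lists), a feature-bit-only selection could look like this:

#include <stdio.h>
#include <cpuid.h>      /* __get_cpuid, provided by GCC on x86 */

int
main (void)
{
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  int have_mmx, have_sse2, have_sse3;

  /* CPUID leaf 1: EDX and ECX hold the feature flags.  */
  __get_cpuid (1, &eax, &ebx, &ecx, &edx);

  have_mmx  = (edx >> 23) & 1;   /* CPUID.1:EDX bit 23 */
  have_sse2 = (edx >> 26) & 1;   /* CPUID.1:EDX bit 26 */
  have_sse3 = (ecx >> 0)  & 1;   /* CPUID.1:ECX bit 0  */

  /* Select a code path from the feature bits alone; the vendor string
     and the family/model numbers are never consulted.  */
  if (have_sse2 && have_sse3)
    puts ("install the SSE2/SSE3 routines");
  else if (have_sse2)
    puts ("install the SSE2 routines");
  else if (have_mmx)
    puts ("install the MMX routines");
  else
    puts ("install the generic routines");

  return 0;
}

With a selection like this, any processor that reports SSE2 in its
feature flags, whether it calls itself GenuineIntel, AuthenticAMD, or
CentaurHauls, automatically gets the SSE2 routines.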
You are saying at
http://gmplib.org/list-archives/gmp-announce/2010-January/000024.html
that there are VIA-specific optimizations in version 5.0.0. Can you
please tell me where they are? There does not seem to be any support
for them in __gmpn_cpuvec_init.
The background for this bug report needs explanation:
I am doing research on some improper behavior of Intel software that
cripples performance on non-Intel processors. See my blog for details:
http://www.agner.org/optimize/blog/read.php?i=49
I have made a software tool that can change the CPUID vendor string on
VIA processors (it is more difficult to do on Intel and AMD processors).
I found that Mathematica runs faster on a VIA processor when the vendor
string is changed to GenuineIntel or AuthenticAMD. This was due to two
function libraries used by Mathematica, namely Intel Math Kernel Library
(MKL) and GMP. I was surprised by this. It is difficult to blame Intel
for improper practices when GNU people are doing the same.
The Mathematica package includes GMP.dll. I don't know how to tell which
version it is.