Runs generic code version on VIA processors

Sun Aug 8 20:09:30 CEST 2010

Torbjorn Granlund wrote:
> Agner Fog <agner at agner.org> writes:
>
>   GMP version 5.0.1, x86
>
>   The performance on VIA processors is poor because __gmpn_cpuvec_init
>   in fat.c chooses the generic version of all functions, while the MMX
>   and SSE2 versions are only activated on Intel and AMD processors.
>
> Which fat.c?  Are you talking about mpn/x86/fat/fat.c or
> mpn/x86_64/fat/fat.c?
>
Both.
>   The problem is that the CPU dispatching is based on vendor strings and
>   CPU family and model numbers rather than on CPUID feature bits. This
>   is fundamentally wrong for several reasons:
>
>   * The __gmpn_cpuvec_init function assumes that all processors with
>   family and model numbers bigger than the currently known processors
>   support at least the same instruction sets as the ones we have.
>
> I wasn't aware of this.  Please be more specific.
>   
Just look at the code. It checks family and model numbers, not feature 
bits. You cannot make any assumptions about which instruction set is 
supported by an unknown processor based on the family and model numbers.
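
To make concrete what family and model numbers are, here is a minimal
sketch (not GMP's code) of how they are unpacked from the EAX value
returned by CPUID leaf 1. The struct and function names are hypothetical,
and the extended-model rule shown is the one Intel documents; AMD applies
it only when the base family is 15. Nothing in these fields says which
instructions are supported.

```c
/* Hypothetical container for the CPUID leaf 1 signature fields. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Unpack family/model/stepping from the EAX value of CPUID leaf 1. */
struct cpu_signature decode_signature(unsigned eax)
{
    struct cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;

    s.stepping = eax & 0xF;

    /* The extended family field is added only when the base family is 15. */
    s.family = base_family;
    if (base_family == 0xF)
        s.family += (eax >> 20) & 0xFF;

    /* The extended model field supplies the high four bits of the model
       for base family 6 or 15 (Intel's rule). */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        s.model += ((eax >> 16) & 0xF) << 4;

    return s;
}
```

For example, 0x000006FB (a Core 2) decodes to family 6, model 15, and
0x00000F29 (a Pentium 4) decodes to family 15, model 2. The numbers
identify a design; they do not describe its instruction set.
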
>   * You are making it difficult for new CPU vendors to enter the market
>   when you put them at a disadvantage by giving them only the generic
>   code path.
>
> Not really "generic", they will still get assembly loop support.
>
__gmpn_cpuvec_init gives them the generic code, which uses only the 80386
instruction set.
> I actually doubt that falling back to the 'features' bits for
> unrecognised processors will typically be better than assuming the
> processor is similar to the last recognised processor in the same family.
>   
Here's an example of why this principle is wrong: the latest version of
Mathcad uses a six-year-old version of the Intel Math Kernel Library (not
GMP).  It treats a Core 2 processor (family = 6) as "the last
recognised processor in the same family", which is a Pentium III with
only the SSE instruction set. If you fake a Pentium 4 processor
(family = 15), you get the SSE3 code path, and the speed increases by 34%.
Don't make the same mistake as Intel here. There is no logic in the family
and model numbers.
>   * Systems that use virtualization, emulation or FPGA softcores are
>   gaining more use. You cannot make any assumptions about vendor strings
>   and family numbers on such systems.
>
> We cannot?  But can we then make any assumptions about anything CPUID
> returns?  Which specific information from CPUID will be invalid for such
> systems?
>   
The feature bits for instruction sets will always be correct. The family 
and model numbers contain no useful information in this respect.
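
A minimal sketch of what testing the feature bits could look like,
assuming the EDX and ECX values have already been read from CPUID leaf 1.
The bit positions are the documented ones; the enum and function names
are hypothetical and are not taken from GMP. The sketch ignores
operating-system support for saving the SSE registers, which real code
would also have to check.

```c
/* Feature bits reported by CPUID leaf 1 (EAX = 1). */
#define EDX_MMX   (1u << 23)
#define EDX_SSE   (1u << 25)
#define EDX_SSE2  (1u << 26)
#define ECX_SSE3  (1u << 0)
#define ECX_SSSE3 (1u << 9)
#define ECX_SSE41 (1u << 19)

/* Hypothetical instruction-set levels, lowest to highest. */
enum isa_level {
    ISA_386, ISA_MMX, ISA_SSE, ISA_SSE2, ISA_SSE3, ISA_SSSE3, ISA_SSE41
};

/* Return the highest level for which this and every lower level is
   reported. Vendor string, family and model are not consulted. */
enum isa_level isa_from_feature_bits(unsigned edx, unsigned ecx)
{
    if (!(edx & EDX_MMX))   return ISA_386;
    if (!(edx & EDX_SSE))   return ISA_MMX;
    if (!(edx & EDX_SSE2))  return ISA_SSE;
    if (!(ecx & ECX_SSE3))  return ISA_SSE2;
    if (!(ecx & ECX_SSSE3)) return ISA_SSE3;
    if (!(ecx & ECX_SSE41)) return ISA_SSSE3;
    return ISA_SSE41;
}
```

The same function gives a sensible answer on a processor, emulator or
softcore that did not exist when the code was written.
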
>   * You are making different code versions for different brands of
>   processors with the same instruction set. The performance advantage
>   you can gain by this is minimal at best. The disadvantages are that
>   the fat binary becomes fatter and there are more versions to test and
>   maintain.
>
> This is absolutely false, and shows that you do not appreciate the sort
> of optimisation being done in GMP.
>
> The available instructions do not tell much about which instructions are
> good to use in GMP, unfortunately.
>
>   * The code needs to be updated every time there is a new processor on
>   the market. Obviously, you don't have the resources for that. Much of
>   the source code is from 2005 or earlier.
>
> Yes, as new processors come out, we need to make sure they are
> recognised and that the existing code works well for them.  Fortunately
> for us, fundamentally new microarchitectures are rare.
>
> (GMP usually recognises new processors long before they enter the
> market, since the CPU manufacturers tell us about their plans and the
> assigned CPUID numbers.)
>
> OK, so much of GMP's sources are from 2005 and earlier.  What is your
> point?
>   
You are not using the later instruction sets. Most of it is MMX only.
>   * The time it takes from when you make a change in the source code
>   until the updated code makes its way through the application software
>   to the end
>   user is at least one year, and more commonly two or more years. The
>   specific processor you are optimizing for is likely to be obsolete at
>   the time your code is running on the end user's computer.
>
> I think you exaggerate somewhat here to prove your point...
>   
In the example above, the delay is six years. I am not exaggerating.
>
>   I will therefore propose that the CPU dispatching system
>   (i.e. __gmpn_cpuvec_init) should test only the CPUID feature bits
>   (MMX, SSE2, SSE3, and so on) and not look at any vendor strings,
>   family, or model numbers.
>
> If we did that, it would make GMP's performance drop by a large factor.
>
>   You are saying at
>   http://gmplib.org/list-archives/gmp-announce/2010-January/000024.html
>   that there are VIA specific optimizations in version 5.0.0. Can you
>   please tell me where they are? There doesn't seem to be support for it
>   in __gmpn_cpuvec_init?
>
> We claim VIA *nano* optimisations.  A "find gmp-5.0.0 -name nano" should
> help you find some of the relevant code.
>
> The fat support might be lagging somewhat.  You need to be more specific
> about your config if you want a more specific response.
>   
I am stepping through GMP.dll in Mathematica with a debugger. The DLL does
not report its version or configuration. Perhaps you can add version
information to the .dll by using a .def file.

The dispatching goes through __gmpn_cpuvec_init, and it takes the 
generic version for VIA processors, even though the processor supports 
SSE4.1. The performance is improved if I fake a different vendor string. 
You should dispatch by instruction set for unknown processors.
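
The rule proposed here can be sketched as follows. This is an
illustration under stated assumptions, not the code in fat.c: the
code-path names and the function choose_path are hypothetical, and the
only real values are the vendor strings and the MMX and SSE2 bit
positions in the EDX register of CPUID leaf 1. A recognised vendor may get
a tuned path, but an unrecognised vendor is still dispatched by the
instruction sets it reports instead of dropping to the 80386 code.

```c
#include <string.h>

/* Hypothetical code paths a dispatcher could select between. */
enum code_path {
    PATH_GENERIC_386,   /* plain 80386 instructions only */
    PATH_MMX,           /* vendor-neutral MMX code */
    PATH_SSE2,          /* vendor-neutral SSE2 code */
    PATH_INTEL_TUNED,   /* SSE2 code tuned for Intel */
    PATH_AMD_TUNED      /* SSE2 code tuned for AMD */
};

/* vendor: the 12-character CPUID vendor string, NUL-terminated.
   edx:    the EDX value returned by CPUID leaf 1. */
enum code_path choose_path(const char *vendor, unsigned edx)
{
    int has_mmx  = (edx >> 23) & 1;
    int has_sse2 = (edx >> 26) & 1;

    /* Vendor-specific tuning is applied only on top of a feature check. */
    if (has_sse2 && strcmp(vendor, "GenuineIntel") == 0)
        return PATH_INTEL_TUNED;
    if (has_sse2 && strcmp(vendor, "AuthenticAMD") == 0)
        return PATH_AMD_TUNED;

    /* Unknown vendor: fall back on the feature bits, not on 80386 code. */
    if (has_sse2)
        return PATH_SSE2;
    if (has_mmx)
        return PATH_MMX;
    return PATH_GENERIC_386;
}
```

With this rule, a VIA processor (vendor string "CentaurHauls") that
reports SSE2 gets the vendor-neutral SSE2 path without any change to the
dispatcher.
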

You should distinguish by vendor string only when there is a significant 
performance advantage. There may be a difference between Intel and AMD 
in the case of a partial flags stall (that is the situation when you 
rely on the carry flag being unchanged by INC and DEC instructions). 
Otherwise, there will be very little advantage in distinguishing between 
CPU brands.