Runs generic code version on VIA processors

Sun Aug 8 15:43:33 CEST 2010

Agner Fog <agner at agner.org> writes:

  GMP version 5.0.1, x86

  The performance on VIA processors is poor because __gmpn_cpuvec_init
  in fat.c chooses the generic version of all functions, while the MMX
  and SSE2 versions are only activated on Intel and AMD processors.

Which fat.c?  Are you talking about mpn/x86/fat/fat.c or
mpn/x86_64/fat/fat.c?

  The problem is that the CPU dispatching is based on vendor strings and
  CPU family and model numbers rather than on CPUID feature bits. This
  is fundamentally wrong for several reasons:

  * The __gmpn_cpuvec_init function assumes that all processors with
  family and model numbers bigger than the currently known processors
  support at least the same instruction sets as the ones we have.

I wasn't aware of this.  Please be more specific.

  * You are making it difficult for new CPU vendors to enter the market
  when you put them at a disadvantage by giving them only the generic
  code path.

Not really "generic", they will still get assembly loop support.

I actually doubt that falling back to the 'features' bits for
unrecognised processors will typically be better than assuming the
processor is similar to the last recognised processor in the same family.
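
To make the fallback under discussion concrete, here is a minimal sketch
in C. It is not the code in fat.c. The table contents and the names
known_cpus, select_code_path, "path_a", "path_b" and "path_c" are
hypothetical. Only the bit layout of EAX from CPUID leaf 1 is taken from
the architecture manuals. The register value is passed in as a parameter
so the sketch does not depend on inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Display family from EAX of CPUID leaf 1: the extended family field
   is added only when the base family field is 0xF. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Display model: the extended model field supplies the high nibble
   when the base family field is 6 or 0xF. */
static unsigned cpuid_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned model = (eax >> 4) & 0xf;
    if (base_family == 0x6 || base_family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Hypothetical table of recognised processors (illustration only). */
struct known_cpu { unsigned family, model; const char *code_path; };

static const struct known_cpu known_cpus[] = {
    {  6, 15, "path_a" },
    {  6, 26, "path_b" },
    { 15,  3, "path_c" },
};

/* Pick the entry in the same family with the largest model number not
   exceeding the CPU's model, so an unrecognised newer model is treated
   like the last recognised one.  Unknown families get "generic". */
static const char *select_code_path(uint32_t eax)
{
    unsigned family = cpuid_family(eax), model = cpuid_model(eax);
    const struct known_cpu *best = NULL;
    for (size_t i = 0; i < sizeof known_cpus / sizeof known_cpus[0]; i++) {
        const struct known_cpu *k = &known_cpus[i];
        if (k->family == family && k->model <= model
            && (best == NULL || k->model > best->model))
            best = k;
    }
    return best ? best->code_path : "generic";
}
```

With this table, a family 6 processor with an unlisted model 44 would be
given "path_b", while a family 5 processor would be given "generic".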

  * Systems that use virtualization, emulation or FPGA softcores are
  gaining more use. You cannot make any assumptions about vendor strings
  and family numbers on such systems.

We cannot?  But can we then make any assumptions about anything CPUID
returns?  Which specific information from CPUID will be invalid for such
systems?
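
For reference, the vendor string in question is just twelve bytes
returned by CPUID leaf 0 in EBX, EDX and ECX, in that order. A minimal
sketch in C of how it is assembled and classified follows. The names
cpuid_vendor, classify_vendor and the enum are hypothetical and are not
taken from GMP. The register values are passed in as parameters so the
sketch stays portable.

```c
#include <stdint.h>
#include <string.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_VIA, VENDOR_UNKNOWN };

/* Assemble the vendor string from the CPUID leaf 0 registers.  Each
   register holds four characters, lowest byte first.  `out` must have
   room for 12 characters plus the terminating NUL. */
static void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Map the string onto a vendor.  An emulator or softcore that reports
   some other string ends up as VENDOR_UNKNOWN. */
static enum vendor classify_vendor(const char *s)
{
    if (strcmp(s, "GenuineIntel") == 0) return VENDOR_INTEL;
    if (strcmp(s, "AuthenticAMD") == 0) return VENDOR_AMD;
    if (strcmp(s, "CentaurHauls") == 0) return VENDOR_VIA;
    return VENDOR_UNKNOWN;
}
```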

  * You are making different code versions for different brands of
  processors with the same instruction set. The performance advantage
  you can gain by this is minimal at best. The disadvantages are that
  the fat binary becomes fatter and there are more versions to test and
  maintain.

This is absolutely false, and shows that you do not appreciate the sort
of optimisation being done in GMP.

The available instructions do not tell much about which instructions are
good to use in GMP, unfortunately.

  * The code needs to be updated every time there is a new processor on
  the market. Obviously, you don't have the resources for that. Much of
  the source code is from 2005 or earlier.

Yes, as new processors come out, we need to make sure they are
recognised and that the existing code works well for them.  Fortunately
for us, fundamentally new microarchitectures are rare.

(GMP usually recognises new processors long before they enter the
market, since the CPU manufacturers tell us about their plans and the
assigned CPUID numbers.)

OK, so much of GMP's sources are from 2005 and earlier.  What is your
point?

  * The time it takes from when you make a change in the source code till
  the updated code makes its way through the application software to the end
  user is at least one year, and more commonly two or more years. The
  specific processor you are optimizing for is likely to be obsolete at
  the time your code is running on the end user's computer.

I think you exaggerate somewhat here to prove your point...

  I will therefore propose that the CPU dispatching system
  (i.e. __gmpn_cpuvec_init) should test only the CPUID feature bits
  (MMX, SSE2, SSE3, and so on) and not look at any vendor strings,
  family, or model numbers.

If we did that, it would make GMP's performance drop by a large factor.
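
A dispatcher of the kind proposed could look roughly like the following
sketch in C. It is an illustration only, not GMP code. The FEAT_*
constants, the function names and the path names "sse2", "mmx" and
"generic" are hypothetical. The bit positions in ECX and EDX of CPUID
leaf 1 are the documented ones. The register values are passed in as
parameters.

```c
#include <stdint.h>

/* Hypothetical feature flags used only by this sketch. */
#define FEAT_MMX    (1u << 0)
#define FEAT_SSE    (1u << 1)
#define FEAT_SSE2   (1u << 2)
#define FEAT_SSE3   (1u << 3)
#define FEAT_SSSE3  (1u << 4)

/* Translate the CPUID leaf 1 feature registers into the flags above. */
static unsigned cpuid_features(uint32_t ecx, uint32_t edx)
{
    unsigned f = 0;
    if (edx & (1u << 23)) f |= FEAT_MMX;    /* EDX bit 23 */
    if (edx & (1u << 25)) f |= FEAT_SSE;    /* EDX bit 25 */
    if (edx & (1u << 26)) f |= FEAT_SSE2;   /* EDX bit 26 */
    if (ecx & (1u << 0))  f |= FEAT_SSE3;   /* ECX bit 0  */
    if (ecx & (1u << 9))  f |= FEAT_SSSE3;  /* ECX bit 9  */
    return f;
}

/* Choose a code path from the feature flags alone, ignoring vendor,
   family and model. */
static const char *select_by_features(unsigned features)
{
    if (features & FEAT_SSE2) return "sse2";
    if (features & FEAT_MMX)  return "mmx";
    return "generic";
}
```

Note that any two processors reporting the same feature bits receive the
same code path here, whatever their microarchitecture.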

  You are saying at
  http://gmplib.org/list-archives/gmp-announce/2010-January/000024.html
  that there are VIA specific optimizations in version 5.0.0. Can you
  please tell me where they are? There doesn't seem to be support for it
  in __gmpn_cpuvec_init?

We claim VIA *nano* optimisations.  A "find gmp-5.0.0 -name nano" should
help you find some of the relevant code.

The fat support might be lagging somewhat.  You need to be more specific
about your config if you want a more specific response.

-- 
Torbjörn