[PATCH] Fix wrong code generation for AMD Fam 11h CPUs in 32-bit mode

Torbjorn Granlund tg at gmplib.org
Fri Mar 9 10:40:59 CET 2012


Mikael Pettersson <mikpe at it.uu.se> writes:

  I've run t-toom33 in parallel gdb sessions on fam10h and fam11h.  The reason
  they diverge is that with -march=amdfam10, gcc is instructed to assume the
  presence of the ABM extensions, so gcc emits an LZCNT in __gmp_urandomm_ui.
  However, fam11h doesn't have ABM, and interprets the LZCNT encoding as BSR,
  resulting in no SIGILL but entirely different results being computed.  Shortly
  after the first LZCNT is executed a JBE instruction takes different paths on
  fam10h and fam11h, and a while later the code SIGSEGVs on fam11h with an
  out-of-bounds memory access.
  
Thanks for this analysis!
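
To make the divergence concrete, here is a minimal sketch in portable C that
models the two instructions in software.  The function names are made up for
illustration and are not part of GMP or gcc.  For nonzero input the two
results are related by lzcnt(x) = 31 - bsr(x), so any code that expects a
leading-zero count but gets a bit index will compute something else entirely.

```c
#include <stdint.h>

/* Software model of 32-bit BSR: index of the highest set bit.
   The real instruction leaves its destination undefined for x == 0;
   this model simply returns 0 in that case. */
unsigned bsr32_model(uint32_t x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Software model of 32-bit LZCNT: number of leading zero bits,
   defined as 32 when x == 0. */
unsigned lzcnt32_model(uint32_t x)
{
    if (x == 0)
        return 32;
    return 31 - bsr32_model(x);
}
```

For x = 0x00010000 the LZCNT model gives 15 while the BSR model gives 16, and
for x = 1 they give 31 and 0.  A shift count or loop bound derived from such a
value would plausibly explain the diverging JBE and the later out-of-bounds
access.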

  That no-ABM CPUs interpret LZCNT as BSR is documented in AMD's programmer's
  manual, and is not unexpected given how prefixes work in the x86 ISA.
  (I won't bore you with the details, but this is a general problem with the
  x86 ISA, and similar issues exist on Intel.)
  
At least PC processors are fast; we cannot expect them to properly
decode their horribly encoded instructions as well, can we?  :-}
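
For readers who do want the details: in the instruction set references, BSR
is encoded as 0F BD /r and LZCNT as F3 0F BD /r, that is, the same opcode
with an F3 (REP) prefix in front.  A CPU without ABM ignores the prefix and
executes BSR.  The sketch below only compares byte sequences; the array and
function names are hypothetical, and the register choice (eax, ecx) is just
an example.

```c
#include <stddef.h>
#include <string.h>

/* Hand-assembled example encodings (ModRM C1 = reg eax, r/m ecx). */
static const unsigned char lzcnt_eax_ecx[] = { 0xF3, 0x0F, 0xBD, 0xC1 };
static const unsigned char bsr_eax_ecx[]   = { 0x0F, 0xBD, 0xC1 };

/* Returns 1 if 'insn' is exactly 'base' preceded by one F3 prefix byte. */
int is_f3_prefixed_form(const unsigned char *insn, size_t insn_len,
                        const unsigned char *base, size_t base_len)
{
    return insn_len == base_len + 1
        && insn[0] == 0xF3
        && memcmp(insn + 1, base, base_len) == 0;
}

/* Returns 1 if dropping the F3 prefix from the LZCNT example yields
   the BSR example, which is what a no-ABM decoder effectively does. */
int lzcnt_decodes_as_bsr(void)
{
    return is_f3_prefixed_form(lzcnt_eax_ecx, sizeof lzcnt_eax_ecx,
                               bsr_eax_ecx, sizeof bsr_eax_ecx);
}
```

Since the prefixed form is a valid instruction on older decoders, there is no
SIGILL to flag the mismatch.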

  Passing -march=amdfam10 -mno-abm to gcc builds a working gmp, but that only
  confirms that the ABM extensions are the problem here.  The real bug is still
  that -march=amdfam10 is wrong for fam11h.
  
Certainly so.
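
One way to avoid guessing from the family number is to ask the CPU directly.
AMD documents ABM (LZCNT) support as bit 5 of ECX in CPUID extended leaf
0x80000001.  The following is a rough sketch of such a probe, not what GMP's
configure currently does; it assumes a compiler that ships <cpuid.h> (gcc or
clang), and the function names are invented for this example.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* gcc/clang helper header, assumed available */
#endif

#define CPUID_EXT_FEATURES 0x80000001u
#define ECX_ABM_BIT        (1u << 5)

/* Pure bit test, kept separate so it can be checked on any host. */
int ecx_reports_abm(unsigned int ecx)
{
    return (ecx & ECX_ABM_BIT) != 0;
}

/* Returns 1 if the CPU reports ABM, 0 if it does not, and -1 if the
   leaf cannot be queried (non-x86 host or leaf not supported). */
int cpu_has_abm(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(CPUID_EXT_FEATURES, &eax, &ebx, &ecx, &edx))
        return -1;
    return ecx_reports_abm(ecx);
#else
    return -1;
#endif
}
```

A build script could then add -mno-abm, or pick a different -march value,
whenever cpu_has_abm() does not return 1.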

-- 
Torbjörn

