[PATCH] Fix wrong code generation for AMD Fam 11h CPUs in 32-bit mode
tg at gmplib.org
Fri Mar 9 10:40:59 CET 2012
Mikael Pettersson <mikpe at it.uu.se> writes:
> I've run t-toom33 in parallel gdb sessions on fam10h and fam11h. The reason
> they diverge is that with -march=amdfam10, gcc is instructed to assume the
> presence of the ABM extensions, so gcc emits an LZCNT in __gmp_urandomm_ui.
> However, fam11h doesn't have ABM, and interprets the LZCNT encoding as BSR,
> resulting in no SIGILL but entirely different results being computed. Shortly
> after the first LZCNT is executed a JBE instruction takes different paths on
> fam10h and fam11h, and a while later the code SIGSEGVs on fam11h with an
> out-of-bounds memory access.
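
To make the divergence concrete: for a nonzero 32-bit operand, BSR returns the
index of the highest set bit, while LZCNT returns the number of leading zero
bits, so the two results are related by lzcnt = 31 - bsr. Below is a minimal C
sketch that models both in portable code. The helper names bsr32 and lzcnt32
are made up for illustration; this is not code from GMP or gcc.

```c
#include <stdint.h>

/* Models BSR for x != 0: index of the highest set bit (0..31).
   The hardware result for x == 0 is undefined; this model returns 0,
   so callers should only pass nonzero values. */
static unsigned bsr32(uint32_t x)
{
    unsigned i = 0;
    while (x >>= 1)
        i++;
    return i;
}

/* Models LZCNT: number of leading zero bits (0..32), with 32 for x == 0. */
static unsigned lzcnt32(uint32_t x)
{
    return x == 0 ? 32u : 31u - bsr32(x);
}
```

For x = 0x00010000, lzcnt32 gives 15 and bsr32 gives 16, so code that expects
a leading-zero count computes with a different number when the same bytes are
decoded as BSR.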
Thanks for this analysis!
> That no-ABM CPUs interpret LZCNT as BSR is documented in AMD's programmer's
> manual, and is not unexpected given how prefixes work in the x86 ISA.
> (I won't bore you with the details, but this is a general problem with the
> x86 ISA, and similar issues exist on Intel.)
At least PC processors are fast; we cannot expect them to properly
decode their horribly encoded instructions, can we? :-}
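
For reference, LZCNT is encoded as F3 0F BD, which is the BSR opcode 0F BD
with an F3 prefix in front; a CPU without ABM ignores that prefix and executes
BSR. Code that wants to use LZCNT safely can test for it first: ABM is
reported in CPUID leaf 0x80000001, ECX bit 5. A small sketch of such a check,
assuming GCC or Clang and their <cpuid.h> helper __get_cpuid; the function
names ecx_has_abm and cpu_has_abm are hypothetical.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang CPUID helper header */
#endif

/* ABM (LZCNT) feature flag: CPUID leaf 0x80000001, ECX bit 5. */
#define CPUID_EXT_ECX_ABM (1u << 5)

/* Pure bit test, kept separate so it can be checked without running CPUID. */
static int ecx_has_abm(unsigned ecx)
{
    return (ecx & CPUID_EXT_ECX_ABM) != 0;
}

/* Returns 1 if the running CPU reports ABM, 0 if it does not,
   and -1 if the extended leaf cannot be queried (or this is not x86). */
static int cpu_has_abm(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;
    return ecx_has_abm(ecx);
#else
    return -1;
#endif
}
```

This only helps code that dispatches at run time; it does nothing for a binary
the compiler has already filled with LZCNT because of the -march setting.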
> Passing -march=amdfam10 -mno-abm to gcc builds a working gmp, but that only
> confirms that the ABM extensions are the problem here. The real bug is still
> that -march=amdfam10 is wrong for fam11h.
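
Telling fam10h from fam11h comes down to the CPUID family field. A rough
sketch of the decoding, assuming the standard layout of CPUID leaf 1 EAX (base
family in bits 11:8, extended family in bits 27:20, the two added together
when the base family is 0xF). The function name cpuid_display_family is
hypothetical, and the sketch makes no claim about how GMP's configure scripts
actually detect the CPU.

```c
#include <stdint.h>

/* Decode the "display family" from the EAX value of CPUID leaf 1.
   Base family is bits 11:8; extended family is bits 27:20 and is
   only added when the base family field is saturated at 0xF. */
static unsigned cpuid_display_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xFu;
    unsigned ext  = (eax >> 20) & 0xFFu;
    return base == 0xFu ? base + ext : base;
}
```

With illustrative EAX values, 0x00100F00 decodes to family 0x10 and 0x00200F00
to family 0x11, so the two can be kept apart before any -march value is chosen.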