[PATCH] Fix wrong code generation for AMD Fam 11h CPUs in 32-bit mode
mikpe at it.uu.se
Fri Mar 9 10:34:56 CET 2012
Torbjorn Granlund writes:
> Mikael Pettersson <mikpe at it.uu.se> writes:
> > I suspect Fam 11h is somewhat common. According to the Fam 11h revision
> > guide there are Athlons, Semprons, and Turions based on it, in both single
> > and dual-core configurations (Turions only as dual-core). I see no signs
> > of any Opterons being based on it though.
> This is bad news, perhaps we ought to make a GMP 5.0.5.
> I pushed a fix to the 5.0 repo as well as the main repo.
> > I am quite surprised by the nature of the 'make check' failures; if the
> > compiler is directed by GMP to use unimplemented instructions, I'd
> > really expect "Illegal instruction" traps, not miscomputations,
> > segfaults, and hangs!
> > And in 64-bit mode all works fine. Even more strange.
> I would be interested in an analysis of what causes this behaviour. If
> you do a build passing CFLAGS="-m32 -g -march=amdfam10", do the crashes
> still happen?
> In that case, perhaps you could debug this, perhaps in parallel on a k10
> and a 11h box, and see where and why they deviate?
I've run t-toom33 in parallel gdb sessions on fam10h and fam11h. They diverge
because -march=amdfam10 tells gcc to assume the ABM extensions are present,
so gcc emits an LZCNT in __gmp_urandomm_ui. However, fam11h doesn't have ABM
and executes the LZCNT encoding as BSR, so there is no SIGILL, just a
different result. Shortly after the first LZCNT is executed, a JBE instruction
takes different paths on fam10h and fam11h, and a while later the code
SIGSEGVs on fam11h with an out-of-bounds memory access.
That no-ABM CPUs interpret LZCNT as BSR is documented in AMD's programmer's
manual, and is not unexpected given how prefixes work in the x86 ISA.
(I won't bore you with the details, but this is a general problem with the
x86 ISA, and similar issues exist on Intel.)
Passing -march=amdfam10 -mno-abm to gcc builds a working gmp, but that only
confirms that the ABM extensions are the problem here. The real bug is still
that -march=amdfam10 is wrong for fam11h.