Inline assembly needs "cc" clobber to avoid miscompilation
Andrew Wock
ajwock at gmail.com
Fri Nov 22 16:46:12 UTC 2019
I was using GMP to test an experimental version of LLVM/Clang on a PowerPC
server when I found that the test tests/mpn/t-div failed. As a developer
on the project, my first instinct was to investigate this issue as a
miscompilation. The file that appeared to miscompile was
mpn/div_qr_1n_pi1.c- particularly the code on lines 248-249 where plain c
was used to calculate a carry bit stored in the variable ‘cy’ within the
macro ADDC_LIMB, which was adjacent to some inline assembly that used
powerpc instructions that also calculate a carry.
Upon investigating this issue at the assembly level, I found that LLVM
emitted this sequence of instructions:
(The values of registers don’t really matter, it would have been correct if
carry was applied correctly)
subfc x, x, y // subtract and set carry flag, maps as part of line
div_qr_1n_pi1.c:248
addc a, b, c // add and set carry flag, inline assembly, clobbers
carry from previous
addze d, e, f // add and add carry flag, inline assembly
subfe w, w, z // add one’s complement and add carry flag, maps as
part of line div_qr_1n_pi1.c:249
The subfe was supposed to use the carry flag from subfc, but LLVM’s
scheduler failed to detect that the inline assembly could clobber the carry
flag. However, this does not appear to be an issue with how LLVM models
dependencies or clobbers with inline assembly- it’s that the inline
assembly doesn’t tell LLVM that it clobbers the carry flag in the first
place.
I modified add_ssaaaa in longlong.h to notify the compiler of this issue
and the test no longer failed. Below are the details of the code in
question as well as my incomplete solution to the problem.
For reference:
mpn/div_qr_1n_pi1.c: 248-249
242 add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
243 add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1);
244 add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), q0);
245 q0 = t;
246
247 umul_ppmm (p1, p0, u1, B2);
248 ADDC_LIMB (cy, u0, u0, u2 & B2);
249 u0 -= (-cy) & d;
ADDC_LIMB:
gmp-impl.h: 2599 where cout is the calculated carry.
#define ADDC_LIMB(cout, w, x, y) \
2600 do { \
2601 mp_limb_t __x = (x); \
2602 mp_limb_t __y = (y); \
2603 mp_limb_t __w = __x + __y; \
2604 (w) = __w; \
2605 (cout) = __w < __x; \
2606 } while (0)
2607 #else
add_ssaaaa original:
longlong.h: 1329, got the assembly from the first if-statement when
preprocessing div_qr_1n_pi1.c
1329 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
1330 do { \
1331 if (__builtin_constant_p (bh) && (bh) == 0) \
1332 __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \
1333 : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) ); \
1334 else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0) \
1335 __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \
1336 : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) ); \
1337 else \
1338 __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \
1339 : "=r" (sh), "=&r" (sl) \
1340 : "r" (ah), "r" (bh), "%r" (al), "rI" (bl) ); \
1341 } while (0)
add_ssaaaa modified:
1329 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
1330 do { \
1331 if (__builtin_constant_p (bh) && (bh) == 0) \
1332 __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \
1333 : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) :
*"cc"*); \
1334 else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0) \
1335 __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \
1336 : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) :
*"cc"*); \
1337 else \
1338 __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \
1339 : "=r" (sh), "=&r" (sl) \
1340 : "r" (ah), "r" (bh), "%r" (al), "rI" (bl) : *"cc"*); \
1341 } while (0)
Now, this issue isn’t necessarily local to PowerPC or LLVM/Clang. This
could happen with any compiler that is unfortunate enough to make the
decision to schedule a carry generating and carry using pair of
instructions around some carry-clobbering inline assembly on any platform.
I recommend that all inline assembly that can clobber the carry flag or any
other conditional flag have ‘: “cc”’ appended to their clobber list so that
compilers will necessarily model the code correctly.
If there is any reason you think that LLVM/Clang should be able to handle
this assembly correctly without the “cc” clobber, please tell me, as I want
to fix any issues with the compiler that would make it miscompile GMP.
LLVM does not currently attempt to parse inline assembly and analyze the
instructions within.
GMP version 6.1.2
Investigated on RHEL on PowerPC, but recreated on FreeBSD on x86-64.
uname -a output:
Linux -----.----.---.--- 4.14.0-115.el7a.ppc64le #1 SMP Tue Sep 25 12:28:39
EDT 2018 ppc64le ppc64le ppc64le GNU/Linux
config.guess:
power9-unknown-linux-gnu
configfsf.guess:
powerpc64le-unkown-linux-gnu
The bug was triggered with an experimental version of LLVM/Clang
It could also be triggered with release clang 8.0.0.
The commands used to cause the test to fail was
make CC=$CC CFLAGS=’-O1 -g3’ -j12
make CC=$CC CFLAGS=’-O1 -g3’ -j12 check
where CC references the clang being used and the commands are necessarily
run on PowerPC from the base gmp directory.
Thank you.
More information about the gmp-bugs
mailing list