Inline assembly needs "cc" clobber to avoid miscompilation

Andrew Wock ajwock at gmail.com
Fri Nov 22 16:46:12 UTC 2019


I was using GMP to test an experimental version of LLVM/Clang on a PowerPC
server when I found that the test tests/mpn/t-div failed.  As a developer
on the project, my first instinct was to investigate this issue as a
miscompilation.  The file that appeared to miscompile was
mpn/div_qr_1n_pi1.c- particularly the code on lines 248-249 where plain c
was used to calculate a carry bit stored in the variable ‘cy’ within the
macro ADDC_LIMB, which was adjacent to some inline assembly that used
powerpc instructions that also calculate a carry.



Upon investigating this issue at the assembly level, I found that LLVM
emitted this sequence of instructions:



(The values of registers don’t really matter, it would have been correct if
carry was applied correctly)



subfc x, x, y          // subtract and set carry flag, maps as part of line
div_qr_1n_pi1.c:248

addc  a, b, c         // add and set carry flag, inline assembly, clobbers
carry from previous

addze d, e, f         // add and add carry flag, inline assembly

subfe w, w, z        // add one’s complement and add carry flag, maps as
part of line div_qr_1n_pi1.c:249



The subfe was supposed to use the carry flag from subfc, but LLVM’s
scheduler failed to detect that the inline assembly could clobber the carry
flag.  However, this does not appear to be an issue with how LLVM models
dependencies or clobbers with inline assembly- it’s that the inline
assembly doesn’t tell LLVM that it clobbers the carry flag in the first
place.



I modified add_ssaaaa in longlong.h to notify the compiler of this issue
and the test no longer failed.  Below are the details of the code in
question as well as my incomplete solution to the problem.



For reference:



mpn/div_qr_1n_pi1.c: 248-249



242       add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);

243       add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1);

244       add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), q0);

245       q0 = t;

246

247       umul_ppmm (p1, p0, u1, B2);

248       ADDC_LIMB (cy, u0, u0, u2 & B2);

249       u0 -= (-cy) & d;



ADDC_LIMB:



gmp-impl.h: 2599 where cout is the calculated carry.



#define ADDC_LIMB(cout, w, x, y)          \

2600 do {                  \

2601     mp_limb_t  __x = (x);           \

2602     mp_limb_t  __y = (y);           \

2603     mp_limb_t  __w = __x + __y;           \

2604     (w) = __w;                \

2605     (cout) = __w < __x;             \

2606   } while (0)

2607 #else



add_ssaaaa original:



longlong.h: 1329, got the assembly from the first if-statement when
preprocessing div_qr_1n_pi1.c



1329 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \

1330   do {                  \

1331     if (__builtin_constant_p (bh) && (bh) == 0)       \

1332       __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2"      \

1333        : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) );  \

1334     else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0)   \

1335       __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2"      \

1336        : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) );  \

1337     else                \

1338       __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3"      \

1339        : "=r" (sh), "=&r" (sl)          \

1340        : "r" (ah), "r" (bh), "%r" (al), "rI" (bl) );    \

1341   } while (0)



add_ssaaaa modified:



1329 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \

1330   do {                  \

1331     if (__builtin_constant_p (bh) && (bh) == 0)       \

1332       __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2"      \

1333        : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) :
*"cc"*);  \

1334     else if (__builtin_constant_p (bh) && (bh) == ~(USItype) 0)   \

1335       __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2"      \

1336        : "=r" (sh), "=&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) :
*"cc"*);  \

1337     else                \

1338       __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3"      \

1339        : "=r" (sh), "=&r" (sl)          \

1340        : "r" (ah), "r" (bh), "%r" (al), "rI" (bl) : *"cc"*);    \

1341   } while (0)



Now, this issue isn’t necessarily local to PowerPC or LLVM/Clang.  This
could happen with any compiler that is unfortunate enough to make the
decision to schedule a carry generating and carry using pair of
instructions around some carry-clobbering inline assembly on any platform.



I recommend that all inline assembly that can clobber the carry flag or any
other conditional flag have ‘: “cc”’ appended to their clobber list so that
compilers will necessarily model the code correctly.



If there is any reason you think that LLVM/Clang should be able to handle
this assembly correctly without the “cc” clobber, please tell me, as I want
to fix any issues with the compiler that would make it miscompile GMP.
LLVM does not currently attempt to parse inline assembly and analyze the
instructions within.



GMP version 6.1.2

Investigated on RHEL on PowerPC, but recreated on FreeBSD on x86-64.

uname -a output:

Linux -----.----.---.--- 4.14.0-115.el7a.ppc64le #1 SMP Tue Sep 25 12:28:39
EDT 2018 ppc64le ppc64le ppc64le GNU/Linux

config.guess:

power9-unknown-linux-gnu

configfsf.guess:

powerpc64le-unkown-linux-gnu

The bug was triggered with an experimental version of LLVM/Clang

It could also be triggered with release clang 8.0.0.

The commands used to cause the test to fail was



make CC=$CC CFLAGS=’-O1 -g3’ -j12

make CC=$CC CFLAGS=’-O1 -g3’ -j12 check



where CC references the clang being used and the commands are necessarily
run on PowerPC from the base gmp directory.


Thank you.


More information about the gmp-bugs mailing list