68000 issue in longlong.h
Niels Möller
nisse at lysator.liu.se
Fri Mar 5 16:24:29 UTC 2021
"selco at t-online.de" <selco at t-online.de> writes:
> There you see the original code on the left and the generated assembly in the middle.
> You can add the & to
> "=d" (__umul_tmp1)
> in the left window and you see immediately the change in the compiled output.
Ok, let me see if I understand the problem. The inline asm for umul_ppmm
is as follows:
#define umul_ppmm(xh, xl, a, b) \
do { USItype __umul_tmp1, __umul_tmp2; \
__asm__ ("| Inlined umul_ppmm\n" \
" move%.l %5,%3\n" \
" move%.l %2,%0\n" \
" move%.w %3,%1\n" \
" swap %3\n" \
" swap %0\n" \
" mulu%.w %2,%1\n" \
" mulu%.w %3,%0\n" \
" mulu%.w %2,%3\n" \
" swap %2\n" \
" mulu%.w %5,%2\n" \
" add%.l %3,%2\n" \
" jcc 1f\n" \
" add%.l %#0x10000,%0\n" \
"1: move%.l %2,%3\n" \
" clr%.w %2\n" \
" swap %2\n" \
" swap %3\n" \
" clr%.w %3\n" \
" add%.l %3,%1\n" \
" addx%.l %2,%0\n" \
" | End inlined umul_ppmm" \
: "=&d" (xh), "=&d" (xl), \
"=d" (__umul_tmp1), "=&d" (__umul_tmp2) \
: "%2" ((USItype)(a)), "d" ((USItype)(b))); \
} while (0)
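For reference, the arithmetic the template performs can be written in portable C. This is only a sketch for following the assembly: the function name umul_ppmm_sketch and the local variable names are made up here, and it is not the code longlong.h uses. It forms the four 16x16 partial products, adds the two middle ones (propagating their carry into bit 16 of the high word), and then folds the middle sum into the low and high words.

```c
#include <stdint.h>

/* Hypothetical helper, not part of longlong.h: 32x32 -> 64 bit multiply
   built from 16-bit halves, mirroring the steps of the m68k template. */
void umul_ppmm_sketch(uint32_t *xh, uint32_t *xl, uint32_t a, uint32_t b)
{
  uint32_t a_lo = a & 0xffffu, a_hi = a >> 16;
  uint32_t b_lo = b & 0xffffu, b_hi = b >> 16;

  uint32_t lo = a_lo * b_lo;    /* mulu.w %2,%1 */
  uint32_t hi = a_hi * b_hi;    /* mulu.w %3,%0 */
  uint32_t m1 = a_lo * b_hi;    /* mulu.w %2,%3 */
  uint32_t m2 = a_hi * b_lo;    /* swap %2; mulu.w %5,%2 */

  uint32_t mid = m1 + m2;       /* add.l %3,%2 */
  if (mid < m1)                 /* carry out of the middle sum */
    hi += 0x10000u;             /* add.l #0x10000,%0 */

  uint32_t mid_lo = mid << 16;  /* low half of mid, moved to the upper 16 bits */
  lo += mid_lo;                 /* add.l %3,%1 */
  hi += (mid >> 16) + (lo < mid_lo);  /* addx.l %2,%0, including the carry */

  *xh = hi;
  *xl = lo;
}
```

Note that the fourth product needs a_hi (from the swapped %2) and the untouched low half of b (from %5) at the same time, which is where register sharing matters.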
There are two instructions mentioning the input %5,
move%.l %5,%3 (first instruction)
mulu%.w %5,%2 (close to the middle)
The %2 register is both an input and output, and referenced in a few
places in between. My m68k assembly knowledge is very rusty; most of
them may be harmless reads, but the "swap %2" instruction just before
the use of %5 looks like it relies on %2 and %5 being distinct
registers? In the linked-to compiler output (listed as gcc-6.5.0b, using
-O1), the template is instantiated as
| Inlined umul_ppmm
move.l d5,d2
move.l d5,d0
move.w d2,d1
swap d2
swap d0
mulu.w d5,d1
mulu.w d2,d0
mulu.w d5,d2
swap d5
mulu.w d5,d5
add.l d2,d5
jcc 1f
add.l #0x10000,d0
1: move.l d5,d2
clr.w d5
swap d5
swap d2
clr.w d2
add.l d2,d1
addx.l d5,d0
| End inlined umul_ppmm
so it's quite clear that both %2 and %5 are mapped to the same register,
d5. Adding the suggested &, marking %2 as an "early clobber" output,
changes the instantiation to
| Inlined umul_ppmm
move.l d5,d2
move.l d6,d0
move.w d2,d1
swap d2
swap d0
mulu.w d6,d1
mulu.w d2,d0
mulu.w d6,d2
swap d6
mulu.w d5,d6
add.l d2,d6
jcc 1f
add.l #0x10000,d0
1: move.l d6,d2
clr.w d6
swap d6
swap d2
clr.w d2
add.l d2,d1
addx.l d6,d0
| End inlined umul_ppmm
I.e., %2 gets its own register, d6. If my analysis is right, the
critical difference is
swap d5
mulu.w d5,d5
vs
swap d6
mulu.w d5,d6
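One way to check this analysis without m68k hardware is to simulate the instruction sequence in C, with the operand-to-register mapping passed in as parameters. The following is a rough sketch under my own assumptions: the register file d[], the x_flag variable, and all function names are invented for illustration, and only the instruction semantics the template depends on are modelled (in particular, the carry and extend flags are treated as one flag).

```c
#include <stdint.h>

static uint32_t d[8];   /* simulated data registers d0..d7 */
static int x_flag;      /* simulated carry/extend flag */

/* Each helper takes (source, destination) register numbers, as in the asm. */
static void move_l(int s, int t) { d[t] = d[s]; }
static void move_w(int s, int t) { d[t] = (d[t] & 0xffff0000u) | (d[s] & 0xffffu); }
static void swap_r(int r) { d[r] = (d[r] << 16) | (d[r] >> 16); }
static void clr_w(int r) { d[r] &= 0xffff0000u; }
static void mulu_w(int s, int t) { d[t] = (d[s] & 0xffffu) * (d[t] & 0xffffu); }
static void add_l(int s, int t)
{
  uint32_t src = d[s], sum = d[t] + src;
  x_flag = sum < src;
  d[t] = sum;
}
static void addx_l(int s, int t)
{
  uint64_t sum = (uint64_t)d[t] + d[s] + (uint64_t)x_flag;
  x_flag = (int)(sum >> 32);
  d[t] = (uint32_t)sum;
}

/* Run the template with operands %0,%1,%2,%3,%5 mapped to registers
   r0,r1,r2,r3,r5.  If r2 == r5, the caller must pass a == b. */
uint64_t sim_umul_ppmm(uint32_t a, uint32_t b,
                       int r0, int r1, int r2, int r3, int r5)
{
  for (int i = 0; i < 8; i++)
    d[i] = 0;
  x_flag = 0;
  d[r5] = b;
  d[r2] = a;            /* operand 2 starts out holding input a */

  move_l(r5, r3);
  move_l(r2, r0);
  move_w(r3, r1);
  swap_r(r3);
  swap_r(r0);
  mulu_w(r2, r1);
  mulu_w(r3, r0);
  mulu_w(r2, r3);
  swap_r(r2);
  mulu_w(r5, r2);       /* the instruction that reads %5 after %2 was swapped */
  add_l(r3, r2);
  if (x_flag)           /* jcc 1f skips the add when carry is clear */
    d[r0] += 0x10000u;
  move_l(r2, r3);
  clr_w(r2);
  swap_r(r2);
  swap_r(r3);
  clr_w(r3);
  add_l(r3, r1);
  addx_l(r2, r0);
  return ((uint64_t)d[r0] << 32) | d[r1];
}
```

With u = 0x12345678, calling sim_umul_ppmm(u, u, 0, 1, 6, 2, 5) models the second listing (%2 in d6, %5 in d5) and should agree with the 64-bit product u*u, while sim_umul_ppmm(u, u, 0, 1, 5, 2, 5) models the first listing (both in d5) and should not, since the middle term then uses u_hi*u_hi in place of u_hi*u_lo.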
I can't say for sure if using the same register for %2 and %5 is
expected behavior of gcc. But to me it seems reasonable of gcc to try to
share a register when an inline asm template is instantiated with
identical expressions for two of the inputs (as in the squaring
umul_ppmm(xh, xl, u, u)). And the additional & seems to be the documented
way to tell it not to do that for this asm template.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.