68000 issue in longlong.h

Niels Möller nisse at lysator.liu.se
Fri Mar 5 16:24:29 UTC 2021


"selco at t-online.de" <selco at t-online.de> writes:

> There you see the original code on the left and the generated assembly in the middle.
> You can add the & to
> "=d" (__umul_tmp1)
> in the left window and you immediately see the change in the compiled output.

Ok, let me see if I understand the problem. The inline asm for umul_ppmm
is as follows:

#define umul_ppmm(xh, xl, a, b) \
  do { USItype __umul_tmp1, __umul_tmp2;                                \
        __asm__ ("| Inlined umul_ppmm\n"                                \
"       move%.l %5,%3\n"                                                \
"       move%.l %2,%0\n"                                                \
"       move%.w %3,%1\n"                                                \
"       swap    %3\n"                                                   \
"       swap    %0\n"                                                   \
"       mulu%.w %2,%1\n"                                                \
"       mulu%.w %3,%0\n"                                                \
"       mulu%.w %2,%3\n"                                                \
"       swap    %2\n"                                                   \
"       mulu%.w %5,%2\n"                                                \
"       add%.l  %3,%2\n"                                                \
"       jcc     1f\n"                                                   \
"       add%.l  %#0x10000,%0\n"                                         \
"1:     move%.l %2,%3\n"                                                \
"       clr%.w  %2\n"                                                   \
"       swap    %2\n"                                                   \
"       swap    %3\n"                                                   \
"       clr%.w  %3\n"                                                   \
"       add%.l  %3,%1\n"                                                \
"       addx%.l %2,%0\n"                                                \
"       | End inlined umul_ppmm"                                        \
              : "=&d" (xh), "=&d" (xl),                                 \
                "=d" (__umul_tmp1), "=&d" (__umul_tmp2)                 \
              : "%2" ((USItype)(a)), "d" ((USItype)(b)));               \
  } while (0)

There are two instructions mentioning the input %5,

  move%.l %5,%3   (first instruction)

  mulu%.w %5,%2   (close to the middle)

The %2 register is both an input and an output, and it is referenced in
a few places in between. My m68k assembly knowledge is very rusty, and
most of those references may be harmless reads, but the "swap %2"
instruction just before the use of %5 looks like it relies on %2 and %5
being distinct registers. In the linked-to compiler output (listed as
gcc-6.5.0b, using -O1), the template is instantiated as

        | Inlined umul_ppmm
        move.l  d5,d2
        move.l  d5,d0
        move.w  d2,d1
        swap    d2
        swap    d0
        mulu.w  d5,d1
        mulu.w  d2,d0
        mulu.w  d5,d2
        swap    d5
        mulu.w  d5,d5
        add.l   d2,d5
        jcc     1f
        add.l   #0x10000,d0
        clr.w   d5
        swap    d5
        swap    d2
        clr.w   d2
        add.l   d2,d1
        addx.l  d5,d0
        | End inlined umul_ppmm

so it's quite clear that both %2 and %5 are mapped to the same register,
d5. Adding the suggested &, marking %2 as an "early clobber" output,
changes the instantiation to

        | Inlined umul_ppmm
        move.l  d5,d2
        move.l  d6,d0
        move.w  d2,d1
        swap    d2
        swap    d0
        mulu.w  d6,d1
        mulu.w  d2,d0
        mulu.w  d6,d2
        swap    d6
        mulu.w  d5,d6
        add.l   d2,d6
        jcc     1f
        add.l   #0x10000,d0
        clr.w   d6
        swap    d6
        swap    d2
        clr.w   d2
        add.l   d2,d1
        addx.l  d6,d0
        | End inlined umul_ppmm

I.e., %2 gets its own register, d6. If my analysis is right, the
critical difference is

        swap    d5
        mulu.w  d5,d5

vs

        swap    d6
        mulu.w  d5,d6
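
One way to check this analysis without m68k hardware is to replay the
template in portable C. The sketch below is only an illustration and is
not part of GMP: the names umul_ppmm_sim, swap16, mulu_w and the share
flag are hypothetical, and the instruction semantics are modelled by
hand (only the carry and X flags that the template depends on). With
share set, %2 and %5 refer to the same storage, mimicking the
allocation in the first listing.

```c
#include <assert.h>
#include <stdint.h>

/* m68k "swap": exchange the two 16-bit halves of a 32-bit register. */
static uint32_t swap16 (uint32_t r) { return (r << 16) | (r >> 16); }

/* m68k "mulu.w src,dst": 16x16 -> 32 bit product of the low words. */
static uint32_t mulu_w (uint32_t src, uint32_t dst)
{
  return (src & 0xffffu) * (dst & 0xffffu);
}

/* Hand-written model of the umul_ppmm template.  If share is nonzero,
   operands %2 and %5 alias one register, which only makes sense when
   a == b (the squaring case).  */
static void
umul_ppmm_sim (uint32_t *xh, uint32_t *xl, uint32_t a, uint32_t b, int share)
{
  uint32_t reg_a = a, reg_b = b;
  uint32_t r0 = 0, r1 = 0, r3 = 0;          /* %0, %1, %3 */
  uint32_t *r2 = &reg_a;                    /* %2 */
  uint32_t *r5 = share ? &reg_a : &reg_b;   /* %5 */
  uint32_t carry, x;

  assert (!share || a == b);

  r3 = *r5;                                  /* move.l %5,%3 */
  r0 = *r2;                                  /* move.l %2,%0 */
  r1 = (r1 & 0xffff0000u) | (r3 & 0xffffu);  /* move.w %3,%1 */
  r3 = swap16 (r3);                          /* swap   %3    */
  r0 = swap16 (r0);                          /* swap   %0    */
  r1 = mulu_w (*r2, r1);                     /* mulu.w %2,%1 */
  r0 = mulu_w (r3, r0);                      /* mulu.w %3,%0 */
  r3 = mulu_w (*r2, r3);                     /* mulu.w %2,%3 */
  *r2 = swap16 (*r2);                        /* swap   %2    */
  *r2 = mulu_w (*r5, *r2);                   /* mulu.w %5,%2 */
  *r2 += r3;  carry = *r2 < r3;              /* add.l  %3,%2 */
  if (carry)                                 /* jcc    1f    */
    r0 += 0x10000u;                          /* add.l  #0x10000,%0 */
  r3 = *r2;                                  /* 1: move.l %2,%3 */
  *r2 &= 0xffff0000u;                        /* clr.w  %2    */
  *r2 = swap16 (*r2);                        /* swap   %2    */
  r3 = swap16 (r3);                          /* swap   %3    */
  r3 &= 0xffff0000u;                         /* clr.w  %3    */
  r1 += r3;  x = r1 < r3;                    /* add.l  %3,%1 (sets X) */
  r0 += *r2 + x;                             /* addx.l %2,%0 */

  *xh = r0;
  *xl = r1;
}
```

In this model, squaring u = 0x00010002 with separate registers gives the
high word 0x00000001 and the low word 0x00040004, which is the correct
product. With the shared register the low word comes out as 0x00030004,
because "swap %2" has also swapped %5, so the following mulu.w
multiplies the high half of u by itself instead of by the low half.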

I can't say for sure if using the same register for %2 and %5 is
expected behavior of gcc. But to me it seems reasonable of gcc to try to
share a register when an inline asm template is instantiated with
identical expressions for two of the inputs (as in the squaring
umul_ppmm(xh, xl, u, u)). And the additional & seems to be the
documented way to tell it not to do that for this asm template.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

