To be clear, the meaning of this &, according to the docs, is to tell
gcc that the "output" register assigned to __umul_tmp1 can't overlap the
inputs. If I read it correctly, __umul_tmp1 is %2 in the asm template,
and the b input is %5. I've forgotten most of what I knew about 68k
assembly, but it looks to me like %5 is used twice, and %2 is used in
between, which could be a problem if they're assigned the same register.
But I'm not sure how that would interact with "%2" ((USItype)(a)), which,
if I get it right, forces this input to be allocated in the same register
as the __umul_tmp1 output.

The sqr_basecase function contains a couple of
umul_ppmm(rp[11], lpl, ul, ul) calls, where a and b are the same
variable, so my best guess is that we get all *three* of a, b,
__umul_tmp1 allocated in the same register.
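
For checking a suspect build, it may help to have a plain C reference to
compare against, called with the same variable for both inputs as in
sqr_basecase. The following is only a sketch: ref_umul_ppmm is a
hypothetical helper, not anything in GMP, and it assumes that the asm
version builds the 32x32 product out of 16x16 multiplies, which I have
not verified against the actual template.

```c
#include <stdint.h>

/* Hypothetical reference, NOT GMP's macro: computes the 64-bit product
   a * b and returns it as high and low 32-bit words, using only
   16x16 -> 32 bit partial products. */
static void ref_umul_ppmm(uint32_t *hi, uint32_t *lo, uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFF, ah = a >> 16;
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint32_t ll = al * bl;   /* contributes to bits  0..31 */
    uint32_t lh = al * bh;   /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;   /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;   /* contributes to bits 32..63 */

    /* Add the middle terms; the first addition cannot overflow,
       the second can, and its carry belongs at bit 48. */
    uint32_t mid = lh + (ll >> 16);
    mid += hl;
    if (mid < hl)
        hh += 0x10000;

    *hi = hh + (mid >> 16);
    *lo = (mid << 16) | (ll & 0xFFFF);
}
```

Since a and b are passed by value here, calling it as
ref_umul_ppmm(&hi, &lo, ul, ul) cannot suffer from any register
overlap, so a mismatch with the asm result would point at the asm.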

If you could show the generated code (after gcc's register allocation)
*and* point out precisely where things go wrong, that would be helpful.


