T3/T3 mul_2 and addmul_2

Fri Mar 8 04:16:25 CET 2013

I only now spotted FPMADDXHI and FPMADDX.  No Sun/Oracle SPARC has been
a floating-point demon, and these integer multiply instructions are
performed in the fpu.

Multiply-accumulate instructions are tricky, since one may easily put
the accumulation on a carry recurrency path, and thereby kill
performance.

One might want to try an addmul_1 using these instructions.  To avoid the
carry recurrency problem, the accumulation operand should be the limbs
from rp[].

Further addition can then be done in the integer unit, assuming there is
a fast datapath from the fpu to the iu.  The instruction to use is
MOVdTOx (Oracle's capitalisation).

Instruction mix:

ldd     rp[i], %f0
ldd     up[i], %f2
fpmaddx %f2, v0, %f0, %f4
fpmaddxhi %f2, v0, %f0, %f6
movdtox %f4, %g1
addxccc %g1, %g2, %g3
movdtox %f6, %g2
stx     %g3, rp[i]
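
For reference, here is a small C model of what that mix computes.  It is
only a sketch: it assumes that fpmaddx returns the low 64 bits and
fpmaddxhi the high 64 bits of rs1 * rs2 + rs3, and the helper and
function names are made up for illustration.  It relies on the
unsigned __int128 extension of GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Assumed semantics: low and high halves of the 128-bit value a * b + c. */
static uint64_t model_fpmaddx(uint64_t a, uint64_t b, uint64_t c)
{
  return (uint64_t) ((u128) a * b + c);
}

static uint64_t model_fpmaddxhi(uint64_t a, uint64_t b, uint64_t c)
{
  return (uint64_t) (((u128) a * b + c) >> 64);
}

/* rp[0..n-1] += up[0..n-1] * v0, returning the carry-out limb. */
uint64_t addmul_1_fpmaddx_model(uint64_t *rp, const uint64_t *up,
                                size_t n, uint64_t v0)
{
  uint64_t hi_prev = 0;  /* high half from the previous limb (%g2) */
  unsigned cy = 0;       /* integer unit carry flag */

  for (size_t i = 0; i < n; i++)
    {
      /* rp[i] enters as the addend, so it stays off the carry chain. */
      uint64_t lo = model_fpmaddx(up[i], v0, rp[i]);
      uint64_t hi = model_fpmaddxhi(up[i], v0, rp[i]);

      /* addxccc: the only add with carry per limb. */
      u128 s = (u128) lo + hi_prev + cy;
      rp[i] = (uint64_t) s;
      cy = (unsigned) (s >> 64);
      hi_prev = hi;
    }
  /* Cannot overflow, since the full result fits in n+1 limbs. */
  return hi_prev + cy;
}
```

Note that up[i] * v0 + rp[i] is at most (2^64-1) * 2^64, so the multiply-add
itself never overflows 128 bits; only the add of the previous high half
needs the carry flag.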

I doubt this will beat an optimal integer unit based addmul_1, with the
mix:

ldx     rp[i], ...
ldx     up[i], ...
mulx
umulxhi
addxccc
addxccc
stx
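
A matching C model of the integer unit mix, again only a sketch with
made-up names.  It shows one plausible reading of the two addxccc per
limb as two separate carry chains; how an actual loop would schedule
them around the single carry flag is not modelled here.  It relies on
the unsigned __int128 extension of GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* rp[0..n-1] += up[0..n-1] * v0, returning the carry-out limb. */
uint64_t addmul_1_iu_model(uint64_t *rp, const uint64_t *up,
                           size_t n, uint64_t v0)
{
  uint64_t hi_prev = 0;  /* umulxhi result from the previous limb */
  unsigned cy_a = 0;     /* carry of the product chain */
  unsigned cy_b = 0;     /* carry of the rp[] accumulation chain */

  for (size_t i = 0; i < n; i++)
    {
      u128 p = (u128) up[i] * v0;
      uint64_t lo = (uint64_t) p;          /* mulx */
      uint64_t hi = (uint64_t) (p >> 64);  /* umulxhi */

      /* First addxccc: low half plus previous high half. */
      u128 t = (u128) lo + hi_prev + cy_a;
      cy_a = (unsigned) (t >> 64);

      /* Second addxccc: accumulate into rp[i]. */
      u128 s = (u128) rp[i] + (uint64_t) t + cy_b;
      cy_b = (unsigned) (s >> 64);

      rp[i] = (uint64_t) s;
      hi_prev = hi;
    }
  /* Cannot overflow, since the full result fits in n+1 limbs. */
  return hi_prev + cy_a + cy_b;
}
```

Compared with the fpmaddx variant, the rp[] limbs here go through an
add with carry of their own, which is the second addxccc per limb in
the mix.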

Fpmaddx and fpmaddxhi will get more useful for addmul_2 or higher, since
then we could do more accumulation in the fpu, and thus less movdtox.
The situation becomes very itanic-like with two key differences:

1. Itanic is 6-issue
2. Itanic requires 4 instructions for add-with-carry (addxccc)

-- 
Torbjörn