T3/T3 mul_2 and addmul_2
Torbjorn Granlund
tg at gmplib.org
Fri Mar 8 04:16:25 CET 2013
I only now spotted FPMADDXHI and FPMADDX. No Sun/Oracle SPARC has been
a floating-point demon, and these integer multiply instructions are
performed in the FPU.
Multiply-accumulate instructions are tricky, since one may easily put
the accumulation on a carry recurrency path, and thereby kill
performance.
One might want to try an addmul_1 using these instructions. To avoid the
carry recurrency problem, the accumulation operand should be the limbs
from rp[].
Further addition can then be done in the integer unit, assuming there is
a fast datapath from the fpu to the iu. The instruction to use is
MOVdTOx (Oracle's capitalisation).
Instruction mix:
ldd rp[i], %f0
ldd up[i], %f2
fpmaddx %f2, v0, %f0, %f4
fpmaddxhi %f2, v0, %f0, %f6
movdtox %f4, %g1
addxccc %g1, %g2, %g3
movdtox %f6, %g2
stx %g3, rp[i]
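
For readers who want to check the arithmetic, here is a rough C model of
that mix. It is a sketch, not GMP code. It assumes fpmaddx returns the low
64 bits and fpmaddxhi the high 64 bits of the 128-bit value a*b + c, which
is how the mix above uses them. The function names are made up for
illustration, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Assumed semantics of fpmaddx: low 64 bits of a*b + c. */
static uint64_t fpmaddx_model(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)((u128)a * b + c);
}

/* Assumed semantics of fpmaddxhi: high 64 bits of a*b + c. */
static uint64_t fpmaddxhi_model(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)(((u128)a * b + c) >> 64);
}

/* rp[0..n-1] += up[0..n-1] * v0; returns the carry-out limb.
   Hypothetical name, modelling the instruction mix only. */
uint64_t addmul_1_fpmaddx_model(uint64_t *rp, const uint64_t *up,
                                size_t n, uint64_t v0)
{
    uint64_t hi = 0;   /* high limb of the previous product (%g2) */
    uint64_t cy = 0;   /* the integer unit's carry flag */

    for (size_t i = 0; i < n; i++) {
        /* The rp[] limb is the accumulation operand, so neither
           multiply depends on the previous iteration's carry. */
        uint64_t lo  = fpmaddx_model(up[i], v0, rp[i]);
        uint64_t nhi = fpmaddxhi_model(up[i], v0, rp[i]);

        /* addxccc: the only step on the carry recurrency path. */
        u128 s = (u128)lo + hi + cy;
        rp[i] = (uint64_t)s;
        cy = (uint64_t)(s >> 64);
        hi = nhi;
    }
    return hi + cy;   /* cannot overflow: the full result fits n+1 limbs */
}
```

Only the single add-with-carry links one iteration to the next; the
multiply-accumulates are independent across iterations.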
I doubt this will beat an optimal integer unit based addmul_1, with the
mix:
ldx rp[i], ...
ldx up[i], ...
mulx
umulxhi
addxccc
addxccc
stx
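
A matching C model of the integer mix, again only a sketch. It assumes mulx
and umulxhi give the low and high halves of the 64x64 product, and it keeps
two separate carry variables for the two addxccc chains. Real hardware has
one carry flag, so the two chains would have to be scheduled around it;
that detail is not modelled here. The function name is made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* rp[0..n-1] += up[0..n-1] * v0; returns the carry-out limb.
   Hypothetical name, modelling the instruction mix only. */
uint64_t addmul_1_int_model(uint64_t *rp, const uint64_t *up,
                            size_t n, uint64_t v0)
{
    uint64_t hi = 0;     /* high limb of the previous product */
    uint64_t cy_a = 0;   /* carry of the product chain (model only) */
    uint64_t cy_b = 0;   /* carry of the rp[] chain (model only) */

    for (size_t i = 0; i < n; i++) {
        u128 p = (u128)up[i] * v0;
        uint64_t lo  = (uint64_t)p;          /* mulx */
        uint64_t nhi = (uint64_t)(p >> 64);  /* umulxhi */

        /* First addxccc: low limb plus previous high limb. */
        u128 s = (u128)lo + hi + cy_a;
        cy_a = (uint64_t)(s >> 64);

        /* Second addxccc: add the limb from rp[]. */
        u128 t = (u128)(uint64_t)s + rp[i] + cy_b;
        cy_b = (uint64_t)(t >> 64);

        rp[i] = (uint64_t)t;
        hi = nhi;
    }
    return hi + cy_a + cy_b;   /* fits in one limb */
}
```

Here both additions sit on carry chains, whereas the fpmaddx variant folds
the rp[] addition into the multiply.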
Fpmaddx and fpmaddxhi will get more useful for addmul_2 or higher, since
then we could do more accumulation in the fpu, and thus less movdtox.
The situation becomes very itanic-like with two key differences:
1. Itanic is 6-issue
2. Itanic requires 4 instructions for add-with-carry (addxccc)
--
Torbjörn