[PATCH 0/3] Resubmit of Sparc T3/T4 patches.

Torbjorn Granlund tg at gmplib.org
Wed Mar 6 00:08:09 CET 2013

The addmul code could be similarly improved.

But unlike mul_1, we cannot keep a recurrent carry alive, since we are
to add 3 limbs at each column.

Instead, one can add in two phases.  See powerpc/mode64/aorsmul_1.asm
for an example.

With two-way unrolling, one would need these insns:

mulx    up[i]
umulxhi up[i]
mulx    up[i+1]
umulxhi up[i+1]

addxccc climb, ..., p0   C add climb and lowest prod limb
addxccc             p1
addxc   %g0, ..., climb  C propagate carry to highest prod limb, keep climb for next iteration

addcc   rp[i], p0, p0
addxccc rp[i+1], p1, p1
[keep carry flag alive to next iteration]
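
For illustration only, the two-phase scheme above can be modelled in C.
This is a rough sketch, not the proposed asm: the names
addmul_1_two_phase and adc are made up here, limbs are assumed to be 64
bits, and unsigned __int128 (a GCC/Clang extension) stands in for the
mulx/umulxhi pair.  Phase 1 chains the product limbs through climb;
phase 2 adds the result into rp[] with its own carry flag, which stays
alive across iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Add a + b + *cy, return the low limb, leave the carry-out in *cy. */
static limb_t adc(limb_t a, limb_t b, unsigned *cy)
{
    limb_t s = a + b;
    unsigned c = s < a;          /* carry from a + b */
    limb_t r = s + *cy;
    c |= r < s;                  /* carry from adding the incoming flag */
    *cy = c;
    return r;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   Two-way unrolled, so this sketch requires an even n. */
limb_t addmul_1_two_phase(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t climb = 0;            /* carry limb of phase 1 */
    unsigned cy = 0;             /* carry flag of phase 2, kept across iterations */

    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        /* mulx / umulxhi for up[i] and up[i+1] */
        unsigned __int128 a = (unsigned __int128)up[i] * v;
        unsigned __int128 b = (unsigned __int128)up[i + 1] * v;
        limb_t lo0 = (limb_t)a, hi0 = (limb_t)(a >> 64);
        limb_t lo1 = (limb_t)b, hi1 = (limb_t)(b >> 64);

        /* Phase 1: add climb and the product limbs column by column. */
        unsigned c = 0;
        limb_t p0 = adc(climb, lo0, &c);
        limb_t p1 = adc(hi0, lo1, &c);
        climb = hi1 + c;         /* cannot overflow: hi1 <= 2^64 - 2 */

        /* Phase 2: add into rp[], using a separate carry chain. */
        rp[i] = adc(rp[i], p0, &cy);
        rp[i + 1] = adc(rp[i + 1], p1, &cy);
    }
    return climb + cy;           /* fold the live carry flag into the result */
}
```

Each phase adds only two limbs plus a carry bit per column, which is why
a single carry flag per phase suffices.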

All-in-all, this will be 19 insns, down from 22.

(There are 4+7n instructions for n-way unrolling.)

