[PATCH 0/3] Resubmit of Sparc T3/T4 patches.
tg at gmplib.org
Wed Mar 6 00:08:09 CET 2013
The addmul code could be similarly improved.
But unlike mul_1, we cannot keep a recurrent carry alive, since we are
to add 3 limbs at each column.
Instead, one can add in two phases. See powerpc/mode64/aorsmul_1.asm
for an example.
With two-way unrolling, one would need these insns:
addxccc climb, ..., p0 C add climb and lowest prod limb
addxc %g0, ..., climb C propagate carry to highest prod limb, keep climb for next iteration
addcc rp[i], p0, p0
addxccc rp[i+1], p1, p1
[keep carry flag alive to next iteration]
All-in-all, this will be 19 insns, down from 22.
(There are 4+7n instructions for n-way unrolling.)
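
For readers who want the two-phase idea spelled out, here is a rough C sketch. It is not the proposed assembly and not GMP's code: the function name addmul_1_two_phase, the limb_t typedef, and the use of 32-bit limbs (so the double-limb product fits in a portable uint64_t) are all assumptions made for illustration. Phase 1 folds the carry limb (climb) into the product, as mul_1 does; phase 2 adds the result to rp[] on a separate carry (cy) that stays alive between iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for this sketch; the sparc64 code uses 64-bit limbs. */
typedef uint32_t limb_t;

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
limb_t addmul_1_two_phase(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t climb = 0; /* carry limb of the multiply chain (phase 1) */
    limb_t cy = 0;    /* carry bit of the rp[] addition chain (phase 2) */

    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)up[i] * v;
        limb_t lo = (limb_t)prod;
        limb_t hi = (limb_t)(prod >> 32);

        /* Phase 1: add climb to the low product limb and propagate the
           carry into the high product limb, which becomes the next climb.
           hi is at most 2^32-2, so this cannot overflow. */
        limb_t p = lo + climb;
        climb = hi + (p < lo);

        /* Phase 2: add rp[i] and the carry bit kept from the previous
           iteration. At most one of the two additions can carry out. */
        limb_t s = rp[i] + p;
        limb_t c1 = s < p;
        limb_t r = s + cy;
        limb_t c2 = r < s;
        rp[i] = r;
        cy = c1 | c2;
    }

    /* The true result fits in n+1 limbs, so this sum cannot overflow. */
    return climb + cy;
}
```

In the assembly the second carry lives in the carry flag rather than in a register, which is why the flag has to survive to the next iteration.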