[PATCH] Optimize 32-bit sparc T1 multiply routines.
Torbjorn Granlund
tg at gmplib.org
Fri Jan 18 21:22:00 CET 2013
David Miller <davem at davemloft.net> writes:
> While waiting for the FSF to execute my assignment, I tweaked my
> existing 2-way unrolled mul_1 and addmul_1 loops. Currently on T4 I'm
> at:
> mul_1     3.8 cycles/limb
> addmul_1  5.5 cycles/limb
Nice progress! I still recommend 4-way unrolling for at least the most
critical functions. :-)
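For readers less familiar with GMP's mpn layer: mul_1 and addmul_1 are mpn_mul_1 and mpn_addmul_1, which multiply an n-limb number by a single limb. addmul_1 also adds the product into the destination and returns the carry-out limb. The portable C sketch below only illustrates what 4-way unrolling of such a loop looks like. It is not GMP's SPARC assembly. The names limb_t, addmul_1_ref and addmul_1_unroll4 are hypothetical, and 32-bit limbs are assumed to match the 32-bit SPARC target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type, standing in for GMP's mp_limb_t on a
   32-bit ABI. A 32x32->64 product always fits in uint64_t. */
typedef uint32_t limb_t;

/* Reference loop, one limb per iteration:
   {rp,n} += {up,n} * v; returns the carry-out limb. */
limb_t addmul_1_ref(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so this sum cannot overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Same result, 4-way unrolled: loop overhead (counter update, branch) is
   paid once per four limbs, and the four multiplies do not depend on each
   other, so they can be scheduled ahead of the serial carry chain. */
limb_t addmul_1_unroll4(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t p0 = (uint64_t)up[i]     * v;
        uint64_t p1 = (uint64_t)up[i + 1] * v;
        uint64_t p2 = (uint64_t)up[i + 2] * v;
        uint64_t p3 = (uint64_t)up[i + 3] * v;
        uint64_t t;

        /* The carry still has to ripple through the four limbs in order. */
        t = p0 + rp[i]     + cy; rp[i]     = (limb_t)t; cy = (limb_t)(t >> 32);
        t = p1 + rp[i + 1] + cy; rp[i + 1] = (limb_t)t; cy = (limb_t)(t >> 32);
        t = p2 + rp[i + 2] + cy; rp[i + 2] = (limb_t)t; cy = (limb_t)(t >> 32);
        t = p3 + rp[i + 3] + cy; rp[i + 3] = (limb_t)t; cy = (limb_t)(t >> 32);
    }

    /* Wind-down: the remaining 0..3 limbs, one at a time. */
    for (; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}
```

Both functions should return the same limbs and carry for every input; unrolling only changes how the work is grouped per iteration. A 2-way unrolled loop has the same shape with two products per iteration and a wind-down of at most one limb.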
--
Torbjörn
More information about the gmp-devel mailing list