[PATCH] Optimize 32-bit sparc T1 multiply routines.

Fri Jan 18 21:22:00 CET 2013

David Miller <davem at davemloft.net> writes:

  While waiting for the FSF to execute my assignment, I tweaked my
  existing 2-way unrolled mul_1 and addmul_1 loops.  Currently on T4 I'm
  at:

  	mul_1		3.8 cycles/limb

  	addmul_1	5.5 cycles/limb

Nice progress!  I still recommend 4-way unrolling for at least the most
critical functions.  :-)

-- 
Torbjörn
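
For readers unfamiliar with the terms above: mul_1 multiplies a limb
vector by a single limb, and "4-way unrolling" means handling four limbs
per loop iteration. The following is only a portable C sketch of that
idea, not the SPARC assembly discussed in this thread. The names
`limb_t` and `mul_1_unroll4` are hypothetical, and 32-bit limbs are
assumed to match the 32-bit build.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical limb type: 32 bits, as assumed for a 32-bit build. */
typedef uint32_t limb_t;

/*
 * Illustrative sketch: rp[0..n-1] = up[0..n-1] * v, returning the
 * carry-out limb. The main loop handles four limbs per iteration;
 * a cleanup loop handles the remaining 0..3 limbs.
 */
limb_t mul_1_unroll4(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    uint64_t acc = 0;   /* carry in the low 32 bits between steps */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* product + carry cannot overflow 64 bits:
           (2^32-1)^2 + (2^32-1) < 2^64 */
        acc += (uint64_t)up[i]     * v; rp[i]     = (limb_t)acc; acc >>= 32;
        acc += (uint64_t)up[i + 1] * v; rp[i + 1] = (limb_t)acc; acc >>= 32;
        acc += (uint64_t)up[i + 2] * v; rp[i + 2] = (limb_t)acc; acc >>= 32;
        acc += (uint64_t)up[i + 3] * v; rp[i + 3] = (limb_t)acc; acc >>= 32;
    }

    /* Leftover limbs when n is not a multiple of 4. */
    for (; i < n; i++) {
        acc += (uint64_t)up[i] * v; rp[i] = (limb_t)acc; acc >>= 32;
    }

    return (limb_t)acc;
}
```

In C the carry chain still serializes the four steps; the benefit of
unrolling in hand-written assembly comes mainly from reduced loop
overhead and more room to schedule loads and multiplies.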