[PATCH] Optimize 32-bit sparc T1 multiply routines.

Torbjorn Granlund tg at gmplib.org
Fri Jan 18 21:22:00 CET 2013


David Miller <davem at davemloft.net> writes:

  
  While waiting for the FSF to execute my assignment, I tweaked my
  existing 2-way unrolled mul_1 and addmul_1 loops.  Currently on T4 I'm
  at:
  
  	mul_1		3.8 cycles/limb
  
  	addmul_1	5.5 cycles/limb
  
Nice progress!  I still recommend 4-way unrolling for at least the most
critical functions.  :-)

-- 
Torbjörn

