[PATCH] Optimize 32-bit sparc T1 multiply routines.

Torbjorn Granlund tg at gmplib.org
Sun Jan 6 01:19:39 CET 2013


David Miller <davem at davemloft.net> writes:

  Each load can issue in 1 cycle, there is a 4 cycle latency, the
  loads will fully pipeline.  Therefore the overhead is around 3n.
  
So at most one memory operation per cycle?

  > Our current Karatsuba code (evaluating in 0, -1, oo) will suffer from
  > the forgotten subtraction instructions.  Evaluating in 0, +1, oo might
  > be better...
  
  As I said in my other reply it depends upon how you need this
  subtract.
  
Karatsuba needs mpn_sub_n in the interpolation if we evaluate in
-1.
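
To make the two evaluation schemes concrete, here is a rough sketch
on a single 32x32 multiply split into 16-bit halves.  It is not GMP's
code: the names kara_plus1, kara_minus1 and check are made up for
illustration, and it works on single words rather than limb vectors,
so a plain C "-" stands in wherever the real code would call
mpn_sub_n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: Karatsuba evaluating in 0, +1, oo. */
static uint64_t kara_plus1(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xffff, a1 = a >> 16;
    uint64_t b0 = b & 0xffff, b1 = b >> 16;

    uint64_t v0   = a0 * b0;               /* value at 0  */
    uint64_t vinf = a1 * b1;               /* value at oo */
    uint64_t v1   = (a0 + a1) * (b0 + b1); /* value at +1, additions only */

    /* Interpolation: a0*b1 + a1*b0 = v1 - v0 - vinf. */
    uint64_t mid = v1 - v0 - vinf;
    return (vinf << 32) + (mid << 16) + v0;
}

/* Hypothetical helper: Karatsuba evaluating in 0, -1, oo. */
static uint64_t kara_minus1(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xffff, a1 = a >> 16;
    uint64_t b0 = b & 0xffff, b1 = b >> 16;

    uint64_t v0   = a0 * b0;               /* value at 0  */
    uint64_t vinf = a1 * b1;               /* value at oo */

    /* Evaluation at -1: |a0 - a1| and |b0 - b1|, tracking the sign. */
    int neg = 0;
    uint64_t da, db;
    if (a0 >= a1) da = a0 - a1; else { da = a1 - a0; neg ^= 1; }
    if (b0 >= b1) db = b0 - b1; else { db = b1 - b0; neg ^= 1; }
    uint64_t vm1 = da * db;                /* |value at -1| */

    /* Interpolation: a0*b1 + a1*b0 = v0 + vinf - (value at -1),
       so add when the value at -1 is negative, subtract otherwise. */
    uint64_t mid = v0 + vinf;
    if (neg) mid += vm1; else mid -= vm1;
    return (vinf << 32) + (mid << 16) + v0;
}

/* Cross-check both variants against the direct 64-bit product. */
static void check(uint32_t a, uint32_t b)
{
    uint64_t want = (uint64_t) a * b;
    assert(kara_plus1(a, b) == want);
    assert(kara_minus1(a, b) == want);
}
```

Calling check on a few operand pairs exercises both sign cases of
the -1 variant.  In this sketch the +1 variant uses only additions
in the evaluation, while the -1 variant subtracts in the evaluation
and conditionally in the interpolation.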

-- 
Torbjörn


More information about the gmp-devel mailing list