[PATCH] Optimize 32-bit sparc T1 multiply routines.

Sun Jan 6 01:19:39 CET 2013

David Miller <davem at davemloft.net> writes:

  Each load can issue in 1 cycle, there is a 4 cycle latency, the
  loads will fully pipeline.  Therefore the overhead is around 3n.

At most one memory operation per cycle?

  > Our current Karatsuba code (evaluating in 0, -1, oo) will suffer from
  > the forgotten subtraction instructions.  Evaluating in 0, +1, oo might
  > be better...

  As I said in my other reply it depends upon how you need this
  subtract.

Karatsuba needs mpn_sub_n in the interpolation if we evaluate at
-1.
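To make the add/subtract placement concrete, here is a toy sketch of a
single Karatsuba step. It is not GMP code. It works on 32-bit operands
split into two 16-bit "limbs", and plain integer + and - stand in for
mpn_add_n and mpn_sub_n on limb vectors. The function names
(karatsuba_m1, karatsuba_p1, check) are hypothetical. The point is only
where additions and subtractions show up in the evaluation and
interpolation steps for the 0, -1, oo and 0, +1, oo point sets.

```c
#include <stdint.h>
#include <assert.h>

#define B_BITS 16          /* toy limb size: B = 2^16 */
#define B_MASK 0xFFFFu

/* Hypothetical sketch: evaluate at 0, -1, oo. */
static uint64_t karatsuba_m1(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & B_MASK, a1 = a >> B_BITS;
    uint64_t b0 = b & B_MASK, b1 = b >> B_BITS;

    uint64_t p0   = a0 * b0;   /* value at 0  */
    uint64_t pinf = a1 * b1;   /* value at oo */

    /* Evaluation at -1: |a0 - a1| and |b0 - b1| each cost a subtraction
       (mpn_sub_n in the limb-vector case); track the sign separately. */
    int neg = 0;
    uint64_t da, db;
    if (a0 >= a1) da = a0 - a1; else { da = a1 - a0; neg ^= 1; }
    if (b0 >= b1) db = b0 - b1; else { db = b1 - b0; neg ^= 1; }
    uint64_t pm1 = da * db;    /* |value at -1| */

    /* Interpolation: mid = p0 + pinf - (a0 - a1)(b0 - b1).
       When the value at -1 is non-negative this is a subtraction. */
    uint64_t mid = p0 + pinf;
    if (neg) mid += pm1; else mid -= pm1;

    return (pinf << (2 * B_BITS)) + (mid << B_BITS) + p0;
}

/* Hypothetical sketch: evaluate at 0, +1, oo. */
static uint64_t karatsuba_p1(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & B_MASK, a1 = a >> B_BITS;
    uint64_t b0 = b & B_MASK, b1 = b >> B_BITS;

    uint64_t p0   = a0 * b0;
    uint64_t pinf = a1 * b1;

    /* Evaluation at +1 uses only additions, but the sums can carry
       into an extra bit (an extra limb in the limb-vector case). */
    uint64_t pp1 = (a0 + a1) * (b0 + b1);

    /* Interpolation: mid = pp1 - p0 - pinf, always subtractions. */
    uint64_t mid = pp1 - p0 - pinf;

    return (pinf << (2 * B_BITS)) + (mid << B_BITS) + p0;
}

/* Check both variants against a direct 64-bit multiply. */
static void check(uint32_t a, uint32_t b)
{
    uint64_t want = (uint64_t)a * b;
    assert(karatsuba_m1(a, b) == want);
    assert(karatsuba_p1(a, b) == want);
}
```

Both variants produce the same product. They differ in whether the
subtraction sits in the evaluation, the interpolation, or both, which is
what matters if subtraction is the expensive operation on the target.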

-- 
Torbjörn