[PATCH] Optimize 32-bit sparc T1 multiply routines.
tg at gmplib.org
Sun Jan 6 01:19:39 CET 2013
David Miller <davem at davemloft.net> writes:
Each load can issue in 1 cycle; there is a 4-cycle latency, and the
loads will fully pipeline. Therefore the overhead is around 3n.
At most one memop / cycle?
> Our current Karatsuba code (evaluating in 0, -1, oo) will suffer from
> the forgotten subtraction instructions. Evaluating in 0, +1, oo might
> be better...
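To make the evaluation-point trade-off concrete, here is a minimal Python sketch of one Karatsuba level evaluated in 0, -1, oo and in 0, +1, oo. This is an illustration only, not GMP's mpn code. The names `split`, `karatsuba_m1` and `karatsuba_p1` are hypothetical. The returned `subs` counter is also an assumption of this sketch: it just marks where an mpn_sub_n-style subtraction of limb vectors would be needed.

```python
# One level of Karatsuba on non-negative integers split at base 2**k.
# Illustrative sketch only: GMP works on limb vectors, and "subs" merely
# counts where an mpn_sub_n-style subtraction would be required.

def split(x: int, k: int) -> tuple[int, int]:
    """Return (low, high) halves of x with respect to base 2**k."""
    return x & ((1 << k) - 1), x >> k


def karatsuba_m1(a: int, b: int, k: int) -> tuple[int, int]:
    """Evaluate in 0, -1, oo. Returns (product, number of subtractions)."""
    a0, a1 = split(a, k)
    b0, b1 = split(b, k)
    subs = 0
    # Evaluation in -1: compute |a0 - a1| and |b0 - b1|, remembering signs.
    da, neg_a = (a0 - a1, False) if a0 >= a1 else (a1 - a0, True)
    db, neg_b = (b0 - b1, False) if b0 >= b1 else (b1 - b0, True)
    subs += 2
    r0, rinf, rm1 = a0 * b0, a1 * b1, da * db
    # Interpolation: middle = r0 + rinf - a(-1)*b(-1).
    if neg_a == neg_b:        # a(-1)*b(-1) = +rm1, so subtract it
        middle = r0 + rinf - rm1
        subs += 1
    else:                     # a(-1)*b(-1) = -rm1, so add it
        middle = r0 + rinf + rm1
    return (rinf << (2 * k)) + (middle << k) + r0, subs


def karatsuba_p1(a: int, b: int, k: int) -> tuple[int, int]:
    """Evaluate in 0, +1, oo. Returns (product, number of subtractions)."""
    a0, a1 = split(a, k)
    b0, b1 = split(b, k)
    # Evaluation in +1 needs only additions (each may carry out one bit).
    sa, sb = a0 + a1, b0 + b1
    r0, rinf, r1 = a0 * b0, a1 * b1, sa * sb
    # Interpolation: middle = a(1)*b(1) - r0 - rinf, always two subtractions.
    middle = r1 - r0 - rinf
    return (rinf << (2 * k)) + (middle << k) + r0, 2


if __name__ == "__main__":
    a, b = 0xDEADBEEF, 0xCAFEBABE
    print(karatsuba_m1(a, b, 16), karatsuba_p1(a, b, 16), a * b)
```

In this sketch the 0, -1, oo variant spends two subtractions in the evaluation plus a data-dependent one in the interpolation, while the 0, +1, oo variant evaluates with additions only and always pays two subtractions in the interpolation. On a CPU without a subtract-with-carry instruction, the second variant keeps the evaluation on the cheaper add-with-carry path at the cost of carry bits in `sa` and `sb`.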
As I said in my other reply, it depends upon how you need this
Karatsuba needs mpn_sub_n in the interpolation if we evaluate in