[PATCH] Optimize 32-bit sparc T1 multiply routines.
David Miller
davem at davemloft.net
Fri Jan 4 21:51:11 CET 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Fri, 04 Jan 2013 15:17:11 +0100
> I expect them to add 3n/2 to 3n cycles, depending on the pipeline
> characteristics.
Each load can issue in 1 cycle with a 4 cycle latency, and the
loads will fully pipeline. Therefore the overhead is around 3n cycles.
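
A rough way to sanity-check that estimate is a tiny cycle model. This is
only a sketch: the function name is made up for illustration, and it
assumes the overhead comes from about 3n back-to-back loads, which is one
reading of the 3n figure and is not spelled out in the thread.

```c
#include <stddef.h>

/* Hypothetical helper: cycles until the last of `loads` back-to-back
 * loads has delivered its data, when a new load can issue every
 * `issue` cycles and each one takes `latency` cycles to complete.
 * With full pipelining the latency is paid only once, at the end. */
static unsigned long pipelined_load_cycles(unsigned long loads,
                                           unsigned long issue,
                                           unsigned long latency)
{
    if (loads == 0)
        return 0;
    return (loads - 1) * issue + latency;
}
```

With issue = 1 and latency = 4 the result is 3n + 3 for 3n loads, so the
latency term is a small constant and the cost is dominated by the 3n
issue slots.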
> The Oracle manuals recommend that one loops around mpmul checking for
> intermittent hardware failure, whatever that means.
Parity errors on the register files, nothing that actually happens
in practice.
> Our current Karatsuba code (evaluating in 0, -1, oo) will suffer from
> the forgotten subtraction instructions. Evaluating in 0, +1, oo might
> be better...
As I said in my other reply, it depends upon how you need this
subtraction.
If you need it for submul_1 or similar, that's not a problem. That
only needs subcc+addxc which we have.
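
To illustrate why submul_1 gets by with a plain subtract plus an
add-with-carry, here is a minimal C sketch. It is not the GMP assembly:
it uses 32-bit limbs and a 64-bit product purely for portability, and the
names limb_t and sketch_submul_1 are made up for this example. The point
is that the borrow from each limb subtraction is never fed into another
subtraction; it is added into the high half of the product.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs for the sketch */

/* rp[0..n-1] -= up[0..n-1] * v; returns the limb borrowed out the top. */
limb_t sketch_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)up[i] * v;
        limb_t lo = (limb_t)prod;
        limb_t hi = (limb_t)(prod >> 32);

        lo += cy;            /* fold in the carry limb from below */
        hi += (lo < cy);     /* cannot overflow: prod + cy < 2^64 */

        limb_t r = rp[i];
        rp[i] = r - lo;      /* the subtract that sets the flag (subcc role) */
        hi += (r < lo);      /* borrow added into the high limb (addxc role) */

        cy = hi;
    }
    return cy;
}
```

For example, with rp = {5, 0}, up = {3, 0} and v = 2 the result is
rp = {0xFFFFFFFF, 0xFFFFFFFF} with a return value of 1, i.e. 5 - 6 with a
borrow out of the top limb.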
If it's needed for sub_n, then yes, that's a bit difficult. I have been
trying to figure out ways to fabricate the needed calculations
using just subcc and addxc/addxcc but haven't come up with anything
just yet.
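
For reference, the textbook identity that sidesteps a subtract-with-borrow
is a - b = a + ~b + 1: complement each limb of b and run an ordinary
add-with-carry chain seeded with a carry of 1. The sketch below shows
that identity in C only. It is not something proposed in this message,
it costs an extra complement per limb, and the names limb_t and
sketch_sub_n are invented for the example (32-bit limbs for portability).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs for the sketch */

/* rp[] = ap[] - bp[] over n limbs, using only additions with carry.
 * Returns the borrow out of the top limb (0 or 1). */
limb_t sketch_sub_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t carry = 1;                       /* the "+ 1" of a + ~b + 1 */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t)ap[i] + (limb_t)~bp[i] + carry;
        rp[i] = (limb_t)s;
        carry = (limb_t)(s >> 32);          /* carry chain, never a borrow */
    }
    return carry ^ 1;                       /* no carry out means borrow */
}
```

Whether this maps well onto the available instructions depends on the
cost of the extra complement and on being able to chain the carry
through an add that both consumes and sets it.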