[PATCH] Optimize 32-bit sparc T1 multiply routines.

Fri Jan 4 21:51:11 CET 2013

From: Torbjorn Granlund <tg at gmplib.org>
Date: Fri, 04 Jan 2013 15:17:11 +0100

> I expect them to add 3n/2 to 3n cycles, depending on the pipeline
> characteristics.

Each load can issue in 1 cycle with a 4 cycle latency, and the
loads will fully pipeline.  Therefore the overhead is around 3n cycles.
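
As a rough illustration of that estimate, here is a minimal sketch of the
cycle count for a run of fully pipelined loads. It assumes the overhead
comes from about 3n loads for an n-limb operation, which is only inferred
from the "3n" figure above. The function name and parameters are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical model, not measured data: cycles until the last of
   num_loads back-to-back loads has delivered its result, when a new
   load can issue every issue_cycles and each takes latency cycles. */
size_t pipelined_load_cycles(size_t num_loads, size_t issue_cycles,
                             size_t latency)
{
    if (num_loads == 0)
        return 0;
    /* The last load issues after (num_loads - 1) issue slots and then
       needs one full latency to complete. */
    return (num_loads - 1) * issue_cycles + latency;
}
```

With 1 cycle issue and 4 cycle latency this gives 3n + 3 cycles for 3n
loads, so the latency only contributes a small constant and the total
stays around 3n.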

> The Oracle manuals recommend that one loops around mpmul checking for
> intermittent hardware failure, whatever that means.

It means parity errors on the register files, which is nothing that
actually happens in practice.

> Our current Karatsuba code (evaluating in 0, -1, oo) will suffer from
> the forgotten subtraction instructions.  Evaluating in 0, +1, oo might
> be better...

As I said in my other reply, it depends upon what you need this
subtract for.

If you need it for submul_1 or similar, that's not a problem.  That
only needs subcc+addxc, which we have.
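
To illustrate why submul_1 gets by without a subtract-with-borrow, here is
a minimal C sketch, not the actual assembly. It assumes 32-bit limbs, and
the names limb32_t and submul_1_sketch are hypothetical. The point is that
each borrow is folded into the high product word with an add, so only a
flag-setting subtract (modelled on subcc) and an add-with-carry (modelled
on addxc) are needed.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb32_t;  /* hypothetical name; 32-bit limbs assumed */

/* Sketch: {rp,n} -= {up,n} * v, returning the final borrow limb. */
limb32_t submul_1_sketch(limb32_t *rp, const limb32_t *up, size_t n,
                         limb32_t v)
{
    limb32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * v;
        limb32_t lo = (limb32_t)p;
        limb32_t hi = (limb32_t)(p >> 32);

        lo += cy;           /* add the incoming carry limb */
        hi += (lo < cy);    /* carry out goes into the high word */

        limb32_t r = rp[i];
        rp[i] = r - lo;     /* plays the role of subcc */
        hi += (r < lo);     /* borrow is *added* to hi: the addxc role */

        cy = hi;            /* cannot overflow; see note below */
    }
    return cy;
}
```

The high word cannot overflow: it can only reach its maximum when the low
word is zero, and in that case the subtract produces no borrow.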

If it's needed for sub_n, then yes that's a bit difficult.  I was
trying to figure out ways to fabricate the needed calculations
using just subcc and addxc/addxcc but haven't come up with anything
just yet.
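
For reference, one standard identity that might apply here is
a - b = a + ~b + 1, which turns a borrow chain into a carry chain. The C
sketch below only demonstrates the arithmetic with 32-bit limbs. Whether
it maps onto the available instructions without too much overhead (the
extra complement per limb, and seeding the initial carry) is an open
assumption, not a worked-out solution. The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: {rp,n} = {ap,n} - {bp,n} using only add-with-carry on the
   complemented subtrahend.  Returns the borrow out (0 or 1). */
uint32_t sub_n_via_add_sketch(uint32_t *rp, const uint32_t *ap,
                              const uint32_t *bp, size_t n)
{
    /* A carry-in of 1 completes the two's complement of b. */
    uint32_t carry = 1;
    for (size_t i = 0; i < n; i++) {
        /* Models an add-with-carry that also sets the carry flag. */
        uint64_t s = (uint64_t)ap[i] + (uint32_t)~bp[i] + carry;
        rp[i] = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
    /* A clear carry at the end means the subtraction borrowed. */
    return 1 - carry;
}
```

Note that the carry flag has inverted meaning throughout: carry set means
no borrow, so the final flag has to be flipped to get the usual sub_n
return value.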