[PATCH] Optimize 32-bit sparc T1 multiply routines.

Fri Jan 4 22:34:25 CET 2013

From: nisse at lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 22:29:58 +0100

> David Miller <davem at davemloft.net> writes:
> 
>> If it's needed for sub_n, then yes that's a bit difficult.  I was
>> trying to figure out ways to fabricate the needed calculations
>> using just subcc and addxc/addxcc but haven't come up with anything
>> just yet.
> 
> You could always do the two's complement of one of the operands on the
> fly, and then use the same add with carry instructions as in add_n.
> 
> I'm thinking aloud, so I'm sorry if I get this wrong, but I think it's
> best to handle the unlikely case of low zero limbs up front. Then it's a
> plain negate of the first non-zero limb, and a plain complement for the
> remaining limbs; the important thing here is that the negation generates
> no additional carries to propagate.
> 
> So compared to add_n, you just get an additional xor with -1 in the loop
> (and not on the loop's critical path). I can't guess whether or not that
> will be visible in the execution time.

Thanks, I'll give this a try!
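
For reference, the quoted idea can be sketched in portable C roughly as follows. This is only an illustration under assumptions, not GMP's code or the eventual sparc assembly: the names limb_t and sketch_sub_n are hypothetical, limbs are taken to be 32 bits wide and stored least significant first, and the carry is computed with comparisons where the real loop would use add-with-carry instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical limb type, 32 bits assumed */

/* Computes {rp,n} = {up,n} - {vp,n} and returns the borrow (0 or 1),
   by adding the two's complement of v to u. Assumes n >= 1. */
limb_t sketch_sub_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    size_t i = 0;
    limb_t carry;

    assert(n >= 1);

    /* Unlikely case, handled up front: low zero limbs of v. The
       negated operand is also zero there, so u passes through. */
    while (i < n && vp[i] == 0) {
        rp[i] = up[i];
        i++;
    }
    if (i == n)
        return 0;  /* v == 0: nothing subtracted, no borrow */

    /* First non-zero limb: plain negate, then an ordinary add. */
    {
        limb_t nv = (limb_t)0 - vp[i];
        limb_t s = up[i] + nv;
        carry = s < nv;
        rp[i] = s;
        i++;
    }

    /* Remaining limbs: plain complement (xor with -1), then the same
       add-with-carry step as in an add_n loop. */
    for (; i < n; i++) {
        limb_t cv = vp[i] ^ ~(limb_t)0;
        limb_t t = up[i] + cv;
        limb_t c1 = t < cv;
        limb_t r = t + carry;
        limb_t c2 = r < t;
        rp[i] = r;
        carry = c1 | c2;
    }

    /* For v != 0, u - v = u + (B^n - v) - B^n, so the borrow is the
       inverse of the carry out of the addition. */
    return 1 - carry;
}
```

Note that the borrow comes out as the inverted carry, and that the v == 0 case has to be special-cased because there is no first non-zero limb to negate.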