[PATCH] Optimize 32-bit sparc T1 multiply routines.

Fri Jan 4 22:29:58 CET 2013

David Miller <davem at davemloft.net> writes:

> If it's needed for sub_n, then yes that's a bit difficult.  I was
> trying to figure out ways to fabricate the needed calculations
> using just subcc and addxc/addxcc but haven't come up with anything
> just yet.

You could always do the two's complement of one of the operands on the
fly, and then use the same add with carry instructions as in add_n.

I'm thinking aloud, so I'm sorry if I get this wrong, but I think it's
best to handle the unlikely case of low zero limbs up front. Then it's a
plain negate of the first non-zero limb, and a plain complement for the
remaining limbs; the important thing here is that the negation generates
no additional carries to propagate.

So compared to add_n, you just get an additional xor with -1 in the loop
(and not on the loop's critical path). I can't guess whether or not that
will be visible in the execution time.
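
For illustration, here is a minimal C sketch of the scheme described
above. It is not the SPARC assembly and not GMP's actual code. The names
limb_t and sub_n_via_add are hypothetical, the 32-bit limb width is an
assumption, and the carry chain is emulated with comparisons where the
assembly would use the add-with-carry instructions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb type */

/* Hypothetical reference routine: rp[] = ap[] - bp[] over n limbs
   (least significant limb first), returning the borrow out (0 or 1).
   Only additions are used; b is negated on the fly. */
limb_t sub_n_via_add(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    size_t i = 0;
    limb_t carry;

    /* Low zero limbs of b: a - 0 = a, and nothing propagates upward. */
    while (i < n && bp[i] == 0) {
        rp[i] = ap[i];
        i++;
    }
    if (i == n)
        return 0;  /* b was zero, so there is no borrow */

    /* First non-zero limb: add its plain negation. Since the limb is
       non-zero, the negation itself produces no carry. */
    {
        limb_t nb = (limb_t)0 - bp[i];
        limb_t s = (limb_t)(ap[i] + nb);
        carry = s < nb;  /* carry out of the addition */
        rp[i] = s;
        i++;
    }

    /* Remaining limbs: same add-with-carry loop as add_n, with one
       extra xor with -1 on the b operand. */
    for (; i < n; i++) {
        limb_t cb = bp[i] ^ (limb_t)-1;
        limb_t s = (limb_t)(ap[i] + cb);
        limb_t c1 = s < cb;
        limb_t r = (limb_t)(s + carry);
        limb_t c2 = r < s;
        rp[i] = r;
        carry = c1 | c2;
    }

    /* For non-zero b, the carry out of a + (-b) is the inverted borrow. */
    return carry ^ 1;
}
```

Note the final step: because the addition computes a + (2^N - b), a
carry out means a >= b, so the borrow returned to the caller is the
carry inverted.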

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.