[PATCH] Optimize 32-bit sparc T1 multiply routines.
nisse at lysator.liu.se
Fri Jan 4 22:29:58 CET 2013
David Miller <davem at davemloft.net> writes:
> If it's needed for sub_n, then yes that's a bit difficult. I was
> trying to figure out ways to fabricate the needed calculations
> using just subcc and addxc/addxcc but haven't come up with anything
> just yet.
You could always compute the two's complement of one of the operands (the
subtrahend) on the fly, and then use the same add-with-carry instructions
as in add_n.
I'm thinking aloud, so apologies if I get this wrong, but I think it's
best to handle the unlikely case of low-order zero limbs in the
subtrahend up front. Then it's a plain negate of the first non-zero limb,
and a plain complement for the remaining limbs. The important thing here
is that the negation generates no additional carries to propagate: since
that limb is non-zero, complementing it and adding one cannot wrap around.
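A quick check of the no-carry claim, written out as a worked identity. The notation is introduced here for illustration only: W is the limb size in bits, N = nW, and the subtrahend b has limbs b_i with b_0 = ... = b_{k-1} = 0 and b_k != 0.

```latex
2^N - b = (2^W - b_k)\,2^{kW} + \sum_{i=k+1}^{n-1} (2^W - 1 - b_i)\,2^{iW},
\qquad 1 \le 2^W - b_k \le 2^W - 1,
\qquad 0 \le 2^W - 1 - b_i \le 2^W - 1 .
```

Every coefficient already fits in one limb, so the negated operand is just "negate limb k, complement the limbs above it" with nothing left to propagate. Since a - b = a + (2^N - b) - 2^N, the carry out of that addition is 1 exactly when a >= b (for b != 0), so the borrow of sub_n would be the inverted carry.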
So compared to add_n, you just get an additional xor with -1 in the loop
(and not on the loop's critical path). I can't guess whether or not that
will be visible in the execution time.
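A minimal C sketch of the idea, assuming 64-bit limbs held in uint64_t and portable carry detection standing in for the SPARC addxc/addxcc instructions; the names limb_t and sub_n_via_add are hypothetical and not GMP's API. It copies the low zero limbs, negates the first non-zero limb, and complements the remaining limbs with an xor against -1 before an ordinary add with carry:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb_t; /* hypothetical 64-bit limb type */

/* Sketch: r = a - b over n limbs using only additions; returns the borrow. */
static limb_t sub_n_via_add(limb_t *r, const limb_t *a, const limb_t *b, size_t n)
{
    size_t i = 0;

    /* Low zero limbs of b: subtracting zero copies a and never borrows. */
    while (i < n && b[i] == 0) {
        r[i] = a[i];
        i++;
    }
    if (i == n)
        return 0; /* b == 0: no borrow at all */

    /* First non-zero limb: plain negate. Because b[i] != 0, the "+1" of
       the two's complement is absorbed here and does not propagate. */
    limb_t nb = -b[i];
    limb_t s = a[i] + nb;
    limb_t carry = s < nb; /* carry out of a[i] + nb */
    r[i] = s;
    i++;

    /* Remaining limbs: plain complement, then the same add with carry
       as in add_n. The xor is not on the carry chain. */
    for (; i < n; i++) {
        limb_t cb = b[i] ^ ~(limb_t)0; /* xor with -1 */
        limb_t t = a[i] + cb;
        limb_t c1 = t < cb;            /* carry from a[i] + cb */
        limb_t u = t + carry;
        limb_t c2 = u < carry;         /* carry from adding the carry-in */
        r[i] = u;
        carry = c1 | c2;               /* at most one of c1, c2 is set */
    }

    /* Carry out of a + (2^N - b) is 1 exactly when a >= b: borrow = !carry. */
    return carry ^ 1;
}
```

The inverted carry would be wrong for b == 0 (the addition produces no carry, yet there is no borrow), which is exactly the case the up-front scan of zero limbs already returns from.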
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel