[PATCH] Optimize 32-bit sparc T1 multiply routines.
Niels Möller
nisse at lysator.liu.se
Sat Jan 5 08:36:07 CET 2013
David Miller <davem at davemloft.net> writes:
>> So compared to add_n, you just get an additional xor with -1 in the loop
>> (and not on the loop's critical path). I can't guess whether or not that
>> will be visible in the execution time.
>
> Thanks I'll give this a try!
And on second thought, there's no need to handle low zero limbs
specially: just set the carry flag before entering the loop, using
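  u - v = u + (B^n - 1 - v) + 1 - B^n

where (B^n - 1 - v) is the complement of v, the "+ 1" is the carry
in (cin), and the "- B^n" is the adjustment of the carry out (cout).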
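
To illustrate the identity, here is a minimal C sketch, assuming
32-bit limbs (B = 2^32). The names limb_t and sub_n_via_add are made
up for this example; this is not GMP's mpn_sub_n nor the sparc
assembly under discussion, only the same arithmetic written out.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumed limb type: 32 bits, so B = 2^32 */

/* Hypothetical helper: computes r = u - v over n limbs using an
   add-with-carry loop, and returns the borrow out (0 or 1). */
limb_t sub_n_via_add(limb_t *r, const limb_t *u, const limb_t *v, size_t n)
{
    limb_t carry = 1;      /* cin: the "+ 1" term, set before the loop */
    for (size_t i = 0; i < n; i++) {
        /* Add the complemented limb of v, as in add_n plus one xor with -1. */
        uint64_t s = (uint64_t)u[i] + (limb_t)~v[i] + carry;
        r[i] = (limb_t)s;
        carry = (limb_t)(s >> 32);
    }
    /* Adjust cout: the "- B^n" term. A carry out of 1 means no borrow. */
    return carry ^ 1;
}
```

With u = B and v = 1 over two limbs this gives the limbs
{0xFFFFFFFF, 0} and borrow 0; with u = 0 and v = 1 it gives all-ones
limbs and borrow 1. Low zero limbs of v need no special case, since
the initial carry simply propagates through them.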
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.