[PATCH] Optimize 32-bit sparc T1 multiply routines.

Niels Möller nisse at lysator.liu.se
Sat Jan 5 08:36:07 CET 2013

David Miller <davem at davemloft.net> writes:

>> So compared to add_n, you just get an additional xor with -1 in the loop
>> (and not on the loop's critical path). I can't guess whether or not that
>> will be visible in the execution time.
> Thanks I'll give this a try!

And on second thought, there's no need to handle low zero limbs
specially; just set the carry flag before entering the loop. This uses

u - v = u + (B^n - 1 - v) + 1 - B^n
            Complement    cin   adjust cout
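
To make the identity concrete, here is a minimal sketch in portable C
rather than sparc assembly. The name sub_n_via_add and the limb_t
typedef are hypothetical and used only for illustration (B = 2^32 is
assumed); this is not the actual GMP code. It only shows that
complementing v and starting with the carry set turns the add_n carry
chain into a subtraction.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical limb type, so B = 2^32 */

/* Illustrative only: rp[] = up[] - vp[] over n limbs, computed as
   u + (B^n - 1 - v) + 1.  Returns the borrow out (0 or 1). */
static limb_t
sub_n_via_add (limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
  limb_t cy = 1;                /* carry in: the "+ 1" term */
  size_t i;

  for (i = 0; i < n; i++)
    {
      limb_t v = ~vp[i];        /* B - 1 - v_i, i.e. xor with -1 */
      limb_t s = up[i] + v;
      limb_t c1 = s < v;        /* carry out of u_i + ~v_i */
      limb_t r = s + cy;
      limb_t c2 = r < cy;       /* carry out of adding the carry in */

      rp[i] = r;
      cy = c1 | c2;             /* at most one of c1, c2 is set */
    }

  /* The "- B^n" adjustment: carry out 1 means no borrow. */
  return 1 - cy;
}
```

A zero low limb of v needs no special case here: its complement is
B - 1, adding the carry in wraps it to zero, and the carry simply
propagates to the next limb.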


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list