Small operands gcd improvements
Torbjörn Granlund
tg at gmplib.org
Wed Aug 7 21:58:58 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> I might convert the former, but I am tempted to simply delete the k6
> gcd_1.asm.
The k6 code uses a branch on the u - v sign? Might be slower than the
current C code. What hardware is it used for?
https://en.wikipedia.org/wiki/AMD_K6
https://en.wikipedia.org/wiki/AMD_K6-2
https://en.wikipedia.org/wiki/AMD_K6-III
I.e., not the latest.
I also see that there's no x86/gcd_1.asm.
We might want to add a _11 there, perhaps using the C algorithm. (That
would be used also for k6, if anybody cares.)
Actually, the trick if keeping the lsb implicit might not be needed for
x86 (or other ISAs with a carry flag). A subtract-with-carry x,x,x
yields the needed mask.
The default x86 code cannot rely on bsf/bsr as these are awfully slow on
many older systems. We need the table trick used in x86/k7/gcd_11.asm,
and other places.
Below a gcd_11 unit test. Not terribly interesting, but we'll need
something similar for testing gcd_22. And we get a place to add tests
for any problematic corner cases.
Nice! Assuming the test code has been tested, please push!
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list