Small operands gcd improvements

Wed Aug 7 21:58:58 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

  > I might convert the former, but I am tempted to simply delete the k6
  > gcd_1.asm.

  The k6 code uses a branch on the u - v sign? Might be slower than the
  current C code.  What hardware is it used for?

https://en.wikipedia.org/wiki/AMD_K6
https://en.wikipedia.org/wiki/AMD_K6-2
https://en.wikipedia.org/wiki/AMD_K6-III

I.e., not the latest.

  I also see that there's no x86/gcd_1.asm.

We might want to add a _11 there, perhaps using the C algorithm.  (That
would be used also for k6, if anybody cares.)

Actually, the trick if keeping the lsb implicit might not be needed for
x86 (or other ISAs with a carry flag).  A subtract-with-carry x,x,x
yields the needed mask.

The default x86 code cannot rely on bsf/bsr as these are awfully slow on
many older systems.  We need the table trick used in x86/k7/gcd_11.asm,
and other places.

  Below a gcd_11 unit test. Not terribly interesting, but we'll need
  something similar for testing gcd_22. And we get a place to add tests
  for any problematic corner cases.

Nice!  Assuming the test code has been tested, please push!

-- 
Torbjörn
Please encrypt, key id 0xC8601622