Small operands gcd improvements

Torbjörn Granlund tg at
Tue Aug 13 20:38:41 UTC 2019

I pushed a few more variants of gcd_11 with nice speed improvements for
several x86_64 CPUs.  I am sure much more can be done.
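
For readers unfamiliar with the operation, here is a minimal sketch of what
a one-limb gcd loop of this kind can look like in plain C.  It is not the
pushed assembly code; the name gcd_11_sketch, the 64-bit limb type and the
requirement that both operands be odd are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only, not the code in the repository.
   Binary gcd of two single-limb operands, both assumed odd. */
static uint64_t gcd_11_sketch(uint64_t u, uint64_t v)
{
    assert((u & 1) && (v & 1));   /* assumed precondition: both odd */

    while (u != v) {
        uint64_t mx = u > v ? u : v;
        uint64_t mn = u > v ? v : u;
        uint64_t d = mx - mn;     /* odd - odd: even and nonzero */

        /* Strip the factors of two; a tuned loop would use a
           count-trailing-zeros instruction instead. */
        while ((d & 1) == 0)
            d >>= 1;

        u = d;                    /* odd again */
        v = mn;
    }
    return u;
}
```

The per-CPU variants would differ mainly in how the compare, subtract and
shift steps are scheduled, which this portable sketch does not try to model.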

I have a bunch of finished gcd_22 variants too; these are generic, without
any CPU-specific tweaks.  I haven't timed them; they have only been tested
for correctness.  It might be desirable to modify some of the gcd_11 loops
to do two-limb arithmetic, and use these to create gcd_22 inner loops.
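
As a rough sketch of the idea of running the same loop on two-limb
arithmetic, the following plain C keeps each operand as a high and a low
limb.  The type limb2, the name gcd_22_sketch and the odd-operand
requirement are assumptions for illustration, not the finished gcd_22 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-limb value: hi * 2^64 + lo. */
typedef struct { uint64_t hi, lo; } limb2;

/* Hypothetical illustration only.  Same subtract-and-shift loop as a
   one-limb binary gcd, with each step done on two limbs.  Both operands
   are assumed odd. */
static limb2 gcd_22_sketch(limb2 u, limb2 v)
{
    assert((u.lo & 1) && (v.lo & 1));   /* assumed precondition */

    while (u.hi != v.hi || u.lo != v.lo) {
        int u_larger = u.hi > v.hi || (u.hi == v.hi && u.lo > v.lo);
        limb2 mx = u_larger ? u : v;
        limb2 mn = u_larger ? v : u;
        limb2 d;

        /* Two-limb subtraction with borrow from the low limb. */
        d.lo = mx.lo - mn.lo;
        d.hi = mx.hi - mn.hi - (mx.lo < mn.lo);

        /* d is even and nonzero; shift right across both limbs until odd. */
        while ((d.lo & 1) == 0) {
            d.lo = (d.lo >> 1) | (d.hi << 63);
            d.hi >>= 1;
        }

        u = d;
        v = mn;
    }
    return u;
}
```

Once both high limbs become zero, a real implementation could presumably
drop into a one-limb loop; the sketch keeps a single loop for brevity.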

