Small operands gcd improvements

Torbjörn Granlund tg at
Wed Aug 7 23:08:04 UTC 2019

I'm having problems with timing of the gcd_11 code.  Unfortunately, the
nested macros of speed.h make things hard to read.  Could you
double-check that operands to gcd_11 are odd and full limbs?

The odd thing is that gcd_1 seems to outperform gcd_11 in some 1 x 1
cases.  That could happen I suppose through gcd_1's initial reduction
(which looks different in different .asm files).  Or it could happen if
operands are not odd or if they have different bit counts.
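
A rough sketch of the kind of sanity check meant here, which could be
dropped into the timing loop while debugging.  The names limb_t,
operands_ok_for_gcd_11 and check_gcd_11_operands are made up for
illustration and are not part of speed.h or GMP.  It assumes 64-bit
limbs and takes "full" to mean that the most significant bit is set
(which also forces equal bit counts); both are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* stand-in for mp_limb_t; assumes 64-bit limbs */
#define LIMB_BITS 64

/* True if both operands are odd and both have their top bit set.
   "Top bit set" is this sketch's reading of "full limb".  */
static bool
operands_ok_for_gcd_11 (limb_t u, limb_t v)
{
  bool both_odd  = ((u & v) & 1) != 0;
  bool both_full = (((u & v) >> (LIMB_BITS - 1)) & 1) != 0;
  return both_odd && both_full;
}

/* Hypothetical debugging hook: abort if the operands are not what
   the measurement expects.  */
static void
check_gcd_11_operands (limb_t u, limb_t v)
{
  assert (operands_ok_for_gcd_11 (u, v));
}
```

If such a check fires, the difference in timing would come from the
operand generation rather than from the gcd_11 loop itself.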

  ... similar for testing gcd_22.

Speaking of gcd_22.  We need to determine this function's interface.

I suppose it will contain 2 or 3 loops, depending on arch.

The first loop will be 22.  If the GCD is two limbs, it will finish the
job.  Else it will invoke either of the following loops.

A possible middle loop will be 21.

The last loop will be 11.  We can simply inline a copy here as it is
tiny.  (A tail call won't work as the functions will have different
return types.)
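
To make the loop structure concrete, here is a minimal C sketch of the
two-loop variant (no 21 middle loop), using a plain binary GCD.  It is
not the proposed implementation, and the interface shown is only one
possibility: limb_t, limb_pair and gcd_22_sketch are hypothetical
names, 64-bit limbs are assumed, and both operands are assumed odd.
Without a 21 loop, the 22 loop here simply keeps running until both
high limbs are zero.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;                      /* stand-in for mp_limb_t */
typedef struct { limb_t d1, d0; } limb_pair;  /* hypothetical return type */

/* GCD of the odd two-limb operands (u1,u0) and (v1,v0).  */
static limb_pair
gcd_22_sketch (limb_t u1, limb_t u0, limb_t v1, limb_t v0)
{
  limb_pair g;

  assert (((u0 & v0) & 1) != 0);   /* both operands must be odd */

  /* 22 loop: runs while either operand still has a non-zero high limb.  */
  while ((u1 | v1) != 0)
    {
      if (u1 == v1 && u0 == v0)
        {
          /* The GCD is two limbs; the 22 loop finishes the job.  */
          g.d1 = u1;
          g.d0 = u0;
          return g;
        }
      if (u1 > v1 || (u1 == v1 && u0 > v0))
        {
          /* u -= v (borrow computed before u0 changes), then make u odd.  */
          u1 = u1 - v1 - (u0 < v0);
          u0 = u0 - v0;
          while ((u0 & 1) == 0)
            {
              u0 = (u0 >> 1) | (u1 << 63);
              u1 >>= 1;
            }
        }
      else
        {
          /* v -= u, then make v odd.  */
          v1 = v1 - u1 - (v0 < u0);
          v0 = v0 - u0;
          while ((v0 & 1) == 0)
            {
              v0 = (v0 >> 1) | (v1 << 63);
              v1 >>= 1;
            }
        }
    }

  /* 11 loop, inlined: both operands now fit in one limb.  */
  while (u0 != v0)
    {
      if (u0 > v0)
        {
          u0 -= v0;
          while ((u0 & 1) == 0)
            u0 >>= 1;
        }
      else
        {
          v0 -= u0;
          while ((v0 & 1) == 0)
            v0 >>= 1;
        }
    }

  g.d1 = 0;
  g.d0 = u0;
  return g;
}
```

The struct return is what rules out the tail call: a standalone gcd_11
would return a single limb, while this function has to return a pair.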

Please encrypt, key id 0xC8601622
