Small operands gcd improvements
Torbjörn Granlund
tg at gmplib.org
Wed Aug 7 23:08:04 UTC 2019
I'm having problems with timing of the gcd_11 code. Unfortunately, the
nested macros of speed.h make things hard to read. Could you
double-check that operands to gcd_11 are odd and full limbs?
The odd thing is that gcd_1 seems to outperform gcd_11 in some 1 x 1
cases. That could happen I suppose through gcd_1's initial reduction
(which looks different in different .asm files). Or it could happen if
operands are not odd or if they have different bit counts.
... similar for testing gcd_22.
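
To make the operand question concrete, here is a minimal sketch of the
check I have in mind. It assumes "full limb" means the most significant
bit is set, and it assumes 64-bit limbs. The names limb_t,
is_odd_full_limb and force_odd_full_limb are made up for illustration;
they are not taken from speed.h or from anything else in GMP.

```c
#include <stdint.h>

typedef uint64_t limb_t;        /* hypothetical stand-in for mp_limb_t */
#define LIMB_BITS 64            /* assumption: 64-bit limbs */
#define LIMB_HIGHBIT ((limb_t) 1 << (LIMB_BITS - 1))

/* Return nonzero if x is odd and has its most significant bit set. */
static int
is_odd_full_limb (limb_t x)
{
  return (x & 1) != 0 && (x & LIMB_HIGHBIT) != 0;
}

/* Force an arbitrary (e.g. random) limb to be odd and full. */
static limb_t
force_odd_full_limb (limb_t x)
{
  return x | 1 | LIMB_HIGHBIT;
}
```

A timing setup could pass each operand through something like
force_odd_full_limb before the measured call, or assert
is_odd_full_limb on it, so that gcd_1 and gcd_11 are compared on
identical inputs.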
Speaking of gcd_22. We need to determine this function's interface.
I suppose it will contain 2 or 3 loops, depending on arch.
The first loop will be 22. If the GCD is two limbs, it will finish the
job. Else it will invoke either of the following loops.
A possible middle loop will be 21.
The last loop will be 11. We can simply inline a copy here as it is
tiny. (A tail call won't work as the functions will have different
return types.)
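
The three-loop structure could be sketched in plain C roughly as
follows. This is only an illustration under several assumptions: 64-bit
limbs, both inputs odd, a simple subtract-and-shift binary GCD in each
loop, and a two-limb result returned in a struct. The names dlimb_t,
ctz_limb, strip_zeros and gcd_22_sketch are made up, and the interface
is not a decision.

```c
#include <stdint.h>

typedef uint64_t limb_t;                    /* stand-in for mp_limb_t */
typedef struct { limb_t d1, d0; } dlimb_t;  /* hypothetical return type */

/* Count trailing zero bits of x; x must be nonzero. */
static int
ctz_limb (limb_t x)
{
  int c = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      c++;
    }
  return c;
}

/* Shift the nonzero two-limb value {*x1,*x0} right until it is odd. */
static void
strip_zeros (limb_t *x1, limb_t *x0)
{
  int c;
  if (*x0 == 0)                 /* low limb empty: shift by a whole limb */
    {
      *x0 = *x1;
      *x1 = 0;
    }
  c = ctz_limb (*x0);
  if (c > 0)                    /* guard avoids an undefined shift by 64 */
    {
      *x0 = (*x0 >> c) | (*x1 << (64 - c));
      *x1 >>= c;
    }
}

/* GCD of {u1,u0} and {v1,v0}; both operands must be odd. */
dlimb_t
gcd_22_sketch (limb_t u1, limb_t u0, limb_t v1, limb_t v0)
{
  dlimb_t g;
  limb_t t;

  /* Loop 22: both operands still occupy two limbs. */
  while (u1 != 0 && v1 != 0)
    {
      if (u1 == v1 && u0 == v0) /* two-limb GCD: finished here */
        {
          g.d1 = u1; g.d0 = u0;
          return g;
        }
      if (u1 < v1 || (u1 == v1 && u0 < v0))
        {
          t = u1; u1 = v1; v1 = t;
          t = u0; u0 = v0; v0 = t;
        }
      u1 -= v1 + (u0 < v0);     /* u -= v, borrow taken from old u0 */
      u0 -= v0;
      strip_zeros (&u1, &u0);
    }

  /* Loop 21: at most one operand has a nonzero high limb; make it u. */
  if (v1 != 0)
    {
      t = u1; u1 = v1; v1 = t;
      t = u0; u0 = v0; v0 = t;
    }
  while (u1 != 0)
    {
      u1 -= (u0 < v0);
      u0 -= v0;
      strip_zeros (&u1, &u0);
    }

  /* Loop 11: inlined copy, since the result type differs from gcd_11's. */
  while (u0 != v0)
    {
      if (u0 > v0)
        {
          u0 -= v0;
          u0 >>= ctz_limb (u0);
        }
      else
        {
          v0 -= u0;
          v0 >>= ctz_limb (v0);
        }
    }
  g.d1 = 0; g.d0 = u0;
  return g;
}
```

For example, with B = 2^64, the operands 3(B+1) and 5(B+1) never leave
the first loop and give the two-limb result B+1, while 3(B+1) and 9
fall through the 21 and 11 loops and give 3.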
--
Torbjörn
Please encrypt, key id 0xC8601622