Improved gcd_1 code
Niels Möller
nisse at lysator.liu.se
Tue Mar 13 09:58:47 CET 2012
Torbjorn Granlund <tg at gmplib.org> writes:
> I pushed some new gcd_1 code for x86_64, helping AMD k10 and bulldozer,
> Intel conroe/penryn, nehalem/westmere, and sandybridge, and VIA nano.
What algorithm do you use?
> Before counting the # of trailing zero bits, we need to wait for two
> cmove to complete. On Intel, cmove is a serial instruction with a 2
> cycle latency. It could be possible to compute ctz(a-b) instead of
> ctz(|a-b|) in order to run bsf an cmove in parallel.
The C code (for GCD_1_METHOD == 2) does something like that. But with
masking tricks instead of cmov, of course.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list