Replacing gcd_1 with gcd_11

Torbjörn Granlund tg at
Sun Aug 18 11:12:48 UTC 2019

"Marco Bodrato" <bodrato at> writes:

  As Torbjörn showed with some timings, replacing gcd_1 with gcd_11 can be a
  performance regression when the size of the operands is really different
  (gcd_1 uses a division before starting the loop). Was this healed?

Since small operands performance is very important (and largely
neglected in GMP) we should add at least one more entry point in gcd_11
for even and different size operands.

  The current code uses mpn_gcd_1 (up, 2, vp[1]) for the 2:1 case. That
  function perform an initial modular reduction.

  Is mpn_gcd_22 (up[1], up[0], 0, v1) as fast as that code?

No, not currently, not for most CPUs.

The code really was ageing, with hardwired cutoff points.  Now, AMD
processors have very quich 128b/64b division (~45 cycles) so might be
usable.  Intel's such insn is useless.

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list