Small operands gcd improvements

Torbjörn Granlund tg at
Tue Aug 13 23:21:16 UTC 2019

"Marco Bodrato" <bodrato at> writes:

I saw this change go in:

diff -r 118627eed635 -r bb86e66536d5 mpn/x86_64/coreihwl/gcd_11.asm
--- a/mpn/x86_64/coreihwl/gcd_11.asm	Tue Aug 13 22:20:06 2019 +0200
+++ b/mpn/x86_64/coreihwl/gcd_11.asm	Wed Aug 14 01:06:08 2019 +0200
@@ -79,10 +79,10 @@
 	ALIGN(16)		C
 L(top):	bsf	v0, %rcx	C
+	mov	u0, %r9		C
 	sub	%rax, u0	C u - v
 	cmovc	v0, u0		C u = |u - v|
 	cmovc	%r9, %rax	C v = min(u,v)
-	shrx(	%rcx, u0, %r9)	C
 	shrx(	%rcx, u0, u0)	C
 	mov	%rax, v0	C
 	sub	u0, v0		C v - u

What's the purpose of this change?

Did you time it on hwl, bwl, and skl to make sure it's not slower than
the code it replaces?

The double shrx was not a mistake; it sped things up quite a bit.
(I use the same trick for zen and zen2.)
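
For readers following the diff, here is a rough C sketch of the arithmetic
the loop performs, reconstructed only from the comments in the hunk
(u - v, u = |u - v|, v = min(u,v), then a right shift by the trailing-zero
count of the difference). The function name gcd_11_sketch is hypothetical,
the both-operands-odd precondition is an assumption, and __builtin_ctzll is
a GCC/Clang builtin standing in for bsf. It says nothing about instruction
scheduling, which is what the double shrx is about.

```c
#include <stdint.h>

/* Hypothetical illustration, not the GMP source.
   Assumes u and v are both odd (and therefore nonzero). */
static uint64_t gcd_11_sketch(uint64_t u, uint64_t v)
{
    while (u != v) {
        uint64_t t = u - v;              /* u - v, may wrap around */
        /* Negation preserves the trailing-zero count, so counting on
           the possibly wrapped difference is the same as on |u - v|. */
        int c = __builtin_ctzll(t);
        uint64_t m = (u < v) ? u : v;    /* min(u,v) */
        uint64_t d = (u < v) ? v - u : t; /* |u - v|, even and nonzero */
        v = m;                           /* v = min(u,v) */
        u = d >> c;                      /* strip factors of 2, u is odd again */
    }
    return u;
}
```

With odd inputs, gcd_11_sketch(21, 35) goes 21,35 -> 7,21 -> 7,7 and
returns 7.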

Please encrypt, key id 0xC8601622
