gcd_11 without abs

Niels Möller nisse at lysator.liu.se
Thu Nov 19 15:20:17 UTC 2020


I had one idea which I wonder if it's useful on some architecture. I
think all current variants of the binary gcd used for gcd_11 try to
replace (a,b) with min(a,b), abs(a-b), and then take out powers of two.

One could get rid of the absolute value by letting one of the working
values be negative. Something like this, in pseudo code

   b = - b

   while ( (sum = a + b) != 0)
       if (sum > 0) a = sum;
       else b = sum;

Note that we always have a > 0 and b < 0, so one doesn't need to
interpret high bit as a sign bit. I had a look at the current code for
arm (v6t2) and x86_64 (core2), and there I don't see any clear
improvement. For ARM, it would be something like

	neg	v0
	rbit	t, s
	clz	t, t
	lsr	s, s, t
	movcs	u0, s
	movcc	v0, s
	adds	s, u0, v0
	bne	L(top)

I don't expect that to be faster than the current code (but I haven't
tried it out). The current code does the absolute value neatly using
rsbcc r3, r3, #0.

For x86_64:

	neg	v0
	cmovc	s, v0
	cmovnc	s, v0
	lea	(u0, v0), s
	bsf	s, %rcx
	shr	R8(%rcx), s
	mov	u0, tmp
	add	v0, tmp		C For carry only
	jnz	top

That's one instruction less than the current code, but looks quite
clumsy. It's annoying that bsf clobbers the carry flag. Since we can't
use carry from the first addition anyway, we can use lea for that and
save a mov. But then we still need a mov + add to get the carry later
on; here negation works against us, since that makes the compare
instruction useless. 

Maybe there's some other architecture where negation is a clear win?


Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list