Small operands gcd improvements

Niels Möller nisse at lysator.liu.se
Tue Aug 6 17:31:24 UTC 2019


tg at gmplib.org (Torbjörn Granlund) writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>
>   Maybe easier to wait until asm files are updated so that
>   HAVE_NATIVE_mpn_gcd_1 implies HAVE_NATIVE_mpn_gcd_11. Or was it some
>   particular call site you had in mind?
>
> I have seen calls to gcd_1 which could use gcd_11, presumably from
> mpn_gcd (or its descendant),

Right, mpn_gcd usually ends with a call to mpn_gcd_1 or gcd_2, and the
latter also usually ends with a call to gcd_1. But I think it's easiest
to leave that as is until we have good gcd_22.

> I expect asm gcd_1 to disappear as the C code should be equivalent.  Do
> you agree?

That would be nice. There will be one more function call, but hopefully
that's not going to be a significant performance regression.

> I suppose some hardwired stuff for the case u >> v (not bitshift, the
> mathematical meaning of >>!) might want to be parameterised and also
> ideally tune/tuneup'ed.

Should that be done by gcd_1 only? Or do we need some variant of gcd_11
with an initial division?

> (16 seems like a huge default value, btw.)

My workstation (intel broadwell) uses takes 96 cycles for a division (if
I read https://gmplib.org/~tege/x86-timing.pdf, or is there are faster
64/64 div isntruction?). And gcd_11 runs at roughly 4 cycles per input
bit according to speed. So then threshold should be around 24.

Below patch to add a gcd_11 entrypoint for this arch. Passes make check,
but would be good to also test with devel/try.

Regards,
/Niels

diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4 gmp-gcd_11/mpn/asm-defs.m4
*** gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4	2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/asm-defs.m4	2019-08-06 19:25:58.225354000 +0200
*************** define_mpn(dump)
*** 1395,1398 ****
--- 1395,1399 ----
  define_mpn(gcd)
  define_mpn(gcd_1)
+ define_mpn(gcd_11)
  define_mpn(gcdext)
  define_mpn(get_str)
diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm
*** gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm	2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm	2019-08-06 19:25:58.225354000 +0200
*************** L(top):	cmovc	%r10, %rax		C if x-y < 0
*** 137,141 ****
  	cmovc	%r9, v0			C use x,y-x    0,3  0,3  2,8  1,7  1,7
  L(mid):	shr	R8(%rcx), %rax		C              1,7  1,6  2,8  2,8  2,8
! 	mov	v0, %r10		C              1    1    4    3    3
  	sub	%rax, %r10		C              2    2    5    4    4
  	bsf	%r10, %rcx		C              3    3    6    5    5
--- 137,141 ----
  	cmovc	%r9, v0			C use x,y-x    0,3  0,3  2,8  1,7  1,7
  L(mid):	shr	R8(%rcx), %rax		C              1,7  1,6  2,8  2,8  2,8
! L(odd):	mov	v0, %r10		C              1    1    4    3    3
  	sub	%rax, %r10		C              2    2    5    4    4
  	bsf	%r10, %rcx		C              3    3    6    5    5
*************** L(end):	pop	%rcx			C common twos
*** 150,151 ****
--- 150,163 ----
  	ret
  EPILOGUE()
+ 
+ 	TEXT
+ 	ALIGN(16)
+ PROLOGUE(mpn_gcd_11)
+ 	FUNC_ENTRY(2)
+ 	xor	%ecx, %ecx
+ 	push	%rcx
+ 	mov	%rdi, %rax
+ 	mov	%rsi, v0
+ 	jmp	L(odd)
+ EPILOGUE()
+ 	

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list