Small operands gcd improvements
Niels Möller
nisse at lysator.liu.se
Tue Aug 6 17:31:24 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> nisse at lysator.liu.se (Niels Möller) writes:
>
> Maybe easier to wait until asm files are updated so that
> HAVE_NATIVE_mpn_gcd_1 implies HAVE_NATIVE_mpn_gcd_11. Or was it some
> particular call site you had in mind?
>
> I have seen calls to gcd_1 which could use gcd_11, presumably from
> mpn_gcd (or its descendant),
Right, mpn_gcd usually ends with a call to mpn_gcd_1 or gcd_2, and the
latter also usually ends with a call to gcd_1. But I think it's easiest
to leave that as is until we have good gcd_22.
> I expect asm gcd_1 to disappear as the C code should be equivalent. Do
> you agree?
That would be nice. There will be one more function call, but hopefully
that's not going to be a significant performance regression.
> I suppose some hardwired stuff for the case u >> v (not bitshift, the
> mathematical meaning of >>!) might want to be parameterised and also
> ideally tune/tuneup'ed.
Should that be done by gcd_1 only? Or do we need some variant of gcd_11
with an initial division?
> (16 seems like a huge default value, btw.)
My workstation (intel broadwell) uses takes 96 cycles for a division (if
I read https://gmplib.org/~tege/x86-timing.pdf, or is there are faster
64/64 div isntruction?). And gcd_11 runs at roughly 4 cycles per input
bit according to speed. So then threshold should be around 24.
Below patch to add a gcd_11 entrypoint for this arch. Passes make check,
but would be good to also test with devel/try.
Regards,
/Niels
diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4 gmp-gcd_11/mpn/asm-defs.m4
*** gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4 2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/asm-defs.m4 2019-08-06 19:25:58.225354000 +0200
*************** define_mpn(dump)
*** 1395,1398 ****
--- 1395,1399 ----
define_mpn(gcd)
define_mpn(gcd_1)
+ define_mpn(gcd_11)
define_mpn(gcdext)
define_mpn(get_str)
diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm
*** gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm 2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm 2019-08-06 19:25:58.225354000 +0200
*************** L(top): cmovc %r10, %rax C if x-y < 0
*** 137,141 ****
cmovc %r9, v0 C use x,y-x 0,3 0,3 2,8 1,7 1,7
L(mid): shr R8(%rcx), %rax C 1,7 1,6 2,8 2,8 2,8
! mov v0, %r10 C 1 1 4 3 3
sub %rax, %r10 C 2 2 5 4 4
bsf %r10, %rcx C 3 3 6 5 5
--- 137,141 ----
cmovc %r9, v0 C use x,y-x 0,3 0,3 2,8 1,7 1,7
L(mid): shr R8(%rcx), %rax C 1,7 1,6 2,8 2,8 2,8
! L(odd): mov v0, %r10 C 1 1 4 3 3
sub %rax, %r10 C 2 2 5 4 4
bsf %r10, %rcx C 3 3 6 5 5
*************** L(end): pop %rcx C common twos
*** 150,151 ****
--- 150,163 ----
ret
EPILOGUE()
+
+ TEXT
+ ALIGN(16)
+ PROLOGUE(mpn_gcd_11)
+ FUNC_ENTRY(2)
+ xor %ecx, %ecx
+ push %rcx
+ mov %rdi, %rax
+ mov %rsi, v0
+ jmp L(odd)
+ EPILOGUE()
+
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list