Small operands gcd improvements
Marco Bodrato
bodrato at mail.dm.unipi.it
Tue Aug 13 23:33:02 UTC 2019
Ciao,
On Wed, 14 August 2019 1:21 am, Torbjörn Granlund wrote:
> I saw this change go in:
>
> diff -r 118627eed635 -r bb86e66536d5 mpn/x86_64/coreihwl/gcd_11.asm
> --- a/mpn/x86_64/coreihwl/gcd_11.asm Tue Aug 13 22:20:06 2019 +0200
> +++ b/mpn/x86_64/coreihwl/gcd_11.asm Wed Aug 14 01:06:08 2019 +0200
> @@ -79,10 +79,10 @@
>
> ALIGN(16) C
> L(top): bsf v0, %rcx C
> + mov u0, %r9 C
> sub %rax, u0 C u - v
> cmovc v0, u0 C u = |u - v|
> cmovc %r9, %rax C v = min(u,v)
> - shrx( %rcx, u0, %r9) C
> shrx( %rcx, u0, u0) C
> mov %rax, v0 C
> sub u0, v0 C v - u
>
> What's the purpose of this change?
Failing tests :-)
> Did you time it on hwl, bwl, skl to make sure it's not slower than the
> changed code?
No.
> The double shrx was not a mistake; it sped things up quite a bit.
> (I use the same trick for zen and zen2.)
Yes, changing the loop was not the best idea, but I wanted to get
correct code in before the nightly tests ran...
Another possible fix, which leaves the loop itself unchanged, is:
PROLOGUE(mpn_gcd_11)
FUNC_ENTRY(2)
mov v0, %rax C
sub u0, v0 C
jz L(end) C
mov u0, %r9 C set %r9
ALIGN(16) C
L(top): bsf v0, %rcx C
sub %rax, u0 C u - v
cmovc v0, u0 C u = |u - v|
cmovc %r9, %rax C v = min(u,v)
shrx( %rcx, u0, %r9) C
shrx( %rcx, u0, u0) C
mov %rax, v0 C
sub u0, v0 C v - u
jnz L(top) C
L(end): FUNC_EXIT()
ret
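
For readers who do not follow the assembly, here is a rough C model of the
code above. It is only a sketch written for illustration, not GMP source:
the names gcd_11_loop, gcd_11_model and ctz64 are hypothetical, and the
r9 parameter stands in for whatever %r9 holds when the loop is entered.
Both operands are assumed to be odd. The reading that the tests failed
because %r9 was not set before the first iteration is an inference from
the "set %r9" comment.

```c
#include <stdint.h>

/* Portable stand-in for bsf: count trailing zeros of a nonzero value. */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of the loop. r9 is the value of %r9 on entry to L(top).
   u and v must both be odd. */
static uint64_t gcd_11_loop(uint64_t u, uint64_t v, uint64_t r9)
{
    uint64_t rax = v;           /* mov v0, %rax          */
    uint64_t v0 = v - u;        /* sub u0, v0            */
    uint64_t u0 = u;

    if (v0 == 0)                /* jz L(end)             */
        return rax;

    do {
        unsigned cnt = ctz64(v0);   /* bsf v0, %rcx; ctz(v-u) == ctz(u-v) */
        int borrow = u0 < rax;      /* carry flag of the sub below        */
        u0 = u0 - rax;              /* sub %rax, u0       u - v           */
        if (borrow) {
            u0 = v0;                /* cmovc v0, u0       u = |u - v|     */
            rax = r9;               /* cmovc %r9, %rax    v = min(u,v)    */
        }
        r9 = u0 >> cnt;             /* shrx(%rcx, u0, %r9)                */
        u0 = u0 >> cnt;             /* shrx(%rcx, u0, u0)                 */
        v0 = rax - u0;              /* mov %rax, v0 ; sub u0, v0          */
    } while (v0 != 0);              /* jnz L(top)                         */

    return rax;
}

/* With the extra "mov u0, %r9" before the loop, %r9 starts out as u. */
uint64_t gcd_11_model(uint64_t u, uint64_t v)
{
    return gcd_11_loop(u, v, u);
}
```

In this model the second shrx keeps a copy of the shifted u in r9, so the
next iteration's cmovc can pick it up as the new v without waiting on a
separate mov inside the loop. If r9 holds something else on entry and the
first subtraction borrows, that stale value ends up in rax, which is
consistent with the failures seen.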
Ĝis,
m
--
http://bodrato.it/papers/