Small operands gcd improvements

Tue Aug 13 23:33:02 UTC 2019

Hi,

On Wed, 14 August 2019 1:21 am, Torbjörn Granlund wrote:
> I saw this change go in:
>
> diff -r 118627eed635 -r bb86e66536d5 mpn/x86_64/coreihwl/gcd_11.asm
> --- a/mpn/x86_64/coreihwl/gcd_11.asm	Tue Aug 13 22:20:06 2019 +0200
> +++ b/mpn/x86_64/coreihwl/gcd_11.asm	Wed Aug 14 01:06:08 2019 +0200
> @@ -79,10 +79,10 @@
>
>  	ALIGN(16)		C
>  L(top):	bsf	v0, %rcx	C
> +	mov	u0, %r9		C
>  	sub	%rax, u0	C u - v
>  	cmovc	v0, u0		C u = |u - v|
>  	cmovc	%r9, %rax	C v = min(u,v)
> -	shrx(	%rcx, u0, %r9)	C
>  	shrx(	%rcx, u0, u0)	C
>  	mov	%rax, v0	C
>  	sub	u0, v0		C v - u
>
> What's the purpose of this change?

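To fix failing tests :-) On entry to the loop %r9 had not been set yet,
so the first cmovc could load garbage into %rax.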

> Did you time it on hwl, bwl, skl to make sure it's not slower than the
> changed code?

No.

> The double shrx was not a mistake; it sped things up quite a bit.
> (I use the same trick for zen and zen2.)

Yes, changing the loop was not the best idea, but I wanted to get
correct code in before the nightly tests ran...

Another possible solution, which leaves the loop unchanged and only
initializes %r9 before entering it, is:

PROLOGUE(mpn_gcd_11)
        FUNC_ENTRY(2)
        mov     v0, %rax        C
        sub     u0, v0          C
        jz      L(end)          C
        mov     u0, %r9         C init %r9 = u for the first cmovc

        ALIGN(16)               C
L(top): bsf     v0, %rcx        C
        sub     %rax, u0        C u - v
        cmovc   v0, u0          C u = |u - v|
        cmovc   %r9, %rax       C v = min(u,v)
        shrx(   %rcx, u0, %r9)  C
        shrx(   %rcx, u0, u0)   C
        mov     %rax, v0        C
        sub     u0, v0          C v - u
        jnz     L(top)          C

L(end): FUNC_EXIT()
        ret
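
For anyone who wants to step through the logic outside of assembly, here
is a rough C model of the code above. It is only a sketch, not GMP code:
the names gcd_11_model and ctz64 are made up for illustration, limbs are
assumed to be 64 bits, and both operands are assumed to be odd (which, as
far as I know, is what mpn_gcd_11 expects). Each variable mirrors the
register it is named after.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits of a nonzero value (stand-in for bsf). */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical C model of the proposed code; u and v must both be odd. */
uint64_t gcd_11_model(uint64_t u, uint64_t v)
{
    assert((u & 1) && (v & 1));

    uint64_t rax = v;        /* mov v0, %rax */
    uint64_t d = v - u;      /* sub u0, v0: v0 now holds v - u */
    if (d == 0)              /* jz L(end) */
        return rax;
    uint64_t r9 = u;         /* mov u0, %r9: the added initialization */

    do {
        unsigned cnt = ctz64(d);   /* bsf v0, %rcx */
        int borrow = u < rax;      /* carry flag of the sub below */
        u -= rax;                  /* sub %rax, u0: u - v */
        if (borrow) {
            u = d;                 /* cmovc v0, u0: u = |u - v| */
            rax = r9;              /* cmovc %r9, %rax: v = min(u, v) */
        }
        r9 = u >> cnt;             /* shrx %rcx, u0, %r9 */
        u >>= cnt;                 /* shrx %rcx, u0, u0 */
        d = rax - u;               /* mov %rax, v0; sub u0, v0: v - u */
    } while (d != 0);              /* jnz L(top) */

    return rax;
}
```

In this model r9 always equals u at the top of the loop: the double shrx
maintains that from the second iteration on, and the mov before the loop
establishes it for the first one. For instance, gcd_11_model(9, 15)
takes the borrow path on its first iteration and needs r9 == 9 there.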

Bye,
m

-- 
http://bodrato.it/papers/