[PATCH] T3/T4 sparc shifts, plus more timings

David Miller davem at davemloft.net
Tue Mar 26 21:40:58 CET 2013


From: Torbjorn Granlund <tg at gmplib.org>
Date: Tue, 26 Mar 2013 21:18:26 +0100

> David Miller <davem at davemloft.net> writes:
> 
>   L(top):
>           or      %g4, %g1, %l1
>           sllx    %g2, cnt, %g1
>   
>           srlx    %g2, tcnt, %g4
>           ldx     [up - 8], %g2
>   
>           stx     %l1, [rp - 8]
>           or      %g3, %l2, %l7
>   
>           sllx    %g5, cnt, %l2
>           srlx    %g5, tcnt, %g3
>   
>           ldx     [up - 16], %g5
>           sub     up, 16, up
>   
>           stx     %l7, [rp - 16]
>           sub     rp, 16, rp
>   
>           brgz    n, L(top)
>            add    n, -2, n
>   
> It has lost some symmetry, which would be nice to keep.  Is it slower
> in the operation order I suggested?

In what way has symmetry been lost?  When 'n' is odd we can branch
to the first instruction after the first store in the loop, and it
should work just fine.
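
To make the mid-loop entry concrete, here is a rough C model of the
idea. It is only a sketch under stated assumptions, not the GMP
assembly: the name lshift_model is hypothetical, limbs are assumed to
be 64 bits, and the code is not software pipelined like the loop
quoted above. It walks from the high limb to the low limb, two limbs
per iteration, and when an odd number of combined limbs remains it
jumps in just after the point where the first store would have been.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: shift {up, n} left by cnt bits into {rp, n},
   returning the bits shifted out of the top limb.
   Assumes n >= 1 and 1 <= cnt <= 63. */
uint64_t lshift_model(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt <= 63);

    unsigned tcnt = 64 - cnt;
    size_t i = n - 1;               /* index of the limb held in hi */
    uint64_t hi = up[i];
    uint64_t lo;
    uint64_t retval = hi >> tcnt;   /* bits shifted out at the top */

    if (i == 0)
        goto done;                  /* single limb: only the final store */
    if (i & 1)
        goto mid;                   /* odd number of combined stores left */

    do {
        /* first half of the 2-way unrolled body */
        lo = up[i - 1];
        rp[i] = (hi << cnt) | (lo >> tcnt);
        hi = lo;
        i--;
mid:
        /* second half, also the entry point for the odd case */
        lo = up[i - 1];
        rp[i] = (hi << cnt) | (lo >> tcnt);
        hi = lo;
        i--;
    } while (i != 0);

done:
    rp[0] = hi << cnt;              /* lowest limb has no lower neighbour */
    return retval;
}
```

In this model the parity test is on n - 1, the number of combined
stores; whether that lines up with odd or even 'n' in the real code
depends on how the feed-in and wind-down code are arranged.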

The only thing I did was transpose some "or/sllx" pairs; I tried to
keep the major blocks grouped the same.

