[PATCH] T3/T4 sparc shifts, plus more timings
David Miller
davem at davemloft.net
Tue Mar 26 21:40:58 CET 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Tue, 26 Mar 2013 21:18:26 +0100
> David Miller <davem at davemloft.net> writes:
>
> L(top):
> or %g4, %g1, %l1
> sllx %g2, cnt, %g1
>
> srlx %g2, tcnt, %g4
> ldx [up - 8], %g2
>
> stx %l1, [rp - 8]
> or %g3, %l2, %l7
>
> sllx %g5, cnt, %l2
> srlx %g5, tcnt, %g3
>
> ldx [up - 16], %g5
> sub up, 16, up
>
> stx %l7, [rp - 16]
> sub rp, 16, rp
>
> brgz n, L(top)
> add n, -2, n
>
> It has lost some symmetry, which would be nice to keep. Is it slower
> in the operation order I suggested?
In what way has symmetry been lost? When 'n' is odd we can
branch to the first instruction after the first store in the loop, and
it should work just fine.
The only thing I did was transpose some "or/sllx" pairs; I tried to
keep the major blocks grouped the same.
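
To make the odd-'n' entry point concrete, here is a rough C model of a
2-way unrolled left shift. It is only a sketch, not the GMP source: the
function name lshift_model and the variables carry, hi and lo are made up
for illustration, and it assumes 64-bit limbs, n >= 1 and 1 <= cnt <= 63.
The "goto odd" stands in for branching to the first instruction after the
first store in the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of a 2-way unrolled left shift (not the GMP code).
   Shifts the n-limb number at up[] left by cnt bits, writes the result to
   rp[], and returns the bits shifted out of the most significant limb.
   Assumes n >= 1 and 1 <= cnt <= 63. Limbs are processed high to low, so
   rp may equal up. */
uint64_t lshift_model(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    unsigned tcnt = 64 - cnt;        /* complementary shift count */
    size_t i = n - 1;                /* index of the next limb to store */
    uint64_t hi = up[i];
    uint64_t retval = hi >> tcnt;    /* bits shifted out at the top */
    uint64_t carry = hi << cnt;      /* high part of the pending result limb */
    uint64_t lo;

    /* i limbs remain below the top one. If that count is odd, enter the
       loop at its second half, i.e. just after the first store. */
    if (i & 1)
        goto odd;

    while (i > 0) {
        /* first half of the unrolled body */
        lo = up[i - 1];
        rp[i] = carry | (lo >> tcnt);
        carry = lo << cnt;
        i--;
odd:
        /* second half of the unrolled body */
        lo = up[i - 1];
        rp[i] = carry | (lo >> tcnt);
        carry = lo << cnt;
        i--;
    }

    rp[0] = carry;                   /* lowest limb has nothing below it */
    return retval;
}
```

Both halves of the body are identical in the model, which is the symmetry
in question: either half can serve as the entry point, so the parity of
'n' only decides where the loop is entered and not what the loop does.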