[PATCH] T3/T4 sparc shifts, plus more timings
David Miller
davem at davemloft.net
Sun Mar 31 19:36:35 CEST 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Sun, 31 Mar 2013 05:03:10 +0200
> The lshiftc code runs at 3 c/l on US3, not the claimed 2.5 c/l. I
> suspect also the US1 claim of 2 c/l is invalid.
So I've solved the puzzle of why we get 3 c/l for lshiftc on US3.
If the loop executes one time, it runs in 5 cycles.
But if it executes more than one time, each iteration after the
first executes in 6 cycles.
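
To put numbers on that, here is a rough sketch of how the per-limb cost drifts from 2.5 c/l towards 3 c/l as the iteration count grows. It assumes the loop handles 2 limbs per iteration (inferred from 5 cycles matching the 2.5 c/l claim) and ignores any setup or wind-down code outside the loop. The function name and parameters are purely illustrative.

```python
def cycles_per_limb(iterations, first_iter_cycles=5, later_iter_cycles=6,
                    limbs_per_iter=2):
    """Average cycles per limb for a loop whose first iteration is cheaper.

    Assumption: limbs_per_iter=2 is inferred from the 5-cycle / 2.5 c/l
    figures, not read from the actual lshiftc source.
    """
    if iterations < 1:
        raise ValueError("the loop must execute at least once")
    # The first iteration runs at the scheduled cost; every later one pays
    # the extra cycle caused by the broken instruction grouping.
    total_cycles = first_iter_cycles + (iterations - 1) * later_iter_cycles
    return total_cycles / (iterations * limbs_per_iter)


if __name__ == "__main__":
    for n in (1, 2, 10, 1000):
        print(n, cycles_per_limb(n))
```

For any operand of realistic size the first-iteration saving is lost in the noise, which is why the measured figure comes out at 3 c/l.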
The reason is that the final cycle of the loop:

    srlx u0, tcnt, %l4
    bge,pt %xcc, L(top)
    stx r1, [n + r1_off] C WAS: rp - 16

also pulls the first instruction of the next iteration into that final
instruction group:

L(top):
    not %l3, %l3

breaking our carefully scheduled code.
I'm going to play around with some things to try and fix this.
Interestingly, UltraSPARC-1 and UltraSPARC-2 would not group the
final cycle of the loop this way, because of their requirement that
integer operations must occur in the first three instructions of
a group.