[PATCH] T3/T4 sparc shifts, plus more timings
Torbjorn Granlund
tg at gmplib.org
Wed Mar 27 22:07:47 CET 2013
David Miller <davem at davemloft.net> writes:
> As an aside I think we can get it down to 2.5 cycles per limb on
> T4 with 4-way unrolling, and 3.0 cycles per limb with 2-way
> unrolling.
>
> The idea is to decrease the bookkeeping instructions by only
> maintaining base pointers which do not change, and then we have an
> offset which operates as the loop index.
>
> So we'd instead have an 'n_off' instead of 'n', and then in some local
> registers we'd hold:
>
> l3: up - 8
> l4: up - 16
> l5: rp - 8
> l6: rp - 16

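The bookkeeping idea can be sketched in C. This is a hypothetical illustration, not GMP's mpn_lshift and not the T4 assembly: the name lshift_offset is made up, limbs are assumed to be 64 bits, and the loop is 2-way unrolled. Per iteration, only the index i changes; up and rp never move. In an assembly version i would presumably become a byte offset, and since SPARC memory operands take either reg+reg or reg+simm13 (not both), each distinct displacement such as up - 8 or up - 16 needs its own precomputed base register, which is presumably why the l3..l6 list exists.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GMP code: shift {up, n} left by cnt bits into
   {rp, n} and return the bits shifted out of the top limb.
   Requires n >= 1 and 1 <= cnt <= 63 (64-bit limbs assumed).
   up and rp stay fixed; the index i is the only induction variable,
   mirroring the 'n_off' idea. */
uint64_t lshift_offset(uint64_t *rp, const uint64_t *up, size_t n,
                       unsigned cnt)
{
    unsigned tnc = 64 - cnt;
    size_t i = n - 1;                 /* start at the most significant limb */
    uint64_t retval = up[i] >> tnc;   /* bits shifted out of the top */
    uint64_t high = up[i] << cnt;

    /* Peel one limb if needed so the unrolled loop sees an even count. */
    if (i & 1) {
        uint64_t low = up[i - 1];
        rp[i] = high | (low >> tnc);
        high = low << cnt;
        i -= 1;
    }

    /* 2-way unrolled body: two limbs per iteration, one index update. */
    while (i != 0) {
        uint64_t a = up[i - 1];
        uint64_t b = up[i - 2];
        rp[i]     = high | (a >> tnc);
        rp[i - 1] = (a << cnt) | (b >> tnc);
        high = b << cnt;
        i -= 2;
    }

    rp[0] = high;
    return retval;
}
```

Because it works from the high end down, every up limb is read before the rp limb at the same index is written, so the sketch also works in place (rp == up).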
A clever trick! But you will probably get 2.75 c/l for 4-way, not 2.5
c/l. We'll need infinite unrolling for 2.5...
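A simple amortization model reproduces these figures. It is an interpretation, not something stated in the thread: assume the per-limb work costs c_inf = 2.5 cycles, and that about o = 1 cycle of loop overhead per iteration remains even with offset indexing, spread over the k limbs of a k-way unrolled iteration. Fitting o to the 3.0 c/l estimate for 2-way unrolling gives, with c(k) in cycles per limb:

```latex
\begin{aligned}
c(k) &= c_\infty + \frac{o}{k}, \qquad c_\infty = 2.5,\ o = 1 \\
c(2) &= 2.5 + \tfrac{1}{2} = 3.0 \\
c(4) &= 2.5 + \tfrac{1}{4} = 2.75 \\
\lim_{k \to \infty} c(k) &= 2.5
\end{aligned}
```

Under this model, going from 2-way to 4-way only halves the remaining overhead, which matches the 2.75 c/l estimate, and 2.5 c/l is approached only as k grows without bound.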
--
Torbjörn