rshift for 64bit Core2
tg at swox.com
Sat Mar 22 15:00:31 CET 2008
nisse at lysator.liu.se (Niels Möller) writes:

  Peter Cordes <peter at cordes.ca> writes:

  > More unrolling makes the intro loop even worse: it can
  > run up to 7 times, instead of 3.

  An alternative way of organizing the intro is like

    if (n & 1)
      {
        /* one iteration */
        n -= 1;
      }
    if (n & 2)
      {
        /* two iterations */
        n -= 2;
      }
    if (n & 4)
      {
        /* four iterations */
        n -= 4;
      }
    /* Now n is a multiple of 8. */
    for (; n > 0; n -= 8)
      /* Main loop, 8 iterations at a time */

  Not sure how Torbjörn usually does things.

  But I guess the above might be worse than a plain intro loop due to
  bad branch predictability...
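
The intro structure quoted above can be made concrete with a small C
sketch. This is only an illustration under stated assumptions, not the
assembly code discussed in this thread: the function name sketch_rshift
and the STEP macro are hypothetical, limbs are assumed to be 64 bits,
and the shift count is assumed to satisfy 1 <= cnt <= 63. The top limb
is handled separately, so the "iterations" counted by m are the n - 1
limbs that have a higher neighbour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One "iteration": combine limb i with the low bits of limb i + 1.
   Hypothetical helper macro; uses rp, up and cnt from the caller. */
#define STEP(i) (rp[i] = (up[i] >> cnt) | (up[(i) + 1] << (64 - cnt)))

/* Hypothetical sketch: rp[0..n-1] = up[0..n-1] >> cnt.
   Returns the bits shifted out at the low end, in the high bits of
   the return value.  Works in place (rp == up) since it runs from
   the low limb upwards. */
uint64_t sketch_rshift(uint64_t *rp, const uint64_t *up, size_t n,
                       unsigned cnt)
{
  assert(n >= 1 && cnt >= 1 && cnt <= 63);

  uint64_t out = up[0] << (64 - cnt);
  size_t m = n - 1;            /* limbs that need a higher neighbour */

  /* Intro: one test per low bit of m instead of an intro loop. */
  if (m & 1)
    {
      STEP(0);
      rp += 1; up += 1; m -= 1;
    }
  if (m & 2)
    {
      STEP(0); STEP(1);
      rp += 2; up += 2; m -= 2;
    }
  if (m & 4)
    {
      STEP(0); STEP(1); STEP(2); STEP(3);
      rp += 4; up += 4; m -= 4;
    }

  /* Now m is a multiple of 8. */
  for (; m > 0; m -= 8)
    {
      STEP(0); STEP(1); STEP(2); STEP(3);
      STEP(4); STEP(5); STEP(6); STEP(7);
      rp += 8; up += 8;
    }

  rp[0] = up[0] >> cnt;        /* top limb: zeros are shifted in */
  return out;
}
```

Each of the three intro branches depends only on one bit of the limb
count, so repeated calls with the same n take the same path through
the intro every time.
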
I think the opposite, actually.  With a loop, the prediction for the
loop branch needs to be "taken", say, 5 times, and then "not taken"
the 6th time, and on the next invocation with the same n, it needs to
be "taken" immediately again.  This requires a complex branch
predictor that can match a pattern of taken and not-taken outcomes
(taken 5 times, not taken, taken 5 times, not taken, ...).

With your variant above, one may get 100% good predictions with any
branch predictor (assuming consecutive invocations use the same n, of
course).
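
The argument above can be checked with a toy model. The sketch below
assumes a textbook 2-bit saturating counter per branch, which is not
meant to describe the actual Core 2 predictor; the names predictor,
branch, loop_intro_misses and bit_intro_misses are hypothetical. It
counts mispredictions over repeated invocations with the same number
k of intro iterations.

```c
/* Textbook 2-bit saturating counter (an assumption, not the Core 2
   design): states 0..3, predict "taken" when state >= 2. */
typedef struct
{
  unsigned state;
  unsigned long misses;
} predictor;

/* Record one execution of a branch with the given outcome. */
static void branch(predictor *p, int taken)
{
  int predicted = p->state >= 2;
  if (predicted != taken)
    p->misses++;
  if (taken)
    {
      if (p->state < 3)
        p->state++;
    }
  else
    {
      if (p->state > 0)
        p->state--;
    }
}

/* Intro loop: the loop branch is taken k times, then not taken. */
unsigned long loop_intro_misses(unsigned k, unsigned invocations)
{
  predictor p = { 0, 0 };
  for (unsigned inv = 0; inv < invocations; inv++)
    {
      for (unsigned i = 0; i < k; i++)
        branch(&p, 1);
      branch(&p, 0);
    }
  return p.misses;
}

/* Bit-test intro: three separate branches, each with its own
   counter; the outcome of branch b is bit b of k. */
unsigned long bit_intro_misses(unsigned k, unsigned invocations)
{
  predictor p[3] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
  for (unsigned inv = 0; inv < invocations; inv++)
    for (unsigned b = 0; b < 3; b++)
      branch(&p[b], (k >> b) & 1);
  return p[0].misses + p[1].misses + p[2].misses;
}
```

In this model the loop exit is mispredicted on every invocation when
k > 0, while the bit-test variant only mispredicts during the first
few invocations, until each counter has settled on its fixed outcome.
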