rshift for 64bit Core2

Niels Möller nisse at lysator.liu.se
Sat Mar 22 14:01:57 CET 2008


Peter Cordes <peter at cordes.ca> writes:

> More unrolling makes the intro loop even worse: it can
> run up to 7 times, instead of 3.

An alternative way of organizing the intro is something like

  if (n & 1)
    {
      /* one iteration */
      n --;
    }
  if (n & 2)
    {
      /* two iterations */
      n -= 2;
    }
  if (n & 4)
    {
      /* four iterations */
      n -= 4;
    }

  /* Now n is a multiple of 8. */
  for (; n > 0; n -= 8)
    {
      /* Main loop, 8 iterations at a time */
    }

Not sure how Torbjörn usually does things.
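
To make the structure above concrete, here is a rough, untested-on-real-
hardware sketch in plain C. It is only a stand-in for the real assembly:
the names rshift_sketch and shift_one are made up for this example, limbs
are assumed to be 64 bits, and the fixed-count inner loops stand for code
that would be fully unrolled by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* One "iteration": build result limb i from source limbs i and i+1.
   Hypothetical helper, not taken from GMP. */
static inline void shift_one(uint64_t *rp, const uint64_t *up,
                             size_t i, unsigned cnt)
{
  rp[i] = (up[i] >> cnt) | (up[i + 1] << (64 - cnt));
}

/* Shift {up, n} right by cnt bits into {rp, n}; returns the bits
   shifted out, in the high end of a limb.
   Assumes n >= 1 and 0 < cnt < 64. rp == up is allowed. */
uint64_t rshift_sketch(uint64_t *rp, const uint64_t *up,
                       size_t n, unsigned cnt)
{
  uint64_t out = up[0] << (64 - cnt);
  size_t m = n - 1;   /* iterations that need a limb above them */
  size_t i = 0;
  size_t k;

  if (m & 1)
    {
      shift_one(rp, up, i, cnt);               /* one iteration */
      i += 1; m -= 1;
    }
  if (m & 2)
    {
      for (k = 0; k < 2; k++)                  /* two iterations */
        shift_one(rp, up, i + k, cnt);
      i += 2; m -= 2;
    }
  if (m & 4)
    {
      for (k = 0; k < 4; k++)                  /* four iterations */
        shift_one(rp, up, i + k, cnt);
      i += 4; m -= 4;
    }

  /* Now m is a multiple of 8. */
  for (; m > 0; m -= 8)
    {
      for (k = 0; k < 8; k++)                  /* main loop body */
        shift_one(rp, up, i + k, cnt);
      i += 8;
    }

  rp[n - 1] = up[n - 1] >> cnt;                /* top limb, zero fill */
  return out;
}
```

The point is only the control flow: three independent tests on the low
bits of the count, then a main loop that never needs a remainder check.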

But I guess the above might be worse than a plain intro loop due to
bad branch predictability. The above has three branch instructions
regardless of input, each taken more or less at random, i.e., on
average 1.5 well predicted and 1.5 mispredicted branches. A loop that
runs 0-7 rounds depending on input, on the other hand, will run 3.5
rounds on average, with 1 mispredicted branch and 2.5 well predicted
ones, assuming the simple prediction that the loop always runs one
more time.
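
The counting can be written down as a small tally over the eight possible
remainders. This is a model, not a measurement, and the assumptions are
mine: each "if" in the chain is statically predicted not taken, the intro
loop is tested at the top and always predicted to continue, and all
remainders 0-7 are equally likely. The function names are made up.

```c
/* Chain of three ifs: under a static "not taken" prediction, a branch
   is mispredicted exactly when its bit of r is set. */
static int chain_mispredicts(unsigned r)
{
  return (int) ((r & 1) + ((r >> 1) & 1) + ((r >> 2) & 1));
}

/* Top-tested intro loop running r rounds: the test executes r + 1
   times, and only the final (exit) test is mispredicted. */
static int loop_tests(unsigned r)
{
  return (int) r + 1;
}

static int loop_mispredicts(unsigned r)
{
  (void) r;
  return 1;
}

/* Totals over r = 0..7; divide by 8 to get averages. */
void tally(int *chain_miss, int *loop_miss, int *loop_hit)
{
  unsigned r;

  *chain_miss = *loop_miss = *loop_hit = 0;
  for (r = 0; r < 8; r++)
    {
      *chain_miss += chain_mispredicts(r);
      *loop_miss += loop_mispredicts(r);
      *loop_hit += loop_tests(r) - loop_mispredicts(r);
    }
}
```

Under these assumptions the chain totals 12 mispredictions over the eight
cases (1.5 on average) and the loop totals 8 (1 on average). The loop's
well predicted count comes out one higher per case in this model than
when counting a single branch per round, since the test at the top runs
once more than the body.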

/Niels

