rshift for 64bit Core2

Torbjorn Granlund tg at swox.com
Sat Mar 22 15:00:31 CET 2008


nisse at lysator.liu.se (Niels Möller) writes:

  Peter Cordes <peter at cordes.ca> writes:
  
  > More unrolling makes the intro loop even worse: it can
  > run up to 7 times, instead of 3.
  
  An alternative way of organizing the intro is like
  
    if (n & 1)
      {
        /* one iteration */
        n--;
      }
    if (n & 2)
      {
        /* two iterations */
        n -= 2;
      }
    if (n & 4)
      {
        /* four iterations */
        n -= 4;
      }
  
    /* Now n is a multiple of 8. */
    for (; n > 0; n -= 8)
      {
        /* Main loop, 8 iterations at a time */
      }
  
  Not sure how Torbjörn usually does things.
  
  But I guess the above might be worse than a plain intro loop due to
  bad branch predictability...

I think the opposite, actually.  With a loop, the predictor must say
"taken" for the loop branch, say, 5 times and then "not taken" the 6th
time; on the next invocation with the same n, it must immediately
predict "taken" again.  Getting all of these right requires a complex
branch predictor that can match a pattern of taken and not-taken
outcomes (taken 5 times, not taken, taken 5 times, not taken, ...).

With your variant above, one may get 100% correct predictions with any
branch predictor (assuming consecutive invocations use the same n, of
course), since each of the three branches then always goes the same
way.
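
To make the comparison concrete, here is a rough sketch of a toy model
in C.  It assumes one saturating counter per branch and nothing else,
which is far simpler than what a real Core 2 does.  The names
(struct counter, mispredicted, loop_intro_misses, bittest_intro_misses)
are made up for illustration and are not from GMP.

```c
/* Toy branch predictor (an assumption, not a model of any real CPU):
   one saturating counter per branch.  */
struct counter { int state; int max; };

/* Predict, then train on the real outcome.  Returns 1 on a mispredict.  */
static int
mispredicted (struct counter *c, int taken)
{
  int predicted = c->state > c->max / 2;   /* upper half means "taken" */
  if (taken && c->state < c->max)
    c->state++;
  else if (!taken && c->state > 0)
    c->state--;
  return predicted != taken;
}

/* Intro loop: the loop branch is taken `iters` times, then not taken
   once, and this repeats on every invocation.  */
int
loop_intro_misses (int bits, int iters, int invocations)
{
  struct counter c = { 0, (1 << bits) - 1 };
  int misses = 0;
  for (int inv = 0; inv < invocations; inv++)
    {
      for (int i = 0; i < iters; i++)
        misses += mispredicted (&c, 1);
      misses += mispredicted (&c, 0);      /* loop exit */
    }
  return misses;
}

/* Bit-test intro: three separate branches, each decided by one of the
   low three bits of n, so each goes the same way on every invocation.  */
int
bittest_intro_misses (int bits, unsigned n, int invocations)
{
  struct counter c[3];
  int misses = 0;
  for (int b = 0; b < 3; b++)
    {
      c[b].state = 0;
      c[b].max = (1 << bits) - 1;
    }
  for (int inv = 0; inv < invocations; inv++)
    for (int b = 0; b < 3; b++)
      misses += mispredicted (&c[b], (n >> b) & 1);
  return misses;
}
```

In this model, the loop intro mispredicts at least once per invocation
(at the loop exit), however many times it is called.  The bit-test
intro stops mispredicting once the counters have warmed up, as long as
n stays the same.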

-- 
Torbjörn


More information about the gmp-devel mailing list