nisse at lysator.liu.se (Niels Möller) writes:

  In that case, not so surprising that the div1 function loses. Do other
  architectures also have decent performance for small-quotient division?

I don't have the full picture, I'm afraid.

I know several ARM cores have great division performance for small
quotients.  For x86, I know of cores with horrible performance and others
(Haswell and later) with half-decent performance.  I assume newer
AMD cores got this right.

