Improvements to powerpc32 asm code

Torbjorn Granlund
05 Jun 2003 12:01:30 +0200

Kevin Ryde writes:

  I was tinkering with lshift and rshift, and got a straightforward 3.0
  c/l loop for 7400 with smaller code than the current routines.  I'd
  hoped it would be 3.0 on 750 too, but turned out to be slower for some
  reason I couldn't understand (renaming or completion no doubt).  Main
  loop below.

Please don't check in anything without timing tests on 745x too.
Its pipeline is completely different.

  I might still add it in, just for the code size.  Torbjorn has pointed
  out that rlwimi (if that's the right insn) would allow perhaps 2.0
  c/l, with separate code for each shift amount 1 to 31 bits.

That would surely be faster on many implementations since it
saves a shift insn.  But 31 (63 for powerpc64) different inner
loops would hurt.
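
To illustrate the trade-off, here is a minimal C sketch, not the actual
asm.  The function names (lshift_generic, lshift_insert, insert_low_bits)
are made up for this example, and insert_low_bits only models what a
rotate-and-insert-under-mask insn such as rlwimi would do in this one
case.  The generic loop does two shifts and an or per limb; the insert
variant does one shift plus one insert.  Since rlwimi takes its shift
count and mask as immediate fields, a real asm version would need the
count fixed at assembly time, hence one loop per shift amount.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken mod 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of a rotate-and-insert for one special case: rotate src left by
   sh and insert its low sh bits into dst, keeping the other bits of dst.
   Requires 1 <= sh <= 31.  In asm, sh would be an immediate. */
static uint32_t insert_low_bits(uint32_t dst, uint32_t src, unsigned sh)
{
    uint32_t mask = (1u << sh) - 1;
    return (dst & ~mask) | (rotl32(src, sh) & mask);
}

/* Generic left shift of an n-limb number by cnt bits, 1 <= cnt <= 31,
   n >= 1.  Limbs are stored least significant first.  Returns the bits
   shifted out of the top limb.  Two shifts and an or per limb. */
uint32_t lshift_generic(uint32_t *rp, const uint32_t *up, size_t n,
                        unsigned cnt)
{
    unsigned tnc = 32 - cnt;
    uint32_t high = up[n - 1];
    uint32_t ret = high >> tnc;
    for (size_t i = n - 1; i > 0; i--) {
        uint32_t low = up[i - 1];
        rp[i] = (high << cnt) | (low >> tnc);   /* shift, shift, or */
        high = low;
    }
    rp[0] = high << cnt;
    return ret;
}

/* Same result, but each limb uses one shift plus one insert. */
uint32_t lshift_insert(uint32_t *rp, const uint32_t *up, size_t n,
                       unsigned cnt)
{
    uint32_t high = up[n - 1];
    uint32_t ret = high >> (32 - cnt);
    for (size_t i = n - 1; i > 0; i--) {
        uint32_t low = up[i - 1];
        uint32_t t = high << cnt;               /* one shift */
        rp[i] = insert_low_bits(t, low, cnt);   /* one insert */
        high = low;
    }
    rp[0] = high << cnt;
    return ret;
}
```

Both functions give the same limbs for every count from 1 to 31.  The
sketch says nothing about cycle counts on any particular chip; that
still needs timing on the real hardware.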