Improvements to powerpc32 asm code
Torbjorn Granlund
tege@swox.com
05 Jun 2003 12:01:30 +0200
Kevin Ryde <user42@zip.com.au> writes:
> I was tinkering with lshift and rshift, and got a straightforward 3.0
> c/l loop for 7400 with smaller code than the current routines. I'd
> hoped it would be 3.0 on 750 too, but turned out to be slower for some
> reason I couldn't understand (renaming or completion no doubt). Main
> loop below.
Please don't check in anything without timing tests on 745x too.
Its pipeline is completely different.
> I might still add it in, just for the code size. Torbjorn has pointed
> out that rlwimi (if that's the right insn) would allow perhaps 2.0
> c/l, with separate code for each shift amount 1 to 31 bits.
That would surely be faster on many implementations since it
saves a shift insn. But 31 (63 for powerpc64) different inner
loops would hurt.
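
For readers following the rlwimi idea, here is a rough C model of the two
inner-loop shapes. It is only a sketch under assumptions: the function names
(sketch_lshift, sketch_lshift_insert, rlwimi_low) are made up for
illustration, limbs are taken to be 32 bits, and the insn sequences in the
comments are a guess at what such a loop might use, not the code from this
thread. Since rlwimi takes its rotate count and mask as immediates, the count
has to be fixed at assembly time, which is why one loop per shift amount
would be needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh bits, 0 <= sh <= 31. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* Model of one particular rlwimi use (hypothetical helper): rotate src
   left by cnt and insert the low cnt bits of the result into dst,
   leaving the other bits of dst alone.  Requires 1 <= cnt <= 31. */
static uint32_t rlwimi_low(uint32_t dst, uint32_t src, unsigned cnt)
{
    uint32_t mask = (1u << cnt) - 1;
    return (dst & ~mask) | (rotl32(src, cnt) & mask);
}

/* Variable-count form: two shifts and an or per limb.
   Requires n >= 1 and 1 <= cnt <= 31.  Returns the bits shifted out. */
uint32_t sketch_lshift(uint32_t *rp, const uint32_t *up, size_t n,
                       unsigned cnt)
{
    uint32_t retval = up[n - 1] >> (32 - cnt);
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (up[i] << cnt) | (up[i - 1] >> (32 - cnt));
    rp[0] = up[0] << cnt;
    return retval;
}

/* Fixed-count form: one shift plus one rotate-and-insert per limb.
   Same requirements and return value as sketch_lshift. */
uint32_t sketch_lshift_insert(uint32_t *rp, const uint32_t *up, size_t n,
                              unsigned cnt)
{
    uint32_t retval = up[n - 1] >> (32 - cnt);
    for (size_t i = n - 1; i > 0; i--) {
        uint32_t t = up[i] << cnt;              /* slwi   t,hi,cnt           */
        rp[i] = rlwimi_low(t, up[i - 1], cnt);  /* rlwimi t,lo,cnt,32-cnt,31 */
    }
    rp[0] = up[0] << cnt;
    return retval;
}
```

Both functions should give identical results for every count from 1 to 31;
the second just folds the right shift and the or into a single
rotate-and-insert step, at the price of needing the count as a constant.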
--
Torbjörn