Improvements to powerpc32 asm code
Kevin Ryde
user42@zip.com.au
Thu, 05 Jun 2003 10:50:34 +1000
Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:
>
> I see that one of the tasks is to improve the mpn_add_n and mpn_sub_n
> (on powerpc32) to 3.25 cycles per limb. I have made some changes and
> am in the process of testing them. If someone else is already doing
> this, I will halt my effort.
I was tinkering with lshift and rshift, and got a straightforward 3.0
c/l (cycles per limb) loop for the 7400 with smaller code than the
current routines. I'd hoped it would be 3.0 on the 750 too, but it
turned out to be slower, for some reason I couldn't pin down (renaming
or completion, no doubt). The main loop is below.
I might still add it in, just for the code size. Torbjorn has pointed
out that rlwimi (if that's the right insn) would allow perhaps 2.0
c/l, with separate code for each shift amount from 1 to 31 bits, since
the rotate count is an immediate field of the instruction.
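To make the rlwimi idea concrete, here is a small C model. It is only an illustrative sketch, not GMP code, and rotl32, ppc_mask, rlwimi_model and lshift_limb_rlwimi are hypothetical names. It assumes the architected rlwimi semantics: rotate rS left by SH, then insert the result into rA under the mask MB..ME (IBM bit numbering, bit 0 = MSB). With SH fixed at s, one rotate of the lower limb plus one rlwimi of the upper limb yields a complete output limb.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh (0..31), like PowerPC rotlwi. */
uint32_t rotl32(uint32_t x, unsigned sh)
{
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* PowerPC MASK(mb, me), mb and me in 0..31, IBM bit numbering
   (bit 0 = MSB).  When mb > me the mask wraps around, per the ISA. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* bits mb..31 set */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);  /* bits 0..me set  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of rlwimi rA,rS,SH,MB,ME: rotate rS, insert it under the mask,
   keep the other bits of rA. */
uint32_t rlwimi_model(uint32_t ra, uint32_t rs,
                      unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* One output limb of a left shift by a fixed s (1..31):
   (hi << s) | (lo >> (32 - s)), built from a rotate plus one insert. */
uint32_t lshift_limb_rlwimi(uint32_t hi, uint32_t lo, unsigned s)
{
    assert(s >= 1 && s <= 31);
    uint32_t t = rotl32(lo, s);                /* low s bits = lo >> (32-s) */
    return rlwimi_model(t, hi, s, 0, 31 - s);  /* top 32-s bits = hi << s  */
}
```

Because SH, MB and ME are encoded in the instruction, s has to be known at assembly time, hence one loop per shift count. Per output limb this is two integer ops (rotlwi, rlwimi) rather than three (srw, slw, or); whether that actually reaches 2.0 c/l is Torbjorn's estimate and is not something this model demonstrates.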
L(top):
C r4 src, decrementing
C r5 dst, decrementing
C r6 shift
C r7 32-shift
C r8 src[i+1] << shift
C r9 src[i]
lwz r10, -4(r4)
srw r11, r9, r7
or r8, r8, r11
stw r8, -4(r5)
slw r8, r9, r6
bdz L(odd)
C r8 src[i+1] << shift
C r9 free (reloaded by the lwzu below)
C r10 src[i]
lwzu r9, -8(r4)
srw r11, r10, r7
or r8, r8, r11
stwu r8, -8(r5)
slw r8, r10, r6
bdnz L(top)
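For reference, what the loop computes can be written in plain C. This is a hedged sketch, not GMP's mpn_lshift: it assumes 32-bit limbs, a shift of 1 to 31, and the usual mpn_lshift convention of returning the bits shifted out of the top limb; lshift_model and limb_t are hypothetical names. Like the asm it walks from the most significant limb downwards, with acc playing the role of r8 (src[i+1] << shift).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* 32-bit limb, as on powerpc32 */

/* Shift {src,n} left by `shift` bits (1..31), store n limbs at {dst,n},
   return the bits shifted out of the top limb. */
limb_t lshift_model(limb_t *dst, const limb_t *src, size_t n, unsigned shift)
{
    assert(n >= 1 && shift >= 1 && shift <= 31);
    unsigned tnc = 32 - shift;           /* r7: 32-shift */
    limb_t high = src[n - 1];
    limb_t retval = high >> tnc;         /* bits shifted out the top */
    limb_t acc = high << shift;          /* r8: src[i+1] << shift */

    for (size_t i = n - 1; i > 0; i--) {
        limb_t low = src[i - 1];         /* lwz/lwzu into r10 or r9 */
        dst[i] = acc | (low >> tnc);     /* srw, or, stw */
        acc = low << shift;              /* slw */
    }
    dst[0] = acc;                        /* zeros shifted in at the bottom */
    return retval;
}
```

The asm appears to unroll this by two, alternating r9 and r10 as the load target so that each srw uses a limb loaded half an iteration earlier, with bdz L(odd) leaving mid-iteration when the count is odd.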