Improvements to powerpc32 asm code

Kevin Ryde
Thu, 05 Jun 2003 10:50:34 +1000

Mark Rodenkirch <> writes:
> I see that one of the tasks is to improve the mpn_add_n and mpn_sub_n
> (on powerpc32) to 3.25 cycles per limb.  I have made some changes and
> am in the process of testing them.  If someone else is already doing
> this, I will halt my effort.
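For reference, the operation being tuned is plain multi-limb addition and subtraction with carry or borrow propagation. Below is a minimal portable C sketch of those semantics. It assumes 32-bit limbs, as on powerpc32. The names `limb_t`, `ref_add_n` and `ref_sub_n` are hypothetical, and this is not GMP's own generic C code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical: one 32-bit limb */

/* rp[] = up[] + vp[] over n limbs; returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;   /* wraps only if up[i] is all ones and cy is 1 */
        limb_t c1 = s < cy;
        limb_t r = s + vp[i];
        limb_t c2 = r < s;       /* wrapped when adding vp[i] */
        rp[i] = r;
        cy = c1 | c2;            /* at most one of c1, c2 is set */
    }
    return cy;
}

/* rp[] = up[] - vp[] over n limbs; returns the borrow out (0 or 1). */
static limb_t ref_sub_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t b1 = up[i] < vp[i];   /* borrow from the limb subtraction */
        limb_t s = up[i] - vp[i];
        limb_t b2 = s < bw;          /* borrow from taking off the incoming borrow */
        rp[i] = s - bw;
        bw = b1 | b2;                /* at most one of b1, b2 is set */
    }
    return bw;
}
```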

I was tinkering with lshift and rshift, and got a straightforward 3.0
c/l loop on the 7400 with smaller code than the current routines.  I'd
hoped it would be 3.0 on the 750 too, but it turned out to be slower
there, for some reason I couldn't pin down (register renaming or
completion, no doubt).  The main loop, from lshift, is below.

I might still add it in, just for the code size.  Torbjorn has pointed
out that rlwimi (if that's the right insn) would allow perhaps 2.0
c/l, with separate code for each shift amount 1 to 31 bits.
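Here is one way to read the rlwimi suggestion. This is an interpretation, not code from this thread. With an immediate shift count s, each output limb (hi << s) | (lo >> (32-s)) could be formed in two instructions instead of `slw`/`srw`/`or`:

- a `srwi` on lo (itself an `rlwinm`);
- an `rlwimi` that rotates hi left by s and inserts it under a mask covering the high 32-s bits.

SH, MB and ME are immediates, so that would force separate code per shift amount. The C below models the instructions with IBM bit numbering (bit 0 is the MSB). The helper names `rotl32`, `ppc_mask`, `rlwimi` and `combine_rlwimi` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n, 0 <= n <= 31 (rotlw semantics). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (x << n) | (x >> ((32 - n) & 31));
}

/* PowerPC MASK(mb, me), IBM numbering: bit 0 is the MSB.
   When mb > me the mask wraps around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* bits mb..31 set */
    uint32_t to_me = 0xFFFFFFFFu << (31 - me);   /* bits 0..me set */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwimi ra, rs, sh, mb, me: rotate rs left by sh, insert under the mask. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* One lshift output limb, (hi << s) | (lo >> (32 - s)), as two
   instructions for a fixed immediate s: srwi, then rlwimi. */
static uint32_t combine_rlwimi(uint32_t hi, uint32_t lo, unsigned s)
{
    assert(s >= 1 && s <= 31);
    uint32_t r = lo >> (32 - s);          /* srwi r, lo, 32-s */
    return rlwimi(r, hi, s, 0, 31 - s);   /* insert hi << s into the high 32-s bits */
}
```

The mask MASK(0, 31-s) discards the s bits that the rotate brings around from the top of hi. Those low bits of r, which hold lo's top bits, are kept.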

        C r4    src, decrementing
        C r5    dst, decrementing
        C r6    shift
        C r7    32-shift
        C r8    src[i+1] << shift
        C r9    src[i]

L(top):
        lwz     r10, -4(r4)
        srw     r11, r9, r7

        or      r8, r8, r11
        stw     r8, -4(r5)

        slw     r8, r9, r6
        bdz     L(odd)

        C r8    src[i+1] << shift
        C r9    free, reloaded by the lwzu
        C r10   src[i]

        lwzu    r9, -8(r4)
        srw     r11, r10, r7

        or      r8, r8, r11
        stwu    r8, -8(r5)

        slw     r8, r10, r6
        bdnz    L(top)
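To check the loop, here is a C sketch of what it computes. It assumes 32-bit limbs and the usual mpn_lshift contract: n >= 1, 1 <= cnt <= 31, and the return value holds the bits shifted out of the top limb. The asm is unrolled by two, alternating r9/r10, and exits through L(odd) for odd counts. The C collapses that into one step per limb. `limb_t` and `ref_lshift` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical: one 32-bit limb */

/* Shift {up, n} left by cnt bits into {rp, n}, working from the high
   limb down as the asm does; returns the bits shifted out of the top. */
static limb_t ref_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt <= 31);
    unsigned tnc = 32 - cnt;             /* like r7: 32-shift */
    limb_t high = up[n - 1];
    limb_t retval = high >> tnc;         /* bits pushed out of the top limb */
    limb_t acc = high << cnt;            /* like r8: upper limb << shift */
    for (size_t i = n - 1; i > 0; i--) {
        limb_t low = up[i - 1];          /* like r9/r10: the next limb down */
        rp[i] = acc | (low >> tnc);      /* srw, or, stw */
        acc = low << cnt;                /* slw */
    }
    rp[0] = acc;                         /* lowest limb gets zeros shifted in */
    return retval;
}
```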