[PATCH] Optimize 32-bit sparc T1 multiply routines.
David Miller
davem at davemloft.net
Sun Jan 6 09:56:23 CET 2013
From: nisse at lysator.liu.se (Niels Möller)
Date: Sun, 06 Jan 2013 08:40:20 +0100
> David Miller <davem at davemloft.net> writes:
>
>> I must be dense, but the implementation below doesn't work:
>
> In which way does it fail? I suspect handling of input and output carry
> is wrong (but it's a long time since I tried any sparc assembly, so I have
> forgotten most details).
>
>> PROLOGUE(mpn_sub_nc)
>> b,a L(ent)
>> EPILOGUE()
>> PROLOGUE(mpn_sub_n)
>> mov 0, cy
>> L(ent): cmp %g0, cy
>
> Does this subtract cy from zero, setting carry flag when cy > 0? That's
> not correct, you should set the carry flag iff cy == 0.
...
> You should return one iff the carry flag is clear at the end of the
> loop, so I agree the output carry handling is wrong too.
Thanks for your help; the following works. I'll work on unrolling
and scheduling it.
PROLOGUE(mpn_sub_nc)
        ba,pt   %xcc, L(ent)
         xor    cy, 1, cy
EPILOGUE()
PROLOGUE(mpn_sub_n)
        mov     1, cy
L(ent): cmp     %g0, cy
L(top): ldx     [up+0], %o4
        add     up, 8, up
        ldx     [vp+0], %o5
        add     vp, 8, vp
        add     rp, 8, rp
        add     n, -1, n
        xnor    %o5, %g0, %o5
        addxccc %o4, %o5, %g3
        brgz    n, L(top)
         stx    %g3, [rp-8]
        clr     %o0
        retl
         movcc  %xcc, 1, %o0
EPILOGUE()
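
For anyone following along without a sparc manual at hand, here is a rough C model of what the loop above computes. It is only a sketch, not GMP code: the name ref_sub_nc is made up for illustration, limbs are assumed to be 64 bits, and cy_in is assumed to be 0 or 1. The point it illustrates is the one from the discussion: the subtraction is done as u + ~v + carry, where the carry flag is the inverse of the borrow, so it must be set on entry when there is no borrow in, and the return value is 1 iff the carry is clear at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not part of GMP):
   rp[0..n-1] = up[0..n-1] - vp[0..n-1] - cy_in, returns the borrow out.
   Assumes 64-bit limbs and cy_in in {0, 1}. */
uint64_t ref_sub_nc(uint64_t *rp, const uint64_t *up,
                    const uint64_t *vp, size_t n, uint64_t cy_in)
{
    /* Models "xor cy, 1, cy" followed by "cmp %g0, cy":
       the carry flag ends up set iff there is NO borrow in. */
    uint64_t carry = cy_in ^ 1;

    for (size_t i = 0; i < n; i++) {
        uint64_t nv = ~vp[i];        /* xnor %o5, %g0, %o5 */
        uint64_t s  = up[i] + nv;    /* first half of the add-with-carry */
        uint64_t c1 = s < nv;        /* carry out of up[i] + ~vp[i] */
        uint64_t r  = s + carry;     /* add the incoming carry */
        uint64_t c2 = r < s;         /* carry out of that addition */
        rp[i] = r;                   /* stx %g3, [rp-8] */
        carry = c1 | c2;             /* at most one of c1, c2 can be set */
    }

    /* Models "clr %o0" plus "movcc %xcc, 1, %o0":
       return 1 iff the carry flag is clear, i.e. a borrow occurred. */
    return carry ^ 1;
}
```

Calling it with cy_in = 0 corresponds to mpn_sub_n, which loads cy with 1 directly instead of going through the xor.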