[PATCH] Optimize 32-bit sparc T1 multiply routines.
nisse at lysator.liu.se
Sun Jan 6 08:40:20 CET 2013
David Miller <davem at davemloft.net> writes:
> I must be dense, but the implementation below doesn't work:
In which way does it fail? I suspect the handling of input and output
carry is wrong (but it's a long time since I tried any sparc assembly,
so I have forgotten most details).
> b,a L(ent)
> mov 0, cy
> L(ent): cmp %g0, cy
Does this subtract cy from zero, setting the carry flag when cy > 0?
That's not correct; you should set the carry flag iff cy == 0.
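
To make the carry-in point concrete, here is a minimal single-limb sketch
in C. It assumes 64-bit limbs and that a sparc subtract (which is what cmp
is) sets the carry flag on borrow. The function names are hypothetical and
only model the flag; they are not taken from the patch.

```c
#include <stdint.h>

/* Model of the flag left by `cmp %g0, cy`: 0 - cy borrows iff cy != 0. */
static int carry_after_cmp(uint64_t cy) { return cy != 0; }

/* Carry-in that the complemented addition needs: set iff cy == 0. */
static int carry_in_needed(uint64_t cy) { return cy == 0; }

/* One limb of u - v - cy, computed as u + ~v + cin (mod 2^64). */
static uint64_t add_complement(uint64_t u, uint64_t v, int cin)
{
    return u + ~v + (uint64_t)cin;
}
```

With u = 5, v = 3 and no borrow in, add_complement gives 2 when fed
carry_in_needed(0), but 1 when fed carry_after_cmp(0), which is off by one.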
> L(top): ldx [up+0], %o4
> add up, 8, up
> ldx [vp+0], %o5
> add vp, 8, vp
> add rp, 8, rp
> add n, -1, n
> xnor %o5, %g0, %o5
> addxccc %o4, %o5, %g3
> brgz n, L(top)
> stx %g3, [rp-8]
> addc %g0, %g0, %o0
> Isn't it the case that this won't generate the correct
> overflow condition? We need the inverse of the overflow
> bit this addxccc generates.
You should return one iff the carry flag is clear at the end of the
loop, so I agree the output carry handling is wrong too.
If you believe in the equation in a previous mail,

  u - v = u + (B^n - 1 - v) + 1 - B^n
              Complement     cin  adjust cout

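
The three terms of the equation can be sketched as an n-limb loop in C.
This is only an illustration under stated assumptions: B = 2^64, limbs are
stored least significant first, and sub_n_via_add is a hypothetical helper
name, not GMP's mpn_sub_n or the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = up[] - vp[] - cy over n limbs; returns the borrow out (0 or 1). */
static uint64_t sub_n_via_add(uint64_t *rp, const uint64_t *up,
                              const uint64_t *vp, size_t n, uint64_t cy)
{
    uint64_t carry = (cy == 0);      /* cin: the "+ 1" term, dropped on borrow in */
    for (size_t i = 0; i < n; i++) {
        uint64_t nv = ~vp[i];        /* one limb of B^n - 1 - v */
        uint64_t s = up[i] + nv;
        uint64_t c1 = s < nv;        /* carry out of u + ~v */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;         /* carry out of adding the carry-in */
        rp[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return 1 - carry;                /* the "- B^n" term: borrow iff no carry */
}
```

For example, with u = {0, 1} (that is, 2^64) and v = {1, 0}, the loop ends
with the carry set, so the returned borrow is 0 and the result limbs are
{2^64 - 1, 0}.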
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.