[PATCH] Optimize 32-bit sparc T1 multiply routines.

Sun Jan 6 08:40:20 CET 2013

David Miller <davem at davemloft.net> writes:

> I must be dense, but the implementation below doesn't work:

In which way does it fail? I suspect the handling of the input and
output carry is wrong (but it's a long time since I tried any sparc
assembly, so I have forgotten most details).

> PROLOGUE(mpn_sub_nc)
> 	b,a	L(ent)
> EPILOGUE()
> PROLOGUE(mpn_sub_n)
> 	mov	0, cy
> L(ent):	cmp	%g0, cy

Does this subtract cy from zero, setting the carry flag when cy > 0?
That's not correct; you should set the carry flag iff cy == 0.

> L(top):	ldx	[up+0], %o4
> 	add	up, 8, up
> 	ldx	[vp+0], %o5
> 	add	vp, 8, vp
> 	add	rp, 8, rp
> 	add	n, -1, n
> 	xnor	%o5, %g0, %o5
> 	addxccc	%o4, %o5, %g3
> 	brgz	n, L(top)
> 	 stx	%g3, [rp-8]
>
> 	retl
> 	addc	%g0, %g0, %o0
> EPILOGUE()
>
> Isn't it the case that this won't generate the correct
> overflow condition?  We need the inverse of the overflow
> bit this addxccc generates.

You should return one iff the carry flag is clear at the end of the
loop, so I agree the output carry handling is wrong too.

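If you believe in the equation in a previous mail,

u - v = u + (B^n - 1 - v) + 1 - B^n
            Complement    cin   adjust cout

then the carry flag going into the loop must be set iff there is no
incoming borrow, and the function must return one iff the carry flag is
clear coming out of it.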
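
As a rough illustration of that identity (this is not code from GMP;
limb_t, adc and sub_nc_model are hypothetical names, and 64-bit limbs
are assumed), the subtraction can be modelled in C as an add-with-carry
of the complemented operand. The carry in is set iff there is no
incoming borrow, and the borrow out is the inverse of the final carry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;	/* assumption: one limb is 64 bits */

/* Add with carry: returns the low limb of a + b + *carry and stores
   the carry out (0 or 1) back into *carry.  */
static limb_t adc(limb_t a, limb_t b, unsigned *carry)
{
	limb_t s = a + b;
	unsigned c1 = s < a;		/* carry from a + b */
	limb_t r = s + *carry;
	unsigned c2 = r < s;		/* carry from adding the carry in */
	*carry = c1 | c2;		/* at most one of the two can be set */
	return r;
}

/* Hypothetical model of mpn_sub_nc: {rp,n} = {up,n} - {vp,n} - borrow_in,
   returning the borrow out (0 or 1).  */
static limb_t sub_nc_model(limb_t *rp, const limb_t *up, const limb_t *vp,
			   size_t n, limb_t borrow_in)
{
	assert(borrow_in <= 1);

	/* cin = 1 iff there is no incoming borrow.  */
	unsigned carry = (borrow_in == 0);

	for (size_t i = 0; i < n; i++)
		rp[i] = adc(up[i], ~vp[i], &carry);	/* ~v is B - 1 - v per limb */

	/* The final carry cancels the - B^n term, so a clear carry
	   means the true difference was negative: borrow out.  */
	return carry == 0;
}
```

For example, subtracting 1 from the two-limb number B = 2^64 gives low
limb 2^64 - 1, high limb 0, and no borrow out, while 0 - 0 with an
incoming borrow gives 2^64 - 1 and a borrow out of one.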

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.