[PATCH] Optimize 32-bit sparc T1 multiply routines.

Sun Jan 6 06:00:24 CET 2013

From: Torbjorn Granlund <tg at gmplib.org>
Date: Sun, 06 Jan 2013 01:25:10 +0100

> For sub_n, I suppose
> 
>     ldx
>     ldx
>     xnor (with %g0)
>     addxcc
>     stx
> 
> would be the right mix.

I must be dense, but the implementation below doesn't work:

PROLOGUE(mpn_sub_nc)
	b,a	L(ent)
EPILOGUE()
PROLOGUE(mpn_sub_n)
	mov	0, cy
L(ent):	cmp	%g0, cy
L(top):	ldx	[up+0], %o4
	add	up, 8, up
	ldx	[vp+0], %o5
	add	vp, 8, vp
	add	rp, 8, rp
	add	n, -1, n
	xnor	%o5, %g0, %o5
	addxccc	%o4, %o5, %g3
	brgz	n, L(top)
	 stx	%g3, [rp-8]

	retl
	 addc	%g0, %g0, %o0
EPILOGUE()

Isn't it the case that this won't generate the correct
carry condition?  The loop computes up + ~vp + carry, so we
need the inverse of the carry bit this addxccc generates.
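
To make the carry-versus-borrow point concrete, here is a small C
model of the loop.  It is only a sketch, not SPARC code and not a
proposed fix for the routine above.  It assumes addxccc computes
a + b + C and leaves the carry out in C; all function names are
made up for illustration.  naive_sub_n feeds the borrow straight in
as the carry, as the loop above does, while xnor_sub_n inverts the
carry on entry and on exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one add-with-carry step: returns a + b + cin (mod 2^64)
   and stores the carry out in *cout.  Assumed to mirror addxccc. */
uint64_t add_with_carry(uint64_t a, uint64_t b, unsigned cin, unsigned *cout)
{
    uint64_t s = a + b;
    unsigned c1 = s < a;          /* carry from a + b */
    uint64_t r = s + cin;
    unsigned c2 = r < s;          /* carry from adding cin */
    *cout = c1 | c2;
    return r;
}

/* Reference: plain subtract with borrow, limb by limb. */
unsigned ref_sub_n(uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                   size_t n, unsigned borrow)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t d = up[i] - vp[i];
        unsigned b1 = up[i] < vp[i];
        unsigned b2 = d < borrow;
        rp[i] = d - borrow;
        borrow = b1 | b2;
    }
    return borrow;
}

/* Like the loop above: the borrow is used directly as the carry. */
unsigned naive_sub_n(uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                     size_t n, unsigned borrow)
{
    unsigned carry = borrow;
    for (size_t i = 0; i < n; i++)
        rp[i] = add_with_carry(up[i], ~vp[i], carry, &carry);
    return carry;
}

/* Complement-and-add with carry = NOT borrow on entry and on exit. */
unsigned xnor_sub_n(uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                    size_t n, unsigned borrow)
{
    unsigned carry = !borrow;
    for (size_t i = 0; i < n; i++)
        rp[i] = add_with_carry(up[i], ~vp[i], carry, &carry);
    return !carry;
}

/* Compare the three models on {0, 5} - {1, 3} with no borrow in. */
void check_sub_n_models(void)
{
    const uint64_t u[2] = {0, 5}, v[2] = {1, 3};
    uint64_t ref[2], fix[2], bad[2];
    unsigned bref = ref_sub_n(ref, u, v, 2, 0);
    unsigned bfix = xnor_sub_n(fix, u, v, 2, 0);
    unsigned bbad = naive_sub_n(bad, u, v, 2, 0);

    assert(fix[0] == ref[0] && fix[1] == ref[1] && bfix == bref);
    assert(bad[0] != ref[0]);     /* low limb is off by one */
    assert(bbad != bref);         /* returned "borrow" is inverted */
}
```

Calling check_sub_n_models() passes all three assertions: the
version that inverts the carry agrees with the reference, and the
naive one is off by one in the low limb and returns the inverted
borrow.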