[PATCH] Optimize 32-bit sparc T1 multiply routines.

David Miller davem at davemloft.net
Sun Jan 6 09:56:23 CET 2013

From: nisse at lysator.liu.se (Niels Möller)
Date: Sun, 06 Jan 2013 08:40:20 +0100

> David Miller <davem at davemloft.net> writes:
>> I must be dense, but the implementation below doesn't work:
> In which way does it fail? I suspect handling of input and output carry
> is wrong (but it's a long time since I tried any sparc assembly, so I have
> forgotten most details).
>> PROLOGUE(mpn_sub_nc)
>> 	b,a	L(ent)
>> PROLOGUE(mpn_sub_n)
>> 	mov	0, cy
>> L(ent):	cmp	%g0, cy
> Does this subtract cy from zero, setting carry flag when cy > 0? That's
> not correct, you should set the carry flag iff cy == 0.
> You should return one iff the carry flag is clear at the end of the
> loop, so I agree the output carry handling is wrong too.

Thanks for your help; the following works.  I'll work on unrolling
and scheduling it next.

	ba,pt	%xcc, L(ent)
	 xor	cy, 1, cy
	mov	1, cy
L(ent):	cmp	%g0, cy
L(top):	ldx	[up+0], %o4
	add	up, 8, up
	ldx	[vp+0], %o5
	add	vp, 8, vp
	add	rp, 8, rp
	add	n, -1, n
	xnor	%o5, %g0, %o5
	addxccc	%o4, %o5, %g3
	brgz	n, L(top)
	 stx	%g3, [rp-8]

	clr	%o0
	 movcc	%xcc, 1, %o0
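
For anyone following along without a sparc manual at hand, here is a
rough C model of what the loop above computes. It is only a sketch under
stated assumptions: 64-bit limbs, and the names sub_nc_model, rp, up, vp,
n and borrow_in are made up for illustration and are not GMP's source.
The point it tries to show is the inverted convention: the subtraction is
done as up + ~vp + carry, so carry == 1 means "no borrow" both on entry
and on exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not GMP code): rp[] = up[] - vp[] - borrow_in,
   returning the borrow out (0 or 1). Limbs are least significant first. */
uint64_t sub_nc_model(uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                      size_t n, uint64_t borrow_in)
{
    assert(borrow_in <= 1);

    /* Invert the incoming borrow, as `xor cy, 1, cy` does (or `mov 1, cy`
       when there is no borrow in). carry == 1 now means "no borrow". */
    uint64_t carry = borrow_in ^ 1;

    for (size_t i = 0; i < n; i++) {
        uint64_t nv = ~vp[i];        /* complement, like xnor with %g0 */
        uint64_t s  = up[i] + nv;    /* may wrap around */
        uint64_t c1 = s < nv;        /* carry out of up + ~vp */
        uint64_t r  = s + carry;     /* add the running carry */
        uint64_t c2 = r < s;         /* carry out of that increment */
        rp[i] = r;
        carry = c1 | c2;             /* both cannot be set at once */
    }

    /* Return 1 iff the final carry is clear, like the movcc at the end. */
    return carry ^ 1;
}
```

In the assembly the running carry simply lives in the condition codes,
since none of the add or brgz instructions in the loop modify them; the
model has to carry it in a variable instead. Unlike the assembly loop,
the model also accepts n == 0 and then just returns borrow_in.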
