C implementation of mod_1_1

Niels Möller nisse at lysator.liu.se
Thu Mar 3 12:29:07 CET 2011


Torbjorn Granlund <tg at gmplib.org> writes:

> Yep, but it also does not do any harm.  GCC simply ignores additional
> such claims of commutative after the initial one.  If GCC someday is
> improved, fine.

Both the gcc documentation and the comments in longlong.h claim that
using more than one % constraint can make gcc fail...

>   /* FIXME: Needs review and/or testing. */
>   #if 0 && defined (__sparc__) && W_TYPE_SIZE == 64
>   #define add_mssaaaa(m, sh, sl, ah, al, bh, bl)				\
>     __asm__ (								\
>         "addcc	%r5,%6,%2\n"						\
>         "	addccc	%r7,%8,%%g0\n"						\
>         "	addc	%r3,%4,%1\n"						\
>         "	clr	%0\n"							\
>         "	movcs	%%xcc, -1, %0\n"					\
>          : "=r" (m),"=r" (sh), "=&r" (sl)					\
>          : "rJ" (ah), "rI" (bh), "%rJ" (al), "rI" (bl),			\
>   	 "rJ" ((al) >> 32), "rI" ((bl) >> 32),				\
>   	 __CLOBBER_CC)
>   #endif
>   
> I cannot see how carry-out from ah.bh works.

Hmm, the final addc should be an addccc. Does it look correct then?
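
For checking any of the asm variants, a plain C model of what the macro
is meant to compute may be useful. The following is only a sketch: the
name add_mssaaaa_ref and the function form are hypothetical, and it
assumes (as the powerpc discussion in this thread suggests) that m
should end up as an all-ones mask when the two-word addition carries
out of the high word, and zero otherwise.

```c
#include <stdint.h>

/* Hypothetical portable model of add_mssaaaa, not the GMP macro itself.
   Computes (s1,s0) = (a1,a0) + (b1,b0) modulo 2^128 and sets *m to an
   all-ones mask if the addition carried out of the high word, else 0. */
static void
add_mssaaaa_ref (uint64_t *m, uint64_t *s1, uint64_t *s0,
                 uint64_t a1, uint64_t a0, uint64_t b1, uint64_t b0)
{
  uint64_t lo = a0 + b0;
  uint64_t c0 = lo < a0;        /* carry out of the low words */
  uint64_t hi = a1 + b1;
  uint64_t c1 = hi < a1;        /* carry out of a1 + b1 */

  hi += c0;
  c1 += hi < c0;                /* carry caused by adding in c0 */

  *s0 = lo;
  *s1 = hi;
  *m = (uint64_t) 0 - c1;       /* c1 is 0 or 1, so this is 0 or ~0 */
}
```

The two partial carries cannot both be set: if a1 + b1 wraps, the
wrapped sum is at most 2^64 - 2, so adding c0 cannot wrap again. An
asm version that drops the carry out of the high-word add would
disagree with this model exactly in the cases where c1 ends up set.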

>     3. For powerpc, I have a strange matching constraint between s0 and a0
>        (inherited from the add_sssaaaa you gave me). I can guess what the
>        %I6c is intended to do, is that related? The add_ssaaaa in longlong.h
>        uses some different tricks, but no matching constraint.
>   
> %I6 generates an i if the operand is immediate.

>   /* FIXME: Needs review and/or testing. I don't understand why
>      constraints says s0 (%2) and a0 (%6) must share a register. */
>   #if 0 && HAVE_HOST_CPU_FAMILY_powerpc && W_TYPE_SIZE == 64
>   #define add_mssaaaa(m, s1, s0, a1, a0, b1, b0)                         \
>     __asm__ (								\
>          "add%I6c	%2,%5,%6\n"						\
>         "	adde	%1,%3,%4\n"						\
>         "	subfe	%0,%0,%0\n"						\
>         "	neg	%0, %0"							\
>   	   : "=r" (m), "=r" (s1), "=&r" (s0)                          \
>   	   : "r"  ((UDItype)(a1)), "r" ((UDItype)(b1)),                 \
>   	     "%2" ((UDItype)(a0)), "rI" ((UDItype)(b0)))
>   #endif
>
> This cannot be right.  Replace the neg with a subfi %0,%0,-1.

You're right, thanks for spotting this. According to the docs I
found, subfe rt, ra, rb sets

  rt <-- ~ra + rb + ca

Now, if ra and rb are the same, ~ra + rb = -1, so we get:

ca = 1: rt = 0
ca = 0: rt = -1

This mask must be complemented. subfic %0,%0,-1 should be the same as
nor %0,%0,%0 (except for the effect on the carry out). I see only
subfic in the docs, not subfi, but that doesn't matter since we've
clobbered the carry flag anyway. (Hmm, on powerpc we don't need any
explicit clobber statement for the carry?)
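
The arithmetic above can be checked with a small C model. This is a
sketch only: the function names are hypothetical, the carry outputs of
both instructions are ignored, and subfic is assumed to compute
rt <-- ~ra + si + 1, which is how I read the docs.

```c
#include <stdint.h>

/* Model of "subfe rt,ra,rb": rt = ~ra + rb + ca (carry out ignored). */
static uint64_t
subfe_model (uint64_t ra, uint64_t rb, unsigned ca)
{
  return ~ra + rb + ca;
}

/* Model of "subfic rt,ra,si": rt = ~ra + si + 1 = si - ra
   (assumed semantics; carry out ignored). */
static uint64_t
subfic_model (uint64_t ra, int64_t si)
{
  return (uint64_t) si - ra;
}

/* Hypothetical helper: the value left in %0 by
   "subfe %0,%0,%0; subfic %0,%0,-1" for incoming carry ca.
   old is whatever happened to be in the register beforehand. */
static uint64_t
carry_mask (unsigned ca, uint64_t old)
{
  uint64_t t = subfe_model (old, old, ca);  /* ca - 1: 0 or ~0 */
  return subfic_model (t, -1);              /* -1 - t = ~t */
}
```

In this model the result does not depend on the old register contents:
a carry gives an all-ones mask, no carry gives zero.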

And what about the constraint 

  "%2" ((UDItype)(a0))

is that really needed, and why? It says that a0 input and s0 output must
be put in the same register.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

