[PATCH] Improve and consolidate sparc PIC assembler.

Sat Apr 13 15:40:38 CEST 2013

Torbjorn Granlund <tg at gmplib.org> writes:

  Torbjorn Granlund <tg at gmplib.org> writes:

    ld: fatal: relocation error: R_SPARC_GOTDATA_OP_LOX10: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol
    ld: fatal: relocation error: R_SPARC_GOTDATA_OP: file mpn/.libs/gcd_1.o: symbol ctz_table: relocation illegal for TLS symbol

  There are also new check failures for a 32-bit sparc-solaris build:

  http://gmplib.org/devel/testmachines/check/failure/swift.nada.kth.se:32.txt

This is caused by the changes to sparc32/v9/sqr_diagonal.asm.

The old code used RDPC for PIC, with the sequence,

.Lpc:   rd      %pc,%o7
        ld      [%o7+.Lnoll-.Lpc],%f8

while the new code uses the longer sequence,

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
        call    __sparc_get_pc_thunk.l7
         or     %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
        sethi   %gdop_hix22(.Lnoll), %l0
        xor     %l0, %gdop_lox10(.Lnoll), %l0
        ld      [%l7 + %l0], %l0, %gdop(.Lnoll)
        ld      [%l0], %f8

where the call is to a local function:

__sparc_get_pc_thunk.l7:
        retl
         add    %o7, %l7, %l7
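
For reference, a minimal sketch (in Python, not part of the GMP sources) of
the address arithmetic in the sethi/call/or sequence and the thunk.  It
assumes that _GLOBAL_OFFSET_TABLE_ in %hi()/%lo() is resolved PC-relatively,
i.e. as symbol + addend - address of the instruction; the function names and
the example addresses are made up for illustration.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def hi(x):
    """Upper 22 bits, as sethi's %hi() operand."""
    return (x & MASK) >> 10


def lo(x):
    """Lower 10 bits, as the %lo() operand."""
    return x & 0x3FF


def got_via_thunk(p0, got):
    """p0 is the (hypothetical) address of the sethi, got the GOT address."""
    # sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7      at address p0
    l7 = hi(got - 4 - p0) << 10
    # call __sparc_get_pc_thunk.l7                 at address p0+4
    # (call writes its own address into %o7)
    o7 = p0 + 4
    # or %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7    delay slot at p0+8
    l7 |= lo(got + 4 - (p0 + 8))
    # thunk: add %o7, %l7, %l7
    return (o7 + l7) & MASK


def addr_via_rdpc(lpc, lnoll):
    """rd %pc,%o7 at .Lpc, then [%o7 + .Lnoll - .Lpc]."""
    o7 = lpc
    return (o7 + (lnoll - lpc)) & MASK
```

Under that assumption the -4 and +4 addends make both relocations evaluate to
the same value, GOT - 4 - p0, so adding %o7 = p0 + 4 in the thunk yields the
GOT address itself.  The RDPC variant reaches .Lnoll directly, with no GOT
load.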

Apart from the fact that the new sequence fails (for reasons unknown to
me), it is not clear why it would be an improvement even if it worked.

More generally, why should we not always use RDPC for PIC?

I spotted a comment in gcc,

;; Even on V9 we use this call sequence with a stub, instead of "rd %pc, ..."
;; because the RDPC instruction is extremely expensive and incurs a complete
;; instruction pipeline flush.

which perhaps answers my question.  But is that true in general, or only
for some sparcv9 implementations?  It would be nice to avoid these long
insn sequences where they can be avoided.

-- 
Torbjörn