[PATCH] Improve and consolidate sparc PIC assembler.

Torbjorn Granlund tg at gmplib.org
Sun Apr 14 17:55:31 CEST 2013


David Miller <davem at davemloft.net> writes:

  So here is what I have right now.
  
  I guessed on the sqr_diagonal.asm failures on 32-bit
  Solaris/Sparc that something is wrong with INT32
  or W32 (which INT32 uses) on Solaris.
  
  Please give it a go.
  
  2013-04-13  David S. Miller  <davem at davemloft.net>
  
  	* mpn/sparc32/v9/sqr_diagonal.asm: Don't use INT32.
  	* mpn/sparc64/gcd_1.asm: Use RODATA, TYPE, and SIZE.
  
  diff -r 37082d27ed59 mpn/sparc32/v9/sqr_diagonal.asm
  --- a/mpn/sparc32/v9/sqr_diagonal.asm	Sat Apr 13 23:40:21 2013 +0200
  +++ b/mpn/sparc32/v9/sqr_diagonal.asm	Sat Apr 13 22:32:31 2013 -0700
  @@ -75,7 +75,9 @@
   ASM_START()
   	LEA_THUNK(l7)
   	TEXT
  -	INT32(noll, 0)
  +	ALIGN(4)
  +L(noll):
  +	.word	0
   PROLOGUE(mpn_sqr_diagonal)
   	save	%sp,-256,%sp
   
No improvement (it just changes .long back to .word).

  diff -r 37082d27ed59 mpn/sparc64/gcd_1.asm
  --- a/mpn/sparc64/gcd_1.asm	Sat Apr 13 23:40:21 2013 +0200
  +++ b/mpn/sparc64/gcd_1.asm	Sat Apr 13 22:32:31 2013 -0700
  @@ -37,13 +37,14 @@
   deflit(MAXSHIFT, 7)
   deflit(MASK, eval((m4_lshift(1,MAXSHIFT))-1))
   
  -	.section	".rodata"
  +	RODATA
  +	TYPE(ctz_table,object)
   ctz_table:
   	.byte	MAXSHIFT
   forloop(i,1,MASK,
   `	.byte	m4_count_trailing_zeros(i)
   ')
  -
  +	SIZE(ctz_table,.-ctz_table)
   
   C Threshold of when to call bmod when U is one limb.  Should be about
   C (time_in_cycles(bmod_1,1) + call_overhead) / (cycles/bit).
  
I haven't tested that, but that will allow the build to finish, of
course, since we tried something equivalent yesterday.
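
For anyone not used to reading the m4 loop, here is a minimal C sketch
of what ctz_table ends up holding.  It assumes MAXSHIFT is 7 as in the
deflit above (so MASK is 127); the function names are made up for
illustration and are not GMP's.

```c
#define MAXSHIFT 7                    /* mirrors deflit(MAXSHIFT, 7) */
#define MASK ((1 << MAXSHIFT) - 1)    /* mirrors deflit(MASK, ...), i.e. 127 */

static unsigned char ctz_table[MASK + 1];

/* Hypothetical helper: number of trailing zero bits of a nonzero x. */
static unsigned count_trailing_zeros_sketch(unsigned x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: fill the table in the same order as the
   .byte directives emitted by the m4 forloop. */
static void build_ctz_table(void)
{
    ctz_table[0] = MAXSHIFT;          /* entry 0 is special: all low bits zero */
    for (unsigned i = 1; i <= MASK; i++)
        ctz_table[i] = (unsigned char) count_trailing_zeros_sketch(i);
}
```

Indexing the table with the low MAXSHIFT bits of a limb gives the shift
count directly; entry 0 says "shift by MAXSHIFT and look again".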

I think we need to consider backing out some of the changes, to restore
GMP's functionality on sparc for non-GNU/Linux systems (and perhaps for
obsolete GNU/Linux systems).  We need to keep in mind that the old
symbol reference code was tried and tested, and worked on a broad range
of systems.

Perhaps the old code appeared to do things a bit haphazardly, which could
disturb one's sense of beauty...

The main aspect we should worry about is correct operation over a set of
environments, not just GNU/Linux.

The second most important aspect is performance.

Regularity and cleanliness are also important, but come only after the
other goals.

The current repo code doesn't build on Solaris ABI=64.  OK, we have a
fix for that now, but it does not restore correct operation, since 'make
check' still fails (for reasons unknown to me).

The current repo code fails make check because of
sparc32/v9/sqr_diagonal.asm problems (and perhaps other problems masked
by that problem) on Solaris ABI=32.

Once we can achieve correct operation, we should worry about
performance.  The code generated for the symbol reference of
sparc32/v9/sqr_diagonal is much longer and not obviously faster.  So
RDPC is very slow; perhaps we should avoid it.  That does not mean that
we should generate a short text-relative reference via the GOT.  The old
code used

        rd      %pc, %o7
        ld      [%o7+.Lnoll-.Lpc],%f8

while the new code uses

        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
        call    __sparc_get_pc_thunk.l7
         or     %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
        sethi   %gdop_hix22(.Lnoll), %l0
        xor     %l0, %gdop_lox10(.Lnoll), %l0
        ld      [%l7 + %l0], %l0, %gdop(.Lnoll)
        ld      [%l0], %f8

for putting zero in %f8.  Even if __sparc_get_pc_thunk.l7 indeed beats
'rd %pc', a plain assembly-time .Lnoll-.Lpc should hardly be replaced.

It might not be kosher to put local objects in the text segment, but it
allows for fast code, and it seems quite portable.

We cannot always do that, of course, since some tables come from C code
which will use rodata.

-- 
Torbjörn

