Some secondary asm T3,T4,T5 functions

David Miller davem at
Tue Apr 2 06:43:18 CEST 2013

From: Torbjorn Granlund <tg at>
Date: Tue, 02 Apr 2013 05:00:59 +0200

> David Miller <davem at> writes:
>   > The code uses lzcnt, which I hope is implemented in T3 and T4.  I added
>   > it to the missing.m4 file, so that I could test the code on my old
>   > sparcs.
>   Just be forewarned, lzcnt is very slow, as slow as popc.
> I use both.  I use lzcnt in Euclidean norm division (in this batch
> mod_1_4) and popc in Hensel norm division (in this batch dive_1).
> The latter is for counting trailing zeros; it should be common to divide
> by odd numbers there, perhaps one should branch past popc in that case
> (andcc d, 1, %g0; bne L(skip_popc); mov 0, reg).
> It is harder to avoid lzcnt, since that will almost always be a
> non-trivial count.  I doubt there is a better way, like a binary search
> over movcc.
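
A minimal C sketch of the two ideas in the quoted paragraph, for illustration only. The function names are hypothetical, `popcount64` is a portable stand-in for the popc instruction, and nothing here claims the binary search beats lzcnt on T3/T4; it only shows what "a binary search over movcc" would compute.

```c
#include <stdint.h>

/* Portable stand-in for SPARC popc (assumption: real code would use the
   instruction itself; this loop only models its result). */
unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Trailing-zero count of a nonzero divisor d, branching past popc
   when d is odd, as suggested for dive_1. */
unsigned ctz_via_popc(uint64_t d)
{
    if (d & 1)
        return 0;                     /* common case: odd divisor */
    return popcount64(~d & (d - 1));  /* ones exactly below the lowest set bit */
}

/* Leading-zero count by binary search.  Each step is a compare followed
   by conditional updates, which is the shape a movcc sequence would take. */
unsigned clz_binary_search(uint64_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 64;
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }
    return n;
}
```

The binary search needs six dependent steps, which is why it is not obviously a win over a single slow lzcnt.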

Well, something like ffs() can be done using popc via:

	neg	val, tmp
	xnor	val, tmp, val
	popc	val, val

for example.
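
As a sanity check of the three-instruction sequence above, here is a small C model of it. The names `ffs_via_popc` and `popcount64` are made up for this sketch, and the popcount loop is only a portable stand-in for popc. Note that for an input of zero the sequence yields 64 rather than the 0 that ffs() returns, hence "something like" ffs().

```c
#include <stdint.h>

/* Portable stand-in for SPARC popc (assumption: models the result only). */
unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 1-based index of the lowest set bit of val; 64 when val is zero. */
unsigned ffs_via_popc(uint64_t val)
{
    uint64_t tmp = (uint64_t)0 - val;  /* neg   val, tmp       */
    val = ~(val ^ tmp);                /* xnor  val, tmp, val  */
    return popcount64(val);            /* popc  val, val       */
}
```

The xor of val and -val is zero in every bit up to and including the lowest set bit and one above it, so the xnor leaves exactly that many low ones for popc to count.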

For shits and grins I attach a strlen that uses little endian loads
and popc on sparc64.  If both of those things ran efficiently, this
would be the best possible strlen on sparc v9.  Unfortunately, both
tend to be slow :-/

>   > More work is needed for loading table symbols.  I think most files do it
>   > properly, but at least sparct34-invert_limb.asm just assumes that a
>   > locally defined table is at a 32-bit address, and statically.
>   Of course, this won't work for all code models, you need to either use
>   a PIC sequence or something like "setx".
> Does setx expand in the assembler to the shortest allowed sequence under
> the code model?
> What's common here?  Does Linux, BSD, Solaris typically assume code
> lives < 2^32?

Static code should assume nothing about the code model, and assume the
worst case of the full 64-bits being significant in the address,
because it needs to be linkable in all code models.

For non-constant values, setx expands to full 64-bit decomposition,
as it should.  For constants, GAS makes some effort to optimize,
but it doesn't do as well as the gcc backend does ;-)
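
For reference, a C sketch of what the full 64-bit decomposition amounts to. The field split follows the GAS operators %hh, %hm, %lm and %lo; the struct, the function names and the instruction order shown in the comments are assumptions made for this illustration, not a dump of what any particular assembler emits.

```c
#include <stdint.h>

/* The four fields consumed by a full six-instruction expansion. */
struct setx_fields {
    uint64_t hh;  /* bits 63..42, operand of the first sethi   */
    uint64_t hm;  /* bits 41..32, or-ed into the upper half    */
    uint64_t lm;  /* bits 31..10, operand of the second sethi  */
    uint64_t lo;  /* bits  9..0,  or-ed into the lower half    */
};

struct setx_fields setx_split(uint64_t value)
{
    struct setx_fields f;
    f.hh = (value >> 42) & 0x3fffff;
    f.hm = (value >> 32) & 0x3ff;
    f.lm = (value >> 10) & 0x3fffff;
    f.lo = value & 0x3ff;
    return f;
}

/* Rebuilds the value the way the instruction sequence would. */
uint64_t setx_model(uint64_t value)
{
    struct setx_fields f = setx_split(value);
    uint64_t tmp, dst;

    tmp = f.hh << 10;    /* sethi %hh(value), tmp       */
    tmp |= f.hm;         /* or    tmp, %hm(value), tmp  */
    tmp <<= 32;          /* sllx  tmp, 32, tmp          */
    dst = f.lm << 10;    /* sethi %lm(value), dst       */
    dst |= f.lo;         /* or    dst, %lo(value), dst  */
    return dst | tmp;    /* or    dst, tmp, dst         */
}
```

When the upper fields are known to be zero, as with a constant below 2^32, the first three instructions and the final or can be dropped, which is the kind of shortening available only for constants.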
-------------- next part --------------
/* Determine the length of a string.  For SPARC v9.
   Copyright (C) 1998, 1999, 2003, 2010 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Jan Vondrak <jvon4518 at>,
                  Jakub Jelinek <jj at>, and
                  David S. Miller <new at>.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <>.  */

#define ASI_PL		0x88

	.register	%g2, #scratch
	.register	%g3, #scratch

	.align		32
	.globl		strlen_new
strlen_new:
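	add	%o0, 8, %o1		! %o1 = s + 8, subtracted at the end
	andn	%o0, 0x7, %o0		! round s down to an 8-byte boundary

	ldxa	[%o0] ASI_PL, %o5	! little-endian load of the first word
	and	%o1, 0x7, %g1		! byte offset of s within that word
	mov	-1, %g5

	sethi	%hi(0x01010101), %o2
	sll	%g1, 3, %g1		! offset in bits

	or	%o2, %lo(0x01010101), %o2
	sllx	%g5, %g1, %o3		! ones from the first string byte upward

	sllx	%o2, 32, %g1

	orn	%o5, %o3, %o5		! force bytes before s to 0xff
	or	%o2, %g1, %o2		! ONEBYTES = 0x0101010101010101

	sllx	%o2, 7, %o3		! HIGHBITS = 0x8080808080808080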
10:	add	%o0, 8, %o0

	andn	%o3, %o5, %g1	! HIGHBITS & ~x
	sub	%o5, %o2, %g2	! x - ONEBYTES

	andcc	%g1, %g2, %g1
	be,a,pt	%xcc, 10b
	 ldxa	[%o0] ASI_PL, %o5

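	sub	%g1, 1, %g2
	andn	%g2, %g1, %g1		! ones below the lowest match bit
	popc	%g1, %g1		! = 8 * index + 7 of the first zero byte
	srlx	%g1, 3, %g2		! byte index within the word

	add	%o0, %g2, %o0
	retl
	 sub	%o0, %o1, %o0		! length = end - (s + 8)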
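
For readers who do not speak SPARC, the routine above can be modelled in C roughly as follows. This is a sketch under stated assumptions: `strlen_model`, `load_le64` and `popcount64` are names invented here, the byte-wise load stands in for `ldxa` with ASI_PL, and the popcount loop stands in for popc. Like the assembly, it reads whole aligned 8-byte words, so every aligned word that overlaps the string must be readable.

```c
#include <stddef.h>
#include <stdint.h>

#define ONEBYTES 0x0101010101010101ULL
#define HIGHBITS 0x8080808080808080ULL

/* Models "ldxa [addr] ASI_PL": the lowest address lands in the lowest byte. */
uint64_t load_le64(uintptr_t addr)
{
    const unsigned char *p = (const unsigned char *)addr;
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--)
        x = (x << 8) | p[i];
    return x;
}

/* Portable stand-in for SPARC popc (assumption: models the result only). */
unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

size_t strlen_model(const char *s)
{
    uintptr_t start = (uintptr_t)s;
    unsigned off = (unsigned)(start & 7);
    uintptr_t word = start - off;          /* aligned address of first word */
    uint64_t x = load_le64(word);
    uint64_t hit, below;

    /* Force the bytes in front of the string to 0xff so they never match. */
    x |= ~(~(uint64_t)0 << (8 * off));

    for (;;) {
        word += 8;                               /* add %o0, 8, %o0   */
        hit = (HIGHBITS & ~x) & (x - ONEBYTES);  /* zero-byte test    */
        if (hit != 0)
            break;
        x = load_le64(word);                     /* delay-slot load   */
    }

    below = (hit - 1) & ~hit;              /* ones below the lowest match */
    return (size_t)(word + (popcount64(below) >> 3) - (start + 8));
}
```

The zero-byte test can flag bytes above the first real terminator because of borrow propagation, but the lowest flagged byte is always a true zero, and with a little-endian load the lowest byte is the earliest one in memory. That is why only the lowest match bit is used.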
