Some secondary asm T3,T4,T5 functions

Tue Apr 2 05:00:59 CEST 2013

David Miller <davem at davemloft.net> writes:

  > The code uses lzcnt, which I hope is implemented in T3 and T4.  I added
  > it to the missing.m4 file, so that I could test the code on my old
  > sparcs.

  Just be forewarned, lzcnt is very slow, as slow as popc.

I use both.  I use lzcnt in Euclidean norm division (in this batch
mod_1_4) and popc in Hensel norm division (in this batch dive_1).
The latter is for counting trailing zeros.  Dividing by odd numbers
should be common there, so perhaps one should branch past popc in that
case (andcc d, 1, %g0; bne L(skip_popc); mov 0, reg).
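
For illustration, the odd-divisor shortcut can be sketched in C.  This is
only a sketch, not the dive_1 code: the names popcount64 and
trailing_zeros are made up here, and popcount64 is a portable stand-in
for the popc instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the SPARC popc instruction. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;          /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count trailing zeros of a nonzero divisor d.
   An odd d takes the early exit and never reaches the popcount,
   mirroring the "branch past popc" idea. */
static unsigned trailing_zeros(uint64_t d)
{
    assert(d != 0);
    if (d & 1)
        return 0;            /* odd divisor: the count is 0 */
    /* ~d & (d - 1) has ones exactly at the trailing-zero positions of d */
    return popcount64(~d & (d - 1));
}
```

With d = 12 (binary 1100) the mask ~d & (d - 1) is binary 0011, so the
count is 2; any odd d returns 0 without computing the mask.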

It is harder to avoid lzcnt, since that will almost always be a
non-trivial count.  I doubt there is a better way, such as a binary
search over movcc.
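
To make the alternative concrete, here is a rough C sketch of a binary
search for the leading zero count.  The name leading_zeros is made up
for this example, and whether a compiler turns each step into a
conditional move is an assumption, not something checked here.  For a
64-bit limb it takes six dependent test-and-shift steps.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a nonzero 64-bit value by binary search.
   Each step tests whether the top half of the remaining window is
   empty and, if so, adds its width and shifts the value up. */
static unsigned leading_zeros(uint64_t x)
{
    unsigned n = 0;
    assert(x != 0);
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }
    return n;
}
```

Every step depends on the shifted value from the previous one, so the
six steps form a serial chain.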

  > More work is needed for loading table symbols.  I think most files do it
  > properly, but at least sparct34-invert_limb.asm just assumes that a
  > locally defined table is at a 32-bit address, and statically.

  Of course, this won't work for all code models, you need to either use
  a PIC sequence or something like "setx".

Does setx expand in the assembler to the shortest allowed sequence under
the code model?

What's common here?  Do Linux, BSD, and Solaris typically assume code
lives below 2^32?

-- 
Torbjörn