Some secondary asm T3,T4,T5 functions

David Miller davem at
Tue Apr 2 04:52:39 CEST 2013

From: Torbjorn Granlund <tg at>
Date: Tue, 02 Apr 2013 04:30:32 +0200

> Plain, non-pipelined version of bdiv_dbm1c.asm, mod_1_4.asm, mode1o.asm,
> dive_1.asm, invert_limb.asm.
> I wrote this with help of gcc, having first told longlong.h about
> umulxhi and addxc.  Then I hand-optimised the result to varying degrees.
> In no case did I software pipeline the loops, so these will rely on OoO
> execution for good speed.
> I believe this code is correct.  If you could provide T3 and T4 timing
> numbers, that would be welcome.  Or if you would optimise the lot, that
> would also be welcome.

I'll take a look at it, thanks!
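
For readers who do not have the instructions in their head, here is a
rough C model of what the two operations named above compute. This is
only a sketch under my own assumptions: the names umulxhi_model and
addxc_model are made up for illustration and are not part of
longlong.h, and the carry-out parameter stands in for the condition
code bit that the flag-setting variant of the add would leave behind.

```c
#include <stdint.h>

/* Model of umulxhi: the high 64 bits of the unsigned 64x64-bit product,
   assembled from four 32x32->64 partial products. */
static uint64_t umulxhi_model(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Middle column sum; at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Model of an add with carry-in: returns a + b + carry_in (mod 2^64)
   and stores the resulting carry (0 or 1) in *carry_out. */
static uint64_t addxc_model(uint64_t a, uint64_t b, unsigned carry_in,
                            unsigned *carry_out)
{
    uint64_t s = a + b;
    unsigned c = s < a;          /* carry from a + b */
    uint64_t r = s + carry_in;
    c |= r < s;                  /* carry from adding carry_in */
    *carry_out = c;
    return r;
}
```

A umul_ppmm-style macro would pair the high half above with an ordinary
64-bit multiply for the low half.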

> The code uses lzcnt, which I hope is implemented in T3 and T4.  I added
> it to the missing.m4 file, so that I could test the code on my old
> sparcs.

Just be forewarned: lzcnt is very slow, as slow as popc.
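
Given that cost, it is probably best kept out of inner loops and used
once per call, for example to find the normalisation shift of a
divisor. As a reference for what the instruction computes, here is a
portable C sketch of a 64-bit leading-zero count. The name lzcnt_model
is hypothetical, and this is not the emulation in missing.m4, just a
binary-search version that returns 64 for a zero input.

```c
#include <stdint.h>

/* Portable model of a 64-bit leading-zero count.
   Returns 64 when x == 0, otherwise the number of zero bits
   above the most significant set bit. */
static unsigned lzcnt_model(uint64_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 64;

    /* Binary search: whenever the top half of the remaining window
       is empty, count it and shift the value up. */
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }

    return n;
}
```

A caller that normalises a divisor d would shift it left by
lzcnt_model(d) once, outside the main loop.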

> More work is needed for loading table symbols.  I think most files do it
> properly, but at least sparct34-invert_limb.asm just assumes that a
> locally defined table is at a 32-bit address, and statically.

Of course, this won't work for all code models; you need to either use
a PIC sequence or something like "setx".
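
To illustrate why the 32-bit assumption breaks, here is a small C model
of the two ways of materialising an address. It is a sketch only: the
helper names are made up, and it assumes the usual split of a 64-bit
value into 22+10+22+10 bit fields, with sethi placing its 22-bit
immediate in bits 31..10 of the destination. It models the non-PIC
absolute case and says nothing about a PIC sequence.

```c
#include <stdint.h>

/* sethi: put a 22-bit immediate in bits 31..10, clear everything else. */
static uint64_t sethi(uint64_t imm22) { return (imm22 & 0x3fffff) << 10; }

/* Field extractors corresponding to the four pieces of a 64-bit value. */
static uint64_t hh(uint64_t v) { return v >> 42; }               /* bits 63..42 */
static uint64_t hm(uint64_t v) { return (v >> 32) & 0x3ff; }     /* bits 41..32 */
static uint64_t hi(uint64_t v) { return (v >> 10) & 0x3fffff; }  /* bits 31..10 */
static uint64_t lo(uint64_t v) { return v & 0x3ff; }             /* bits  9..0  */

/* Two-instruction form (sethi + or): only the low 32 bits survive. */
static uint64_t load_addr_32(uint64_t sym)
{
    return sethi(hi(sym)) | lo(sym);
}

/* setx-like form: build the upper and lower 32 bits separately
   and combine them. */
static uint64_t load_addr_64(uint64_t sym)
{
    uint64_t tmp = sethi(hh(sym)) | hm(sym);  /* upper 32 bits, unshifted */
    tmp <<= 32;                               /* move into place */
    uint64_t rd = sethi(hi(sym)) | lo(sym);   /* lower 32 bits */
    return rd | tmp;
}
```

For a table linked below 4 GB both forms agree. Once the symbol lands
above that, load_addr_32 silently drops the upper bits, which is the
failure mode described above.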
