Some secondary asm T3,T4,T5 functions
davem at davemloft.net
Tue Apr 2 04:52:39 CEST 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Tue, 02 Apr 2013 04:30:32 +0200
> Plain, non-pipelined versions of bdiv_dbm1c.asm, mod_1_4.asm, mode1o.asm,
> dive_1.asm, invert_limb.asm.
> I wrote this with the help of gcc, having first told longlong.h about
> umulxhi and addxc. Then I hand-optimised the result to varying degrees.
> In no case did I software pipeline the loops, so these will rely on OoO
> execution for good speed.
> I believe this code is correct. If you could provide T3 and T4 timing
> numbers, that would be welcome. Or if you would optimise the lot, that
> would also be welcome.
I'll take a look at it, thanks!
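
For readers unfamiliar with the two instructions named in the quoted text, here is a minimal C sketch of the arithmetic they provide. It is a model only, not the longlong.h change or the posted assembly. The names umulxhi_model, addxc_model and umul_ppmm_model are hypothetical, the code assumes a compiler with unsigned __int128 (a GCC/Clang extension), and the carry-out pointer is a modelling convenience standing in for the condition-code flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of umulxhi: the high 64 bits of the unsigned
   64x64-bit product.  Uses unsigned __int128, a GCC/Clang extension. */
static uint64_t umulxhi_model(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Hypothetical model of an add with carry-in that also reports the
   carry-out.  The hardware keeps the carry in a condition-code flag;
   here it is passed explicitly. */
static uint64_t addxc_model(uint64_t a, uint64_t b,
                            unsigned carry_in, unsigned *carry_out)
{
    assert(carry_in <= 1);
    uint64_t s = a + b;          /* may wrap around */
    unsigned c = s < a;          /* carry out of a + b */
    uint64_t r = s + carry_in;   /* may wrap once more */
    c |= r < s;                  /* carry out of adding carry_in */
    *carry_out = c;
    return r;
}

/* A umul_ppmm-style full product built from the two halves:
   hi:lo = a * b. */
static void umul_ppmm_model(uint64_t *hi, uint64_t *lo,
                            uint64_t a, uint64_t b)
{
    *hi = umulxhi_model(a, b);
    *lo = a * b;                 /* low 64 bits, wraps modulo 2^64 */
}
```

With these two primitives a compiler can express multi-limb multiply and add chains directly, which is presumably why teaching longlong.h about them was the first step.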
> The code uses lzcnt, which I hope is implemented in T3 and T4. I added
> it to the missing.m4 file, so that I could test the code on my old
Just be forewarned: lzcnt is very slow, as slow as popc.
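
As a point of reference for what the instruction computes, here is a small C sketch of a 64-bit leading-zero count done with shifts and compares. It is an illustration under assumptions, not something proposed in this thread: the name clz64_model is hypothetical, returning 64 for a zero input is assumed to match the hardware, and whether such a sequence would beat lzcnt on T3 or T4 has not been measured here.

```c
#include <stdint.h>

/* Hypothetical software model of a 64-bit leading-zero count.
   Returns the number of zero bits above the most significant set bit,
   and 64 when x is zero (assumed to match the hardware behaviour). */
static unsigned clz64_model(uint64_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 64;

    /* Binary search: whenever the top half of the remaining window is
       empty, count it and shift the lower half up. */
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }

    return n;
}
```

In division code the count is typically needed once per call, to normalise the divisor, so its latency matters most for small operands.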
> More work is needed for loading table symbols. I think most files do it
> properly, but at least sparct34-invert_limb.asm just assumes that a
> locally defined table is at a 32-bit address, and statically.
Of course, this won't work for all code models; you need to use either
a PIC sequence or something like "setx".