Improvements for power64/mode64
Mark Rodenkirch
mgrogue at wi.rr.com
Sun Mar 26 15:28:59 CEST 2006
On Mar 26, 2006, at 6:40 AM, Torbjorn Granlund wrote:
> Mark Rodenkirch <mgrogue at wi.rr.com> writes:
>
> I would like to submit the following sources to replace
> addmul_1.asm and submul_1.asm for the next release, whether 4.2
> or a patch for 4.2. These sources take full advantage of the
> G5's pipeline. I had integrated these into GMP 4.1.4 early in
> 2005 and have used them extensively with GMP-ECM since then.
> With them I have found dozens of new factors.
>
> These contributions come too late for 4.2, however well tested
> they are.
>
> Does addmul_1 really run at 10 cycles/limb, as the comments in
> the file say? Then it is no faster than the current, simpler code.
> Or did you not update the headers? In that case, what are the
> cycle counts for your addmul and submul?
I never modified the headers. I know that my versions are faster
because gmpbench shows a 20% improvement with my code.
Assuming I understand how to use speed correctly, I am getting
between 8.6 and 8.7 cycles per limb for both addmul and submul.
sqr_diagonal is between 7.8 and 7.9 cycles per limb. If you have a
single speed invocation that we can both run, so that we know we are
comparing apples to apples, that would be great.
BTW, I would like to know where the 10 cycles per limb for addmul and
submul came from. Using the same parameters for speed, I get between
13.4 and 13.5 cycles per limb for the original versions of those. The
original sqr_diagonal is between 8.0 and 8.1. I think more
improvements are possible with sqr_diagonal; I just haven't worked on
them.
--Mark
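
For readers following the thread, here is a minimal sketch in portable C
of what addmul_1 and submul_1 compute. It is not the PowerPC assembly
under discussion. The names ref_addmul_1 and ref_submul_1 and the 32-bit
limb type are assumptions chosen for illustration (the mode64 code works
on 64-bit limbs). The loop body runs once per limb, which is the unit the
cycles/limb figures above refer to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: rp[0..n-1] += up[0..n-1] * v.
   Returns the carry out of the most significant limb.
   Limbs are 32 bits here so the product fits in a uint64_t. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this cannot overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (uint32_t)t;           /* low half stays in place */
        carry = (uint32_t)(t >> 32);   /* high half moves to next limb */
    }
    return carry;
}

/* Hypothetical reference routine: rp[0..n-1] -= up[0..n-1] * v.
   Returns the borrow out of the most significant limb. */
uint32_t ref_submul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * v + borrow;
        uint32_t lo = (uint32_t)p;
        uint32_t hi = (uint32_t)(p >> 32);
        uint32_t r = rp[i];
        rp[i] = r - lo;                /* wraps modulo 2^32 */
        borrow = hi + (r < lo);        /* extra borrow if the subtraction wrapped */
    }
    return borrow;
}
```

Each iteration performs one multiply and a dependent carry (or borrow)
update, so the cycles/limb number is essentially the cost of one trip
through this loop.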