Improvements to powerpc32 asm code

Mark Rodenkirch mrodenkirch@wi.rr.com
Sun, 1 Jun 2003 19:52:11 -0500


On Sunday, June 1, 2003, at 08:57 AM, Torbjorn Granlund wrote:

> Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:
>
>   Thanks.  In my addmul_1.asm, the loop I have only handles one
>   limb per iteration, which makes the code much easier to follow
>   than the current implementation.  I attempted both two and four
>   limbs per iteration, but I saw no gain from that.
>
> If loop unrolling doesn't help, it should not be used, of course.

I discovered my bug (I was not accounting for a carry), and once I fixed
it the code slowed down by 2 cycles/limb, making it perform worse than
the current code.  I'll continue to investigate other options, but I
expect the current version (or your revision) to be the best possible
without FP instructions.
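
For reference, the carry that a one-limb-per-iteration addmul_1 loop has
to track can be sketched in C as below.  This is only an illustration
assuming 32-bit limbs; the name ref_addmul_1 is hypothetical, and it is
neither my asm code nor GMP's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (an assumption for illustration):
   computes rp[i] += up[i] * v for n 32-bit limbs and returns the
   final carry-out limb. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up,
                             size_t n, uint32_t v)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* The 64-bit sum cannot overflow:
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;

        rp[i] = (uint32_t)t;          /* low limb goes back to rp */
        carry = (uint32_t)(t >> 32);  /* high limb feeds the next step */
    }
    return carry;
}
```

Leaving the carry term out of the sum gives right answers for small
operands and wrong ones as soon as a product or addition overflows a
limb, which is why all-ones inputs make a good test.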

The version of addmul_1.asm that you sent caused a segfault.  I haven't
investigated why, as I assume that you (or whoever wrote it) were
already going to do that.  Of course, if it doesn't cause a segfault on
any PCs that you are using, then I will investigate further.  I haven't
looked at the addmul_N.asm code that you sent.

I haven't tested my versions of add_n.asm or sub_n.asm with what GMP
provides.  If you are interested in anything I provided in the versions
I sent earlier, I will continue testing them.

BTW, keep up the good work on GMP.  It's a great package.

--Mark