Improvements to powerpc32 asm code
Mark Rodenkirch
mrodenkirch@wi.rr.com
Sun, 1 Jun 2003 19:52:11 -0500
On Sunday, June 1, 2003, at 08:57 AM, Torbjorn Granlund wrote:
> Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:
>
> Thanks. In my addmul_1.asm, the loop I have only handles one
> limb per iteration, which makes the code much easier to follow
> than the current implementation. I attempted both two and four
> limbs per iteration, but I saw no gain from that.
>
> If loop unrolling doesn't help, it should not be used, of course.
I discovered my bug (I was not accounting for a carry), and once it was
fixed the code slowed down by 2 cycles/limb, making it perform worse than
the current code. I'll continue to investigate other options, but I expect
the current version (or your revision) to be the best possible without FP
instructions.
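For reference, here is a rough C sketch of what a one-limb-per-iteration
addmul_1 loop has to compute, assuming 32-bit limbs. It is not the asm under
discussion, and the name ref_addmul_1 is made up for illustration. It shows
the two places a carry can arise in each iteration; missing either one gives
wrong results on inputs such as all-ones limbs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not GMP's code):
   rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb.
   Assumes 32-bit limbs. */
uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)up[i] * v;   /* 32x32 -> 64-bit product */
        uint32_t lo = (uint32_t)prod;
        uint32_t hi = (uint32_t)(prod >> 32);

        lo += carry;            /* add the carry limb from the previous step */
        hi += (lo < carry);     /* first carry: low word wrapped around */

        uint32_t sum = rp[i] + lo;
        hi += (sum < lo);       /* second carry: adding into rp[i] wrapped */

        rp[i] = sum;
        carry = hi;             /* cannot overflow: up*v + rp + carry < 2^64 */
    }
    return carry;
}
```

In asm the two carries would normally go through the carry flag rather than
explicit compares, which is where the extra cycles per limb can come from.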
The version of addmul_1.asm that you sent caused a segfault. I haven't
investigated why, as I assume that you (or whoever wrote it) were already
going to do that. Of course, if it doesn't cause a segfault on any PCs that
you are using, then I will investigate further. I haven't looked at the
addmul_N.asm code that you sent.
I haven't tested my versions of add_n.asm or sub_n.asm against what GMP
provides. If you are interested in anything in the versions I sent earlier,
I will continue testing them.
BTW, keep up the good work on GMP. It's a great package.
--Mark