# Improvements to powerpc32 asm code

**Torbjorn Granlund**
tege@swox.com

*01 Jun 2003 15:57:58 +0200*

Mark Rodenkirch <mrodenkirch@wi.rr.com> writes:

> Thanks. In my addmul_1.asm, the loop I have only handles one
> limb per iteration, which makes the code much easier to follow
> than the current implementation. I attempted both two and four
> limbs per iteration, but I saw no gain from that.

If loop unrolling doesn't help, it should not be used, of course.
While small operands (a few tens of limbs) are most important for
mpn_addmul_1, the speed for really large operands should not be
completely ignored. Unrolling could be considered for that, since
it allows us to schedule loads further ahead of their use.
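
To make the trade-off concrete, here is a C sketch of what an addmul_1 loop computes on 32-bit limbs, first with one limb per iteration and then unrolled four ways so that all loads of an iteration are issued before the dependent multiply-add chain uses them. The names `limb_t`, `ref_addmul_1`, `unrolled_addmul_1` and `step` are hypothetical, not GMP's, and the sketch assumes `rp` and `up` are either identical or do not overlap at all. In real asm the gain comes from hand scheduling; a C compiler may well emit similar code for both versions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* hypothetical: one 32-bit limb, as on powerpc32 */

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   One limb per iteration. u*v + r + cy is at most 2^64 - 1,
   so the sum always fits in a uint64_t. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* One link of the carry chain: returns the low limb, updates *cy. */
static inline limb_t step(limb_t u, limb_t r, limb_t v, limb_t *cy)
{
    uint64_t t = (uint64_t)u * v + r + *cy;
    *cy = (limb_t)(t >> 32);
    return (limb_t)t;
}

/* Same result, four limbs per iteration: the eight loads of an
   iteration are issued before the first multiply needs them. */
limb_t unrolled_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        limb_t u0 = up[i], u1 = up[i + 1], u2 = up[i + 2], u3 = up[i + 3];
        limb_t r0 = rp[i], r1 = rp[i + 1], r2 = rp[i + 2], r3 = rp[i + 3];
        rp[i]     = step(u0, r0, v, &cy);
        rp[i + 1] = step(u1, r1, v, &cy);
        rp[i + 2] = step(u2, r2, v, &cy);
        rp[i + 3] = step(u3, r3, v, &cy);
    }
    for (; i < n; i++) /* leftover 0..3 limbs */
        rp[i] = step(up[i], rp[i], v, &cy);
    return cy;
}
```

Both functions must agree limb for limb; only the order in which loads are issued differs.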

For mpn_add_n and mpn_sub_n, huge operands are relatively common,
so scheduling loads early is more important for these
functions.
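
The "schedule loads early" idea can be sketched in C as a software-pipelined add loop: the operands for limb i+1 are loaded while limb i is still being summed. This is an illustration only, assuming 32-bit limbs and n >= 1; `pipelined_add_n` is a hypothetical name that merely mirrors the argument order of GMP's documented mpn_add_n (result pointer, two source pointers, limb count, carry-out returned), not its asm implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* hypothetical 32-bit limb */

/* rp = up + vp over n limbs (n >= 1); returns the carry-out (0 or 1).
   rp may equal up or vp, since each limb is loaded before rp at the
   same index is written. */
limb_t pipelined_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    limb_t u = up[0], v = vp[0];          /* prologue: first loads */
    for (size_t i = 0; i < n; i++) {
        limb_t un = 0, vn = 0;
        if (i + 1 < n) {                  /* issue next loads early */
            un = up[i + 1];
            vn = vp[i + 1];
        }
        limb_t s = u + v;
        limb_t c1 = s < u;                /* carry out of u + v */
        limb_t r = s + cy;
        limb_t c2 = r < s;                /* carry out of adding cy */
        rp[i] = r;
        cy = c1 | c2;                     /* at most one of them is set */
        u = un;
        v = vn;
    }
    return cy;
}
```

In hand-written asm the same structure lets the load latency overlap with the add and store of the previous limb, which matters most when operands no longer fit in cache.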

> I thought that GMP strongly avoided fp instructions in the integer code
> because of rounding errors, or does that only apply to FFT multiply?

GMP avoids FFT over complex numbers, since that involves
rounding. Some variants of it use fp instructions and others
use integer instructions on fractions, but they share
the same rounding problem. (If somebody can prove that a certain
precision yields accurate results for the complex FFT, such code
might get into GMP.)

The mpn_addmul_N code using fp instructions doesn't do any rounding.
Multiplying an integer < 2^16 by another integer < 2^32 using fp
provably yields the exact 48-bit product, since an IEEE double has a
53-bit significand.
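
The exactness argument can be checked directly. The C sketch below (hypothetical function names, assuming IEEE 754 double precision) multiplies a 16-bit factor by a 32-bit factor in floating point, then builds a full 32x32 -> 64-bit product from two such exact partial products by splitting one factor into 16-bit halves. A direct 32x32-bit product in a double could need up to 64 bits and would be rounded, which is the point of keeping one factor below 2^16. This only illustrates the arithmetic; it is not how GMP's fp-based mpn_addmul_N code is organized.

```c
#include <stdint.h>

/* 16-bit x 32-bit product in double precision. The exact product is
   below 2^48 < 2^53, so the double holds it without rounding and the
   conversion back to an integer is exact. */
uint64_t fp_mul_16x32(uint32_t a16, uint32_t b32)
{
    double p = (double)(a16 & 0xFFFFu) * (double)b32; /* mask to 16 bits */
    return (uint64_t)p;
}

/* Full 32x32 -> 64-bit product from two exact fp partial products:
   a * b = (a_hi * 2^16 + a_lo) * b. The recombination is done in
   integer arithmetic and cannot overflow, since a * b < 2^64. */
uint64_t fp_mul_32x32(uint32_t a, uint32_t b)
{
    uint64_t lo = fp_mul_16x32(a & 0xFFFFu, b); /* < 2^48 */
    uint64_t hi = fp_mul_16x32(a >> 16, b);     /* < 2^48 */
    return lo + (hi << 16);
}
```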

> I have only found one resource on the net that discusses cycles
> and latency of powerpc assembler instructions, and it doesn't
> include the G3 or G4. What do you use as a resource?

About 250 clicks into http://e-www.motorola.com/ should give you
all the manuals you need. Start by clicking Documentation near the
top of the page.

> In future versions of GMP, will there be subfolders in powerpc32
> to allow for different versions depending on which flavor of
> CPU one has?

It already does that. We currently only set up a special path for
the powerpc750. It should be straightforward to hack
configure.in (or configure directly if you don't have autoconf
handy) to set up paths for more powerpcs.

--
Torbjörn