Improvements to powerpc32 asm code

Torbjorn Granlund
01 Jun 2003 15:57:58 +0200

Mark Rodenkirch <> writes:

  Thanks.  In my addmul_1.asm, the loop I have only handles one
  limb per iteration, which makes the code much easier to follow
  than the current implementation.  I attempted both two and four
  limbs per iteration, but I saw no gain from that.

If loop unrolling doesn't help, it should not be used, of course.

While small operands (few tens of limbs) are most important for
mpn_addmul_1, the speed for really large operands should not be
completely ignored.  Unrolling could be considered for that, since
it allows us to schedule loads further ahead of their use.
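
To make the trade-off concrete, here is a rough C sketch of the two
loop shapes.  It is not the GMP code; the names addmul_1_simple and
addmul_1_unroll2 and the 32-bit limb_t typedef are made up for this
illustration.  The unrolled variant issues the loads for two limbs
before either result is needed, which is the scheduling freedom that
unrolling buys.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* assumption: 32-bit limbs, as on powerpc32 */

/* rp[0..n-1] += up[0..n-1] * v, one limb per iteration.
   Returns the carry-out limb. */
limb_t addmul_1_simple(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 product plus two 32-bit addends always fits in 64 bits. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* Same operation, two limbs per iteration, with all loads for the
   pair issued before the first multiply result is consumed. */
limb_t addmul_1_unroll2(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        limb_t u0 = up[i], u1 = up[i + 1];   /* loads scheduled early */
        limb_t r0 = rp[i], r1 = rp[i + 1];
        uint64_t t0 = (uint64_t)u0 * v + r0 + carry;
        uint64_t t1 = (uint64_t)u1 * v + r1 + (limb_t)(t0 >> 32);
        rp[i] = (limb_t)t0;
        rp[i + 1] = (limb_t)t1;
        carry = (limb_t)(t1 >> 32);
    }
    if (i < n) {                             /* odd leftover limb */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}
```

Both functions compute the same result.  Whether the second one is
faster depends on the CPU and the operand size, so it has to be
measured.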

For mpn_add_n and mpn_sub_n, huge operands are relatively common,
so scheduling loads early is more important for these.

  I thought that GMP strongly avoided fp instructions in the integer code
  because of rounding errors or does that only apply to FFT multiply?

GMP avoids FFT over complex numbers, since that involves
rounding.  There are variants of that using fp instructions and
variants using integer instructions on fractions, and they share
the same rounding problem.  (If somebody can prove that a certain
precision yields accurate results for the complex FFT, such code
might get into GMP.)

The mpn_addmul_N code using fp instructions doesn't do any rounding.
Multiplying an integer < 2^16 with another integer < 2^32 using fp
provably yields the exact 48-bit product.
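
The reason is that an IEEE double has a 53-bit significand, so any
integer below 2^53 is represented exactly, and the product here is
below 2^48.  A small C sketch of the idea follows.  The function
names are invented for this example, and splitting a 32x32 multiply
into two 16x32 halves is just one way to use the property; it is not
meant to describe how the actual mpn_addmul_N code is organised.

```c
#include <stdint.h>

/* Multiply a (< 2^16) by b (< 2^32) using a double-precision fp
   multiply.  The product is < 2^48, which fits in the 53-bit
   significand, so no rounding takes place. */
uint64_t fp_mul_16x32(uint16_t a, uint32_t b)
{
    double p = (double)a * (double)b;   /* exact */
    return (uint64_t)p;                 /* exact conversion back */
}

/* Build a full 32x32 -> 64 bit product from two exact fp products. */
uint64_t fp_mul_32x32(uint32_t a, uint32_t b)
{
    uint64_t lo = fp_mul_16x32((uint16_t)(a & 0xFFFF), b);
    uint64_t hi = fp_mul_16x32((uint16_t)(a >> 16), b);
    return lo + (hi << 16);             /* recombine in integer arithmetic */
}
```

The results agree with plain 64-bit integer multiplication for all
inputs in range, which is easy to spot-check against
(uint64_t)a * b.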

  I have only found one resource on the net that discusses cycles
  and latency of powerpc assembler instructions and it doesn't
  include the G3 or G4.  What do you use as a resource?

About 250 clicks into should give you
all manuals you need.  Start by clicking Documentation near the
top of the page.

  In future versions of GMP will there be subfolders in powerpc32
  to allow for different versions dependent upon which flavor of
  CPU one has?

It already does that.  We currently only set up a special path for
the powerpc750.  It should be straightforward to hack (or configure
directly if you don't have autoconf handy) to set up paths for more
powerpcs.