Improvements to powerpc32 asm code
Sun, 1 Jun 2003 07:48:46 -0500
Thanks. The loop in my addmul_1.asm handles only one limb per
iteration, which makes the code much easier to follow than the current
implementation. I also tried unrolling to two and four limbs per
iteration, but saw no speedup from either.
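For anyone following along, here is a rough C sketch of what a
one-limb-per-iteration addmul_1 loop computes: {rp, n} += {up, n} * v,
returning the carry-out limb. It assumes 32-bit limbs as on powerpc32.
The names ref_addmul_1 and limb_t are hypothetical illustrations, not
GMP's own definitions, and this is not the asm code itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: 32-bit limbs, as on powerpc32. Hypothetical typedef. */
typedef uint32_t limb_t;

/* Hypothetical reference version: {rp, n} += {up, n} * v.
   Processes one limb per iteration and returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 32x32->64 product plus two 32-bit addends cannot overflow
           64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;          /* low half is stored back into rp */
        carry = (limb_t)(t >> 32);  /* high half feeds the next limb */
    }
    return carry;
}
```

Unrolling to two or four limbs per iteration keeps the same arithmetic
and only changes how many limbs each pass of the loop handles.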
I thought GMP deliberately avoided floating-point instructions in the
integer code because of rounding errors. Or does that only apply to the
FFT multiply?
I have found only one resource on the net that lists cycle counts and
latencies for PowerPC assembly instructions, and it doesn't cover the G3
or G4. What do you use as a reference?
Will future versions of GMP have subdirectories under powerpc32, so that
different code can be used depending on which flavor of CPU one has?