Improvements for power64/mode64

Torbjorn Granlund tege at
Sun Mar 26 14:40:36 CEST 2006

Mark Rodenkirch <mgrogue at> writes:

  I would like to submit the following sources to replace
  addmul_1.asm and submul_1.asm for the next release, whether 4.2
  or a patch for 4.2.  These sources take full advantage of the
  G5's pipeline.  I had integrated these into GMP 4.1.4 early in
  2005 and have used them extensively with GMP-ECM since then.
  With them I have found dozens of new factors.

These contributions come too late for 4.2, however well tested
they are.

Does addmul_1 really run at 10 cycles/limb, as the comments in the
file say?  Then it is no faster than the current, simpler code.
Or did you not update the headers?  In that case, what are the
cycle counts for your addmul and submul?


