GMP on Pentium 2

Kevin Ryde user42 at
Sat Nov 8 10:32:49 CET 2003

It looks like the carry flag is not separately renamed on p6, so using
decl (which leaves the carry flag untouched) serializes or something,
costing at least 4 cycles.  I knew this problem existed on p4 but
wasn't aware of it on p6.

The athlon add_n has more unrolling than the x86 add_n used on p6, so
it suffers fewer of these loop carry problems.

This might also explain why the p5 addmul_1 code runs so poorly on p6,
since that code keeps a carry across the incl loop control.

Saving the carry in a register and using subl for the loop control
seems to help in a mock-up loop.  Might have to go that way, or try
better scheduling, or lots more unrolling.  Followups to gmp-devel if
anyone has good ideas (actually tested ideas, not random thoughts
please :-).
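
For illustration only, here is a rough C analogue of the "carry in a
register" idea.  It is not the mock-up loop mentioned above and not GMP
code; the function name add_n_saved_carry is hypothetical.  The point
it tries to show is that the carry is held in an ordinary variable, so
the loop counter update is free to rewrite all the flags.  In assembly,
one way this might be done (an assumption, not taken from the post) is
sbbl %eax,%eax to capture the carry and addl %eax,%eax to restore it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: add the n-limb numbers a[] and b[] into r[],
   least significant limb first, and return the carry out (0 or 1).
   The carry is kept in a plain variable instead of being left in the
   CPU carry flag across the loop control. */
uint32_t add_n_saved_carry(uint32_t *r, const uint32_t *a,
                           const uint32_t *b, size_t n)
{
    uint32_t carry = 0;            /* saved carry, lives in a register */

    while (n != 0) {
        uint32_t s  = *a + carry;  /* fold in the incoming carry */
        uint32_t c1 = s < carry;   /* wrapped only if *a was all ones */
        uint32_t t  = s + *b;      /* add the second operand */
        uint32_t c2 = t < s;       /* wrapped if the sum overflowed */

        *r = t;
        carry = c1 | c2;           /* at most one of c1, c2 can be set */

        a++; b++; r++;
        n -= 1;                    /* counter update no longer has to
                                      preserve a live carry flag */
    }
    return carry;
}
```

Whether a compiler turns this into anything resembling the subl
variant is not guaranteed; it only sketches the data flow.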
