I've ported GMP to Mac Pro. GMPbench > 7700
tege at swox.com
Sun Oct 15 14:58:16 CEST 2006
"Jason Martin" <jason.worth.martin at gmail.com> writes:
> After everything is in cache and the limb count is high enough, I'm
> getting 3 clock cycles/limb on Woodcrest and 3.5 clock cycles/limb on
> Conroe. Note, however, that to test my code out on my Linux Conroe
> box, I had to replace the lahf and sahf instructions with bt and setc
> which seem to be a little slower (at least Agner Fog says so). I've
> attached my testing code and timing routines so you can see exactly
> what I'm doing.

To save the carry flag, use sbb or setc.
To restore it, use a plain add.
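
A minimal C model of that save/restore trick, for illustration only. The exact operand forms are assumptions that the advice above does not spell out: `sbb reg,reg` to save, `add reg,reg` to restore, and for the `setc` variant an 8-bit add of 0xff. The function names are hypothetical, and the carry flag is modelled as a plain 0/1 integer rather than the real hardware flag.

```c
#include <stdint.h>

/* Model of "sbb reg,reg": reg - reg - CF, which is 0 when CF is clear
   and all ones when CF is set. */
uint64_t save_carry_sbb(unsigned cf)
{
    return (uint64_t)0 - (uint64_t)(cf & 1);
}

/* Model of "add reg,reg" on the saved value: 0 + 0 gives no carry out,
   while all-ones + all-ones overflows 64 bits and sets the carry again. */
unsigned restore_carry_add(uint64_t saved)
{
    uint64_t sum = saved + saved;
    return sum < saved;            /* unsigned wraparound means carry out */
}

/* Model of "setc reg8": reg8 becomes 0 or 1. */
uint8_t save_carry_setc(unsigned cf)
{
    return (uint8_t)(cf & 1);
}

/* Model of an 8-bit "add $0xff, reg8": 1 + 0xff wraps to 0 with carry out,
   0 + 0xff gives 0xff with no carry. */
unsigned restore_carry_add8(uint8_t saved)
{
    uint8_t sum = (uint8_t)(saved + 0xffu);
    return sum < saved;
}
```

In both variants the saved register is clobbered by the restoring add, which is normally acceptable since the register is only a temporary for the flag.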
3 cycles/limb is much better than the present 13 or so cycles/limb. But
it is possible to reach 2 cycles/limb with unrolling and jrcxz.
You might want to see how close to 4 cycles/limb you can get for a new
mpn_addmul_1 and friends. (The mulq instruction cannot be issued more
often than once every 4 cycles, so mpn_addmul_1 will never run better
than at 4 cycles/limb using mulq.)
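
For reference, here is a portable C sketch of what mpn_addmul_1 computes: rp[] += up[] * v over n limbs, returning the carry-out limb. This is not GMP's implementation. The name `ref_addmul_1` and the `limb_t` typedef are hypothetical, 64-bit limbs are assumed, and `unsigned __int128` is a GCC/Clang extension. Each iteration needs one 64x64->128 multiply, which is the mulq that sets the 4 cycles/limb floor mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb, as on x86-64 */

/* Reference semantics only: {rp, n} += {up, n} * v, return the carry limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* One widening multiply per limb. The sum cannot overflow 128 bits:
           (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;            /* low limb goes back to rp */
        carry = (limb_t)(t >> 64);    /* high limb carries into the next step */
    }
    return carry;
}
```

A sketch like this is mainly useful as a correctness oracle when testing a hand-written assembly loop against random operands.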