Improved mpn code for Core 2

Jason Martin jason.worth.martin at
Sat Dec 2 02:51:28 CET 2006

Hi All,

I've managed to improve the addmul_1 (and friends) mpn routines for
Core 2 processors.

My addmul_1 executes at 4.6 cycles/limb with a 4-way unroll of the
main loop and 4.3 cycles/limb with a 16-way unroll.  I believe that
this is close to optimal for the Core 2 architecture.  submul_1
behaves identically to addmul_1, and mul_1 executes at 4 cycles/limb.
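
For anyone unfamiliar with the routine, the operation being timed can be
written in portable C roughly as follows.  This is only a reference sketch,
not the assembly from the patch: the name ref_addmul_1, the limb_t typedef
and the fixed 64-bit limb size are assumptions made for illustration, and
unsigned __int128 is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs, as on x86-64 */

/* Reference addmul_1: rp[0..n-1] += up[0..n-1] * v.
   Returns the limb carried out of the most significant position. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 64x64-bit product plus two 64-bit addends always fits
           in 128 bits, so this sum cannot overflow. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;           /* low limb goes back to rp */
        carry = (limb_t)(t >> 64);   /* high limb feeds the next step */
    }
    return carry;
}
```

Each loop iteration handles one limb, so "cycles/limb" is the cost of one
trip through this loop body; an N-way unroll replicates the body N times per
loop iteration to amortize the loop overhead.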

This, together with some earlier changes to add_n and sub_n, gives a
GMPbench score of 8260 on my 2.66GHz Mac Pro, so it appears to make
quite a difference for the Core 2 architecture.

If you're interested, the code is available on my homepage:

For those who asked:  I've included an install routine that detects
the CPU and will only install the patches if a Core 2 CPU is found.
Hopefully this will allow you to add the patches into whatever
automatic build scripts you are using.
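
As an illustration of the kind of check involved, a Core 2 test can be
sketched in C by decoding the CPUID leaf-1 signature.  This is not the
install routine from the patch: the function names are hypothetical, the
<cpuid.h> header with __get_cpuid is assumed to come from a GCC or Clang
toolchain, only family 6, model 15 (the 65 nm Core 2 parts) is matched, and
the vendor string is not checked.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* assumption: GCC/Clang helper header */
#endif

/* Decode a CPUID leaf-1 EAX signature and report whether it names
   family 6, model 15.  Hypothetical helper for illustration. */
int is_core2_signature(uint32_t eax)
{
    uint32_t family    = (eax >> 8)  & 0xF;
    uint32_t model     = (eax >> 4)  & 0xF;
    uint32_t ext_model = (eax >> 16) & 0xF;

    /* The extended model bits only apply to families 6 and 15. */
    if (family == 6 || family == 15)
        model |= ext_model << 4;

    return family == 6 && model == 15;
}

/* Query the running CPU.  Returns 0 on non-x86 builds or when
   CPUID leaf 1 is not available. */
int running_on_core2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return is_core2_signature(eax);
#else
    return 0;
#endif
}
```

A build script could compile and run a small program around
running_on_core2() and apply the patches only when it reports a match.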


Jason Worth Martin
Asst. Prof. of Mathematics
James Madison University

"Ever my heart rises as we draw near the mountains.
There is good rock here." -- Gimli, son of Gloin

More information about the gmp-devel mailing list