Improved mpn code for Core 2
Jason Martin
jason.worth.martin at gmail.com
Sat Dec 2 02:51:28 CET 2006
Hi All,
I've managed to improve the addmul_1 (and friends) mpn routines for
Core 2 processors.
My addmul_1 executes at 4.6 cycles/limb with a 4-way unroll of the
main loop and 4.3 cycles/limb with a 16-way unroll. I believe that
this is close to optimal for the Core 2 architecture. submul_1
behaves identically to addmul_1, and mul_1 executes at 4 cycles/limb.
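For readers unfamiliar with the routine, here is a minimal sketch in portable C of what addmul_1 computes: it multiplies an n-limb operand by a single limb, adds the product into the destination, and returns the carry-out limb. This is only a reference for the semantics, not the assembly from the patches. The name ref_addmul_1 is hypothetical, and the sketch assumes 64-bit limbs and a compiler that supports unsigned __int128 (GCC or Clang).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not the GMP source or the patch):
   computes rp[0..n-1] += up[0..n-1] * v and returns the carry-out limb.
   Assumes 64-bit limbs and the unsigned __int128 compiler extension. */
uint64_t ref_addmul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product, plus the old destination limb, plus
           the incoming carry. The sum always fits in 128 bits because
           (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (uint64_t)t;          /* low limb goes back to memory */
        carry = (uint64_t)(t >> 64);  /* high limb carries to the next step */
    }
    return carry;
}
```

Each iteration needs one widening multiply and two carry-propagating adds, which is why the cost is quoted in cycles per limb and why unrolling the loop mainly trims loop overhead.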
This, together with some earlier changes to add_n and sub_n, gives a
GMPbench score of 8260 on my 2.66GHz Mac Pro, so it appears to
make quite a difference for the Core 2 architecture.
If you're interested, the code is available on my homepage:
http://www.math.jmu.edu/~martin
For those who asked: I've included an install routine that detects
the CPU and will only install the patches if a Core 2 CPU is found.
Hopefully this will allow you to add the patches into whatever
automatic build scripts you are using.
--jason
-----------------------------------------------------------
Jason Worth Martin
Asst. Prof. of Mathematics
James Madison University
http://www.math.jmu.edu/~martin
"Ever my heart rises as we draw near the mountains.
There is good rock here." -- Gimli, son of Gloin