GMPbench results

We have run the benchmark on the highest-frequency machine of each type to
which we have convenient access. Scaling to lower or higher frequencies should
work well, since GMP mainly works out of the caches.
The GMPbench suite is available for download. To run the benchmarks, you also
need to compile gexpr.c and put it somewhere in your path.
GMP exercises a particular set of processor capabilities, widening integer
multiplication being the most important one. Processors with poor integer
multiply support get worse scores on GMPbench than on other benchmarks.
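To make "widening multiplication" concrete: it is a multiply whose result is twice as wide as its operands, for example 64 x 64 bits giving 128 bits. The sketch below is written in Python purely for illustration (GMP itself is C and assembly). It shows one way to synthesize a 64 x 64 -> 128-bit product from four 32 x 32 -> 64-bit products, the kind of extra work needed when a full-width widening multiply is not available. The name `widening_mul_64` is hypothetical and is not a GMP function.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def widening_mul_64(a, b):
    """Return (high, low): the two 64-bit words of the 128-bit product a*b.

    Hypothetical helper for illustration only; not part of GMP.
    """
    assert 0 <= a <= MASK64 and 0 <= b <= MASK64
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four 32x32->64 partial products.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Sum the middle column, including the carry out of the low product.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)

    low = (ll & MASK32) | ((mid & MASK32) << 32)
    high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return high, low


if __name__ == "__main__":
    a = b = MASK64
    high, low = widening_mul_64(a, b)
    # Cross-check against Python's arbitrary-precision integers.
    assert (high << 64) | low == a * b
    print(hex(high), hex(low))
```

A processor with a fast native widening multiply does this in a single instruction, which is why multiply support has such a large effect on the scores.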
GMPbench 0.2 isn't a perfect benchmark suite for GMP, but it is much
better than GMPbench 0.1.
GMPbench 0.1 measures multiplication with same-size operands, division, and
RSA encryption. That's it. GMP typically performs many other operations that
are not measured at all. Furthermore, the RSA figures weigh too heavily in the
score computation.
GMPbench 0.2 adds squaring, multiplication of different-size operands, more
variants of division, gcd and gcdext, all in the "base" hierarchy, and pi
computation in the "app" hierarchy.
The added benchmarks of GMPbench 0.2 aren't totally fair to older GMP
versions; we added measurements of things we thought were important, and then
used these to drive generic improvements of GMP 4.3. We sort of
tuned-for-the-benchmark, but since we chose the benchmarks to make sense, it
isn't as silly as such an exercise would usually be.
GMP 4.3.x GMPbench 0.2 results
| CPU | MHz | Bits | Compiler/Compilation flags | base: multiply | base: divide | base: gcd | base: gcdext | rsa | pi | GMPbench | Score/GHz |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 K10 6MB L3 | 3200 | 64 | "gcc 4.3.3" -O2 -m64 -mtune=k8 | 39826 | 24748 | 5953 | 3729 | 4956 | 37 | 2669 | 834 |
| Core 2 E6400 (65nm) | 2133 | 64 | "gcc 4.2.1" -O2 -m64 | 17981 | 10656 | 3312 | 2035 | 2330 | 18 | 1276 | 598 |
| Itanium 2 | 1300 | 64 | "gcc 4.2.1" -O2 -m64 | 15578 | 7983 | 2260 | 1342 | 1294 | 14.6 | 909 | 699 |
| PowerPC 970 (G5) | 2700 | 64 | "gcc 4.0.1-5367" -O3 -m64 -mcpu=970 | 12364 | 9349 | 2412 | 1489 | 1577 | 14 | 946 | 350 |
| Pentium 4 | 3200 | 64 | "gcc 4.2.1" -O2 -m64 | 10975 | 6968 | 1938 | 1315 | 1468 | 11.3 | 799 | 250 |
| Pentium 4 Northwood | 2600 | 32 | "gcc 4.2.1" -O2 -fomit-frame-pointer -march=pentium4 | 5162 | 2874 | 1163 | 707 | 654 | 6.5 | 394 | 152 |
| UltraSPARC 3 | 1593 | 64 | "gcc 3.4.4" -O2 -m64 | 3733 | 2488 | 956 | 533 | 370 | 5.1 | 286 | 180 |
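The Score/GHz figures in the table above are consistent with dividing the GMPbench score by the clock frequency, assuming the second column is the clock in MHz. The following sketch recomputes them from the listed values. The helper name `score_per_ghz` is made up for this example and is not part of the GMPbench scripts.

```python
def score_per_ghz(gmpbench_score, clock_mhz):
    """Normalise a GMPbench score by a clock frequency given in MHz."""
    return gmpbench_score / (clock_mhz / 1000.0)


# (CPU, clock in MHz, GMPbench score, Score/GHz as listed) from the table above.
ROWS = [
    ("Opteron/Athlon64 K10 6MB L3", 3200, 2669, 834),
    ("Core 2 E6400 (65nm)", 2133, 1276, 598),
    ("Itanium 2", 1300, 909, 699),
    ("PowerPC 970 (G5)", 2700, 946, 350),
    ("Pentium 4", 3200, 799, 250),
    ("Pentium 4 Northwood", 2600, 394, 152),
    ("UltraSPARC 3", 1593, 286, 180),
]

if __name__ == "__main__":
    for cpu, mhz, score, listed in ROWS:
        computed = round(score_per_ghz(score, mhz))
        print(f"{cpu:30s} computed {computed:4d}  listed {listed:4d}")
```

Normalising this way changes the ranking: the Itanium 2 at 1300 MHz comes out at 699 per GHz, ahead of the Core 2 at 598, even though its absolute score is lower.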
GMP 4.3.x GMPbench 0.1 results
| CPU | MHz | Bits | Compiler/Compilation flags | multiply | divide | rsa | GMPbench | Score/GHz | Optimal |
|---|---|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 K10 | 2300 | 64 | "gcc 3.4.3" -O2 -m64 -mtune=k8 | 81633 | 42278 | 3606 | 14554 | 6328 | 26000 @ 3.2GHz |
| Opteron/Athlon64 K8/K9 | 2200 | 64 | "gcc 3.4.6" -O2 -m64 -mtune=k8 | 69279 | 40081 | 3232 | 13050 | 5932 | 24000 @ 3.2GHz |
| Core 2 E6400 (65nm) | 2133 | 64 | "gcc 4.2.1" -O2 -m64 -mtune=k8 | 51519 | 24316 | 2314 | 9050 | 4249 | 16000 @ 3.33GHz |
| Pentium 4 | 3200 | 64 | "gcc 3.4.4" -O3 -m64 -mtune=k8 | 31259 | 16412 | 1427 | 5685 | 1777 | 7000 @ 3.8GHz |
| PowerPC 970 (G5) | 1600 | 64 | "gcc 4.0.1 build 5367" -mcpu=970 -O3 | 22119 | 12198 | 916 | 3880 | 2425 | |
| Alpha 21264 | | 64 | | | | | | | |
| Athlon XP | | 32 | | | | | | | |
| Pentium 4 Prescott | | 32 | | | | | | | |
| Pentium 4 Northwood | 2600 | 32 | "gcc 3.4.6" -O2 -fomit-frame-pointer -march=pentium4 | 16133 | 6726 | 680 | 2661 | 1023 | |
| Pentium 3 / Pentium M | | 32 | | | | | | | |
| Atom | 1600 | 64 | "gcc 4.2.1" -O3 -m64 -mtune=k8 | 12471 | 6940 | 457 | 2063 | 1289 | |
| UltraSPARC 3 | 1593 | 64 | | 11066 | 5942 | 370 | 1732 | | |
| PowerPC 7447 (G4) | | 32 | | | | | | | |
| Alpha 21164A | | 64 | | | | | | | |
Notes:
- These results are preliminary and based on a snapshot of what
will become GMP 4.3. Final results should be somewhat better for
certain processors.
- The clock frequencies for the above measures are not the same as for GMP
4.2, since we didn't have access to the same hardware. However, we have
remeasured some of the 4.2 numbers and updated the table below.
- The last column, "Optimal", is an estimate of what could be attained by
writing optimized assembly code for this processor.
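Because the measured clock often differs from the clock quoted in the "Optimal" estimate, comparing the two requires rescaling. The sketch below assumes scores scale linearly with frequency, which the introduction suggests should hold reasonably well, and reads an entry such as "26000 @ 3.2GHz" as an estimated GMPbench score at that clock. Both are interpretations rather than something this page states explicitly, and `scale_score` is a hypothetical helper.

```python
def scale_score(score, measured_mhz, target_mhz):
    """Linearly rescale a score from the measured clock to a target clock.

    Assumes the score is proportional to clock frequency.
    """
    return score * target_mhz / measured_mhz


if __name__ == "__main__":
    # Opteron/Athlon64 K10, GMP 4.3.x, GMPbench 0.1:
    # measured 14554 at 2300 MHz; "Optimal" is listed as 26000 @ 3.2GHz.
    projected = scale_score(14554, 2300, 3200)
    print(f"projected score at 3.2 GHz: {projected:.0f}")
    print(f"fraction of the Optimal estimate: {projected / 26000:.2f}")
```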
GMP 4.2.x GMPbench 0.1 results
| CPU | MHz | Bits | Compiler/Compilation flags | multiply | divide | rsa | GMPbench | Score/GHz | Optimal |
|---|---|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 K10 | 2300 | 64 | "gcc 3.4.3" -O2 -m64 -mtune=k8 | 43473 | 23880 | 2178 | 8377 | 3642 | |
| Opteron/Athlon64 K8/K9 | 2200 | 64 | "gcc 3.4.6" -O2 -m64 -mtune=k8 | 38362 | 21621 | 1979 | 7549 | 3431 | 20000 @ 3.2GHz |
| Core 2 E6400 (65nm) | 2133 | 64 | "gcc 4.2.1" -O2 -m64 -mtune=k8 | 36902 | 20330 | 2092 | 7570 | 2523 | 12000 @ 3.33GHz |
| PowerPC 970 (G5) | 2700 | 64 | "gcc 4.0.1 build 5367" -mcpu=970 -fast | 27740 | 16500 | 1409 | 5490 | 2033 | 7500 @ 2.7GHz |
| Pentium 4 | 3200 | 64 | "gcc 3.4.4" -O2 -m64 -mtune=k8 | 19425 | 10525 | 929 | 3645 | 1139 | 5000 @ 3.8GHz |
| Alpha 21264 | 1000 | 64 | "gcc 4.1.2" -O3 -mcpu=ev67 | 18703 | 11272 | 913 | 3641 | 3641 | 6000 @ 1.25GHz |
| Itanium 2 | 1600 | 64 | "gcc 4.1.1" -O3 -mtune=itanium2 | 19744 | 10340 | 799 | 3379 | 2112 | 13000 @ 1.6GHz |
| Athlon XP | 2083 | 32 | "gcc 4.0.2" -O2 -fomit-frame-pointer | 15682 | 7902 | 624 | 2636 | 1265 | |
| Pentium 4 Prescott | 3000 | 32 | "gcc 4.0.2" -O2 -fomit-frame-pointer -march=pentium4 | 15123 | 6189 | 675 | 2556 | | 4000 @ 3.8GHz |
| Pentium 4 Northwood | 2600 | 32 | "gcc 3.4.6" -O2 -fomit-frame-pointer -march=pentium4 | 14111 | 5468 | 569 | 2236 | | 3500 @ 3.4GHz |
| Pentium 3 / Pentium M | 1862 | 32 | "gcc 3.4.4" -O2 -fomit-frame-pointer | 11381 | 5286 | 429 | 1824 | | |
| UltraSPARC 3 | 1593 | 64 | "gcc 3.4.4" -O2 -mcpu=ultrasparc | 10597 | 5349 | 368 | 1665 | | |
| HPPA 8800 | 800 | 64 | "cc B.11.X.32509-32512.GP" +DD64 +O2 | 9466 | 3631 | 385 | 1503 | | |
| Atom | 1600 | 64 | "gcc 4.2.1" -O2 -m64 -mtune=k8 | 6737 | 4465 | 320 | 1325 | 828 | |
| PowerPC 7447 (G4) | 1420 | 32 | "gcc 4.1.0" -O2 -mpowerpc -mcpu=7450 | 6080 | 3479 | 247 | 1066 | | |
| Alpha 21164A | 600 | 64 | "gcc 4.1.2" -O3 -mcpu=ev56 | 3964 | 2122 | 179 | 721 | | |
GMP 4.1.x results
| CPU | MHz | Bits | Compiler/Compilation flags | multiply | divide | rsa | GMPbench |
|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 | 2400 | 64 | "gcc 3.4.2" -O2 -mcpu=nocona -funroll-loops (NB! no asm code) | 27321 | 18280 | 1441 | 5675 |
| PowerPC 970 (G5) | 2500 | 64 | "gcc 3.4" -O3 | 20324 | 12874 | 1110 | 4238 |
| Opteron/Athlon64 | 2400 | 32 | "gcc 3.3.3" -O2 -fomit-frame-pointer (NB! 32-bit only) | 19127 | 9823 | 802 | 3316 |
| Alpha 21264 | 1000 | 64 | "gcc 2.9-gnupro-99r1" -O2 | 16813 | 10706 | 782 | 3240 |
| Pentium 4 | 3200 | 64 | "gcc 4.0.2" -O2 -m64 -mtune=k8 (NB! No asm code) | 15613 | 9186 | 814 | 3122 |
| Itanium 2 | 1600 | 64 | "gcc 3.4.3" -O2 (NB! Low-quality asm code) | 17046 | 9027 | 749 | 3047 |
| Athlon XP | 2083 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer | 14076 | 7731 | 616 | 2535 |
| Pentium 4 Northwood | 2800 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer -march=pentium4 | 13013 | 5770 | 586 | 2253 |
| Pentium 4 Prescott | 3000 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer -march=pentium4 | 13348 | 5393 | 574 | 2206 |
| POWER 4 | 1100 | 64 | "gcc 3.2.1" -O2 -maix64 -mpowerpc64 -mtune=power3 | 8951 | 5920 | 478 | 1863 |
| Pentium 3 / Pentium M | 1862 | 32 | "gcc 3.4.4" -O2 -fomit-frame-pointer | 8125 | 4712 | 393 | 1560 |
| HPPA 8800 | 800 | 64 | "cc B.11.11.30766" +DD64 +O2 | 9040 | 3724 | 362 | 1450 |
| UltraSPARC 3 | 1336 | 64 | "gcc 3.4.4" -O2 -m64 -mptr64 -mcpu=v9 | 6111 | 3645 | 265 | 1119 |
| MIPS R14000 | 500 | 64 | cc 7.3.0 | 5284 | 2819 | 241 | 964 |
| PowerPC 74x7 (G4) | 1000 | 32 | "gcc 3.3.3" -O2 -mpowerpc | 3453 | 2203 | 165 | 676 |
| POWER 3 | 475 | 64 | "gcc 2.9-aix51-020209" -maix64 -mpowerpc64 -O2 | 3647 | 2259 | 157 | 671 |
| Alpha 21164A | 600 | 64 | "gcc 3.2.1" -O2 | 3514 | 2185 | 158 | 663 |
| VIA C3 Nehemiah | 1000 | 32 | "gcc 3.4.2" -O2 -fomit-frame-pointer -march=c3-2 | 2378 | 1314 | 111 | 442 |
| UltraSPARC 2i | 400 | 64 | "gcc 3.2.2" -O2 -mcpu=ultrasparc | 1971 | 900 | 89 | 343 |
|
Notes:
- There was no assembly loop support for Opteron/Athlon64 in GMP 4.1.4. We
therefore include two results above for Opteron: 32-bit results using the
Athlon32 assembly loops, and 64-bit results using plain C with inline assembly.
- The performance of the Pentium 4 EM64T processors is disappointing. Many
instructions that GMP depends on perform poorly: 64-bit multiply, integer
right shift, conditional moves, and set-on-condition instructions all need
around 10 cycles and are not fully pipelined. See also this report:
http://swox.com/doc/x86-timing.pdf
- The 32-bit 90nm Pentium 4 processors (Prescott) run GMP applications
slower than older Pentium 4 processors. The reason is that Prescott has longer
latencies for SSE2 instructions and memory loads.
- UltraSPARC 3's poor scores are a result of its inadequate integer
multiply support (shortcomings in both ISA and implementation).
Please send comments about this page to gmp-discuss@gmplib.org

Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free
Software Foundation

Verbatim copying and distribution of this entire article is permitted in any
medium, provided this notice is preserved.