GMPbench 0.1 results
We have run the benchmark on the highest-frequency machine of each type to
which we have access. Scaling the results to lower or higher frequencies
should work well, since GMP works mainly out of the caches.
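As a rough illustration of that scaling claim, the sketch below rescales a score linearly with clock frequency. The function name `scale_score` and the strictly linear model are assumptions made for illustration. They are not part of GMPbench, and the model ignores any effects from outside the caches.

```c
/* Hypothetical helper, not part of GMPbench: estimate the score that a
   CPU of the same type would reach at another clock frequency, assuming
   the score is proportional to frequency (cache-resident workload). */
double scale_score(double score, double measured_mhz, double target_mhz)
{
    return score * target_mhz / measured_mhz;
}
```

For example, `scale_score(8995, 2600, 3200)` rescales the Opteron/Athlon64 result from the GMP 4.2.x table from 2600 MHz to 3200 MHz and gives roughly 11070, a figure that can be set against the 17000 @ 3.2GHz entry in the same row.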
The benchmark suite is available for download. To run the benchmarks, you also
need to compile gexpr.c and put it somewhere in your path.
GMP is written mainly in C, but most processors have a few critical loops
written in assembly. The quality of the assembly code is generally very high,
but some processors have gotten more attention than others, resulting in better
performance.
GMP exercises a particular set of processor capabilities, widening integer
multiplication being the most important one. Processors with poor integer
multiply support get worse scores on GMPbench than on other benchmarks.
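To make "widening multiplication" concrete: a widening multiply returns the full double-width product of two machine words. The sketch below, under the hypothetical name `mul_64x64`, shows in portable C the work needed when only a 32x32 to 64-bit multiply is available: four partial products plus carry handling. It is an illustration of the operation, not GMP's actual code.

```c
#include <stdint.h>

/* Hypothetical illustration: compute the 128-bit product of a and b,
   returned as a high and a low 64-bit word, using only 32x32 -> 64-bit
   multiplications. */
void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes at bit 0  */
    uint64_t p1 = a_lo * b_hi;   /* contributes at bit 32 */
    uint64_t p2 = a_hi * b_lo;   /* contributes at bit 32 */
    uint64_t p3 = a_hi * b_hi;   /* contributes at bit 64 */

    /* Sum of the bit-32 column. Each term is below 2^32, so the sum
       fits in 64 bits without overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

    *lo = (mid << 32) | (p0 & 0xffffffffu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

A processor with a native widening multiply delivers both halves directly, whereas the decomposition above needs four multiplications and several additions and shifts for each pair of words.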
GMP 4.2.x results
| CPU | MHz | Bits | Compiler/Compilation flags | | | | GMPbench | Optimal |
|---|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 | 2600 | 64 | "gcc 4.1.2" -O2 -m64 -mtune=k8 | 45855 | 28136 | 2252 | 8995 | 17000 @ 3.2GHz |
| Core 2 | 3000 | 64 | "gcc 4.1.2" -O2 -m64 -mtune=k8 | 36902 | 20330 | 2092 | 7570 | 12000 @ 3.2GHz |
| PowerPC 970 (G5) | 2700 | 64 | "gcc 4.0.1 build 5250" -mcpu=970 -fast | 27740 | 16500 | 1409 | 5490 | 7500 @ 2.7GHz |
| Alpha 21264 | 1000 | 64 | "gcc 4.1.2" -O3 -mcpu=ev67 | 18703 | 11272 | 913 | 3641 | 6000 @ 1.25GHz |
| Pentium 4 | 3200 | 64 | "gcc 4.1.2" -O2 -m64 -mtune=k8 | 18778 | 10689 | 906 | 3583 | 5000 @ 3.8GHz |
| Itanium 2 | 1600 | 64 | "gcc 4.1.1" -O3 -mtune=itanium2 | 19744 | 10340 | 799 | 3379 | 13000 @ 1.6GHz |
| Athlon XP | 2083 | 32 | "gcc 4.0.2" -O2 -fomit-frame-pointer | 15682 | 7902 | 624 | 2636 | |
| Pentium 4 Prescott | 3000 | 32 | "gcc 4.0.2" -O2 -fomit-frame-pointer -march=pentium4 | 15123 | 6189 | 675 | 2556 | 4000 @ 3.8GHz |
| Pentium 4 Northwood | 2800 | 32 | "gcc 3.4.4" -O2 -fomit-frame-pointer -march=pentium4 | 15277 | 5899 | 618 | 2422 | 3500 @ 3.4GHz |
| Pentium 3 / Pentium M | 1862 | 32 | "gcc 3.4.4" -O2 -fomit-frame-pointer | 11381 | 5286 | 429 | 1824 | |
| UltraSPARC 3 | 1593 | 64 | "gcc 3.4.4" -O2 -mcpu=ultrasparc | 10597 | 5349 | 368 | 1665 | |
| HPPA 8800 | 800 | 64 | "cc B.11.X.32509-32512.GP" +DD64 +O2 | 9466 | 3631 | 385 | 1503 | |
| PowerPC 7447 (G4) | 1420 | 32 | "gcc 4.1.0" -O2 -mpowerpc -mcpu=7450 | 6080 | 3479 | 247 | 1066 | |
| Alpha 21164A | 600 | 64 | "gcc 4.1.2" -O3 -mcpu=ev56 | 3964 | 2122 | 179 | 721 | |
GMP 4.1.x results
| CPU | MHz | Bits | Compiler/Compilation flags | | | | GMPbench | Optimal |
|---|---|---|---|---|---|---|---|---|
| Opteron/Athlon64 | 2400 | 64 | "gcc 3.4.2" -O2 -mcpu=nocona -funroll-loops (NB! no asm code) | 27321 | 18280 | 1441 | 5675 | |
| PowerPC 970 (G5) | 2500 | 64 | "gcc 3.4" -O3 | 20324 | 12874 | 1110 | 4238 | |
| Opteron/Athlon64 | 2400 | 32 | "gcc 3.3.3" -O2 -fomit-frame-pointer (NB! 32-bit only) | 19127 | 9823 | 802 | 3316 | |
| Alpha 21264 | 1000 | 64 | "gcc 2.9-gnupro-99r1" -O2 | 16813 | 10706 | 782 | 3240 | |
| Pentium 4 | 3200 | 64 | "gcc 4.0.2" -O2 -m64 -mtune=k8 (NB! No asm code) | 15613 | 9186 | 814 | 3122 | |
| Itanium 2 | 1600 | 64 | "gcc 3.4.3" -O2 (NB! Low-quality asm code) | 17046 | 9027 | 749 | 3047 | |
| Athlon XP | 2083 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer | 14076 | 7731 | 616 | 2535 | |
| Pentium 4 Northwood | 2800 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer -march=pentium4 | 13013 | 5770 | 586 | 2253 | |
| Pentium 4 Prescott | 3000 | 32 | "gcc 3.3.2" -O2 -fomit-frame-pointer -march=pentium4 | 13348 | 5393 | 574 | 2206 | |
| POWER 4 | 1100 | 64 | "gcc 3.2.1" -O2 -maix64 -mpowerpc64 -mtune=power3 | 8951 | 5920 | 478 | 1863 | |
| Pentium 3 / Pentium M | 1862 | 32 | "gcc 3.4.4" -O2 -fomit-frame-pointer | 8125 | 4712 | 393 | 1560 | |
| HPPA 8800 | 800 | 64 | "cc B.11.11.30766" +DD64 +O2 | 9040 | 3724 | 362 | 1450 | |
| UltraSPARC 3 | 1336 | 64 | "gcc 3.4.4" -O2 -m64 -mptr64 -mcpu=v9 | 6111 | 3645 | 265 | 1119 | |
| MIPS R14000 | 500 | 64 | cc 7.3.0 | 5284 | 2819 | 241 | 964 | |
| PowerPC 74x7 (G4) | 1000 | 32 | "gcc 3.3.3" -O2 -mpowerpc | 3453 | 2203 | 165 | 676 | |
| POWER 3 | 475 | 64 | "gcc 2.9-aix51-020209" -maix64 -mpowerpc64 -O2 | 3647 | 2259 | 157 | 671 | |
| Alpha 21164A | 600 | 64 | "gcc 3.2.1" -O2 | 3514 | 2185 | 158 | 663 | |
| VIA C3 Nehemiah | 1000 | 32 | "gcc 3.4.2" -O2 -fomit-frame-pointer -march=c3-2 | 2378 | 1314 | 111 | 442 | |
| UltraSPARC 2i | 400 | 64 | "gcc 3.2.2" -O2 -mcpu=ultrasparc | 1971 | 900 | 89 | 343 | |
Notes:
- The last column, "Optimal", is an estimate of what could be attained by
writing optimized assembly code for this processor.
- There was no assembly loop support for Opteron/Athlon64 in GMP 4.1.4. We
therefore include two results above for Opteron: 32-bit results using the
Athlon32 assembly loops, and 64-bit results using plain C with inline assembly.
- The performance of the Pentium 4 EM64T processors is disappointing. Many
instructions that GMP depends on perform poorly: 64-bit multiply, integer right
shift, conditional moves, and set-on-condition instructions all need around 10
cycles and are not fully pipelined. See also this report:
http://swox.com/doc/x86-timing.pdf
- The 32-bit 90nm Pentium 4 processors (Prescott) run GMP applications
slower than older Pentium 4 processors. The reason is that Prescott has longer
latencies for SSE2 instructions and memory loads.
- UltraSPARC 3's disappointing scores are a result of its poor integer
multiply support (unsuitable architectural support as well as non-pipelined
integer multiply implementation).
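The Pentium 4 note above mentions set-on-condition and conditional-move instructions. One place where such instructions can arise is carry propagation in multi-word addition written in plain C, where the carry out of each word is recovered with a comparison. The sketch below uses the hypothetical name `add_n` and 64-bit words; it is not GMP's code. How a given compiler translates the comparisons (set-on-condition, add-with-carry, or something else) is an assumption that depends on the compiler and the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: r = a + b over n 64-bit words, least
   significant word first. Returns the final carry (0 or 1). */
uint64_t add_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];   /* may wrap around */
        uint64_t c1 = s < a[i];     /* 1 if the word addition wrapped */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;        /* 1 if adding the carry wrapped */
        r[i] = t;
        carry = c1 | c2;            /* at most one of c1, c2 is set */
    }
    return carry;
}
```

Each loop iteration depends on the carry from the previous one, so the latency of whatever instruction produces that carry sits on the critical path of the whole addition.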
Please send comments about this page to gmp-discuss@swox.com
Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
Foundation
Verbatim copying and distribution of this entire article is permitted in any
medium, provided this notice is preserved.