xeon 64 bits
Torbjorn Granlund
tege at swox.com
Wed Sep 29 17:27:04 CEST 2004
Emmanuel Thomé <Emmanuel.Thome+gmp at loria.fr> writes:
> Has anyone had the occasion to run gmpbench on a xeon 64 bits machine ?
> I know there's presumably no assembly, but I would be very, very happy
> with a result figure if there exists one. Even using generic mpn code.
Swox will probably get a system in about a month.
We don't expect great performance, though. The "netburst"
microarchitecture that Pentium4 and modern Xeon processors share
doesn't like the ADC and SBB instructions. They both have a latency
of 8 cycles.
For 32-bit incarnations of the architecture, we circumvent this
ADC/SBB problem by using 64-bit SSE2 arithmetic.
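To illustrate the idea, here is a scalar C sketch, not the actual SSE2
assembly. It assumes the trick amounts to forming each 32-bit limb sum
in a 64-bit accumulator, so that the carry is just the bits above bit
31 and no carry flag is needed. The function name add_n_32_wide is
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's code: add two n-limb numbers made of
   32-bit limbs and return the carry out (0 or 1).  Each limb sum is
   formed in a 64-bit accumulator, so no ADC-style flag is involved. */
static uint32_t add_n_32_wide(uint32_t *rp, const uint32_t *up,
                              const uint32_t *vp, size_t n)
{
    uint64_t acc = 0;                    /* carry between limbs, 0 or 1 */

    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)up[i] + vp[i];  /* at most 2^33 - 1, no overflow */
        rp[i] = (uint32_t)acc;           /* low 32 bits: result limb */
        acc >>= 32;                      /* high part: next carry-in */
    }
    assert(acc <= 1);
    return (uint32_t)acc;
}
```

The only loop-carried dependency here is a 64-bit add and a shift,
which is the property that makes this attractive when ADC is slow.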
Unfortunately, this is not going to work for Xeon 64, since SSE2
doesn't have any suitable 128-bit arithmetic. We're probably better
off using plain integer operations, in particular since the
64x64->128 MULQ instruction is highly desirable. To avoid the 8
cycle ADC/SBB carry propagation, we will probably want the same type
of "majority" logic as we use for sparc64 addition:
  s = u + v + cy;
  cy = or(and(u,v),and(or(u,v),not(s))) >> 63;
This implies a four-instruction recurrence, and perhaps addmul_1
times of around 6 cycles/limb.
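Spelled out in C, the majority logic above might look like the sketch
below, written as an add_n-style loop over 64-bit limbs. It is only a
sketch: the function name add_n_majority and its signature are made
up for this example, and real code would be hand-scheduled assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's code: rp[] = up[] + vp[] + cy over n
   64-bit limbs, returning the carry out (0 or 1).  The carry out of
   bit 63 is recovered with plain logic operations instead of ADC. */
static uint64_t add_n_majority(uint64_t *rp, const uint64_t *up,
                               const uint64_t *vp, size_t n, uint64_t cy)
{
    assert(cy <= 1);

    for (size_t i = 0; i < n; i++) {
        uint64_t u = up[i];
        uint64_t v = vp[i];
        uint64_t s = u + v + cy;         /* wraps modulo 2^64 */

        /* Carry out if both top bits are set, or if exactly one is set
           and the top bit of the sum came out as 0. */
        cy = ((u & v) | ((u | v) & ~s)) >> 63;
        rp[i] = s;
    }
    return cy;
}
```

Note that u & v and u | v do not depend on the incoming carry, so
they can be computed ahead of time; only the add, the not, the and/or
with s, and the shift sit on the carry chain.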
The AMD K8 code doesn't need such complications, as AMD chose to
implement ADC/SBB well.
> P.S: By the way, here's a result for an opteron 250 @ 2.4GHz w/ gmp 4.1.4
> compiler : "gcc-3.3.3 -O3 -fomit-frame-pointer"
> GMPbench.base.multiply result: 26163
> GMPbench.base.divide result: 16504
> GMPbench.app.rsa result: 1406.2
> GMPbench result: 5405.6
Very good numbers. Did you override the default CFLAGS for the GMP
build? Could I have the full output from your benchmark run, please?
I'd like to publish these numbers on the GMP web site.
--
Torbjörn