xeon 64 bits

Torbjorn Granlund tege at swox.com
Wed Sep 29 17:27:04 CEST 2004


Emmanuel Thomé <Emmanuel.Thome+gmp at loria.fr> writes:

  Has anyone had the occasion to run gmpbench on a xeon 64 bits machine ?
  I know there's presumably no assembly, but I would be very, very happy
  with a result figure if there exists one. Even using generic mpn code. 

Swox will probably get a system in about a month.

We don't expect great performance, though.  The "netburst"
microarchitecture that Pentium4 and modern Xeon processors share
doesn't like the ADC and SBB instructions.  They both have a latency
of 8 cycles.

For 32-bit incarnations of the architecture, we circumvent this
ADC/SBB problem by using 64-bit SSE2 arithmetic.

Unfortunately, this is not going to work for Xeon 64, since SSE2
doesn't have any suitable 128-bit arithmetic.  We're probably better
off using plain integer operations, in particular since the
64x64->128 MULQ instruction is highly desirable.  To avoid the 8
cycle ADC/SBB carry propagation, we will probably want the same type
of "majority" logic as we use for sparc64 addition:

  s = u + v + cy;
  cy = or(and(u,v),and(or(u,v),not(s))) >> 63

This implies a four-instruction recurrence, and perhaps addmul_1
times of around 6 cycles/limb.
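
For illustration only, the majority logic above can be written out as a
carry-flag-free limb addition loop in plain C.  This is a minimal sketch,
not GMP's actual code: the name add_n_majority and the use of uint64_t
limbs are assumptions made for the example, and a real implementation
would be hand-scheduled assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GMP function): add the n-limb numbers at up
   and vp, store the n-limb sum at rp, and return the carry out (0 or 1).
   The carry is recomputed with logic operations instead of ADC. */
static uint64_t
add_n_majority (uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                size_t n)
{
  uint64_t cy = 0;

  for (size_t i = 0; i < n; i++)
    {
      uint64_t u = up[i];
      uint64_t v = vp[i];
      uint64_t s = u + v + cy;      /* wraps modulo 2^64 */

      /* Carry out of bit 63: both top bits set, or at least one top bit
         set while the top bit of the sum is clear. */
      cy = ((u & v) | ((u | v) & ~s)) >> 63;
      rp[i] = s;
    }
  return cy;
}
```

The only loop-carried dependency runs through cy: the add into s, the
not, the and, and the or, with the shift folded in wherever the
scheduling allows.  The terms that depend only on u and v can be computed
off the critical path.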

The AMD K8 code doesn't need such complications, as AMD chose to
implement ADC/SBB well.
  
  P.S: By the way, here's a result for an opteron 250 @ 2.4GHz w/ gmp 4.1.4
  
  compiler : "gcc-3.3.3 -O3 -fomit-frame-pointer" 
  
  GMPbench.base.multiply result: 26163
  GMPbench.base.divide result: 16504
  GMPbench.app.rsa result: 1406.2
  GMPbench result: 5405.6

Very good numbers.  Did you override the default CFLAGS for the GMP
build?  Could I have the full output from your benchmark run, please?
I'd like to publish these numbers on the GMP web site.

-- 
Torbjörn


More information about the gmp-discuss mailing list