GMPBench for Sun Fire T2000

Torbjorn Granlund tege at swox.com
Sat Apr 29 00:45:27 CEST 2006


Greg Childers <jgchilders at uk-alumni.org> writes:

  At 03:19 AM 4/28/2006, Torbjorn Granlund wrote:
  >But these results look just like the results you had without
  >any assembly code.  I am more interested in results with assembly
  >code.
  
  But the original cross-compile I used with the stock gmp-4.2 source 
  did use the sparc64 assembly code as listed in the message
  http://gmplib.org/list-archives/gmp-devel/2006-April/000625.html
  So those numbers are _using_ the assembly code.
  
Oh, sorry.

Now I get it: the sparc64 assembly code uses floating-point arithmetic
for the very basic multiply operations, mpn_mul_1, mpn_addmul_1, and
mpn_submul_1.  This is done as a workaround for sparcv9's unique lack
of proper integer multiply support.

But floating-point isn't exactly the T1's strongest side.
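
For reference, here is a minimal C sketch of what mpn_addmul_1 computes:
it multiplies an n-limb operand by a single limb, adds the product into
the destination, and returns the carry-out limb.  The name ref_addmul_1
is hypothetical, 64-bit limbs are assumed, and the code relies on the
GCC/Clang extension unsigned __int128 for the double-limb product.  It
is not GMP's implementation, only an illustration of the operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not GMP code): rp[0..n-1] += up[0..n-1] * v.
   Returns the limb carried out of the most significant position.
   Assumes 64-bit limbs and a compiler providing unsigned __int128. */
static uint64_t ref_addmul_1(uint64_t *rp, const uint64_t *up, size_t n,
                             uint64_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* up[i]*v + rp[i] + carry is at most 2^128 - 1, so it cannot
           overflow the 128-bit temporary. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (uint64_t)t;           /* low limb goes back to the result */
        carry = (uint64_t)(t >> 64);   /* high limb carries to the next step */
    }
    return carry;
}
```

Each loop iteration needs one full limb-by-limb product, which is why
the speed of that single multiply dominates these routines.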

  GMPbench result: 334.35
  
Even without that assembly, GMPbench scores are pretty awful, and I
don't think we're going to be able to beat that by much with some T1
friendly assembly.

Clearly we should disable the current assembly for ultrasparct1-*-*.
  
  With 32 virtual processors, perhaps this processor does have
  potential as a computing engine with the proper assembly!

On the GMPbench results page, we say:

UltraSPARC 3's terrible scores are a result of its uniquely poor
integer multiply support (unsuitable architectural support +
simplistic integer multiply implementation).

The same is very much true for ultrasparct1.

If somebody feels like proving me wrong, write a fast 64x64->128 bit
multiply routine.  On Athlon64 there is an instruction for that, and
for GMP's needs it runs in 3 cycles.  All non-sparc machines have such
instructions, with performance varying from 2 cycles to about 10
cycles.

I doubt it will be possible to synthesise that operation in much less
than 40 cycles on the T1.
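
To make the problem concrete, here is a sketch in C of the usual
schoolbook way to synthesise the operation when only a multiply
returning the low 64 bits of the product is available: split each
operand into 32-bit halves and combine four 32x32->64 partial products.
The name mul_64x64_128 is hypothetical, and this is not tuned for the T1
or taken from GMP; it only shows how many multiplies, shifts, and adds
the synthesis involves.

```c
#include <stdint.h>

/* Hypothetical helper (not GMP code): full 64x64->128-bit unsigned
   multiply built from four 32x32->64 partial products.
   The product is returned as *hi * 2^64 + *lo. */
static void mul_64x64_128(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    uint64_t ul = u & 0xffffffffu, uh = u >> 32;
    uint64_t vl = v & 0xffffffffu, vh = v >> 32;

    uint64_t ll = ul * vl;   /* weight 2^0  */
    uint64_t lh = ul * vh;   /* weight 2^32 */
    uint64_t hl = uh * vl;   /* weight 2^32 */
    uint64_t hh = uh * vh;   /* weight 2^64 */

    /* Sum everything of weight 2^32.  This cannot overflow 64 bits:
       hl <= (2^32-1)^2 and the other two terms are below 2^32 each. */
    uint64_t mid = hl + (ll >> 32) + (lh & 0xffffffffu);

    *lo = (mid << 32) | (ll & 0xffffffffu);
    *hi = hh + (lh >> 32) + (mid >> 32);
}
```

That is four multiplies plus roughly a dozen dependent shifts, masks,
and adds for a single limb product.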

-- 
Torbjörn

