GMPBench for Sun Fire T2000

Fri Apr 28 21:34:58 CEST 2006

At 03:19 AM 4/28/2006, Torbjorn Granlund wrote:
>But these results look just like the results you had without
>any assembly code.  I am more interested in results with assembly
>code.

But the original cross-compile I did with the stock gmp-4.2 source
did use the sparc64 assembly code, as listed in this message:
http://gmplib.org/list-archives/gmp-devel/2006-April/000625.html
So those numbers are _using_ the assembly code.

I did a 64-bit compile using the generic code just to see what the 
result would be, and ended up with a surprise!  Here are the results:

generic gcc:

***** GMPbench version 0.1 *****
Using default CFLAGS = "-O3 -fomit-frame-pointer -m64 -mptr64 -mcpu=ultrasparc -I. -L."
Using default CC = "gcc"
Using default LIBS = "-lgmp"
Using compilation command: gcc -O3 -fomit-frame-pointer -m64 -mptr64 -mcpu=ultrasparc -I. -L. foo.c -o foo -lgmp
You may want to override CC, CFLAGS, and LIBS
Using gmp version: 4.2
Compiling benchmarks
Running benchmarks
   Category base
     Program multiply
       multiply 128 128
       GMPbench.base.multiply.128,128 result: 1838137
       multiply 512 512
       GMPbench.base.multiply.512,512 result: 182501
       multiply 8192 8192
       GMPbench.base.multiply.8192,8192 result: 1590
       multiply 131072 131072
       GMPbench.base.multiply.131072,131072 result: 24.5
       multiply 2097152 2097152
       GMPbench.base.multiply.2097152,2097152 result: 1.02
     GMPbench.base.multiply result: 1678.7
     Program divide
       divide 8192 32
       GMPbench.base.divide.8192,32 result: 50735
       divide 8192 64
       GMPbench.base.divide.8192,64 result: 52555
       divide 8192 128
       GMPbench.base.divide.8192,128 result: 30307
       divide 8192 4096
       GMPbench.base.divide.8192,4096 result: 2974
       divide 8192 8064
       GMPbench.base.divide.8192,8064 result: 44948
       divide 131072 8192
       GMPbench.base.divide.131072,8192 result: 61.6
       divide 131072 65536
       GMPbench.base.divide.131072,65536 result: 30.2
       divide 8388608 4194304
       GMPbench.base.divide.8388608,4194304 result: 0.0947
     GMPbench.base.divide result: 1083.8
   GMPbench.base result: 1348.8
   Category app
     Program rsa
       rsa 512
       GMPbench.app.rsa.512 result: 549
       rsa 1024
       GMPbench.app.rsa.1024 result: 85.7
       rsa 2048
       GMPbench.app.rsa.2048 result: 12.1
     GMPbench.app.rsa result: 82.879
   GMPbench.app result: 82.879
GMPbench result: 334.35
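
A side note on reading the summary lines: the per-program, per-category,
and overall figures in this run are consistent with unweighted geometric
means of the lines beneath them. That is inferred from the numbers, not
taken from the GMPbench source, so treat it as an assumption. A minimal
Python sketch that checks it (the helper name geomean is only
illustrative; the inputs are copied from the run above):

```python
import math

def geomean(values):
    """Unweighted geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Individual results from the "generic gcc" run above.
multiply = [1838137, 182501, 1590, 24.5, 1.02]
divide = [50735, 52555, 30307, 2974, 44948, 61.6, 30.2, 0.0947]
rsa = [549, 85.7, 12.1]

# Assumed aggregation: programs -> category -> overall, all unweighted.
base = geomean([geomean(multiply), geomean(divide)])
app = geomean(rsa)
total = geomean([base, app])

print("multiply %.1f  divide %.1f" % (geomean(multiply), geomean(divide)))
print("base %.1f  app %.3f  total %.2f" % (base, app, total))
```

The printed values should agree with the reported summary lines to
within the rounding of the individual results.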

generic cc:
***** GMPbench version 0.1 *****
Using default CFLAGS = "-xO2 -fast -xarch=v9 -L."
Using default CC = "cc"
Using default LIBS = "-dn -lgmp -dy"
Using compilation command: cc -xO2 -fast -xarch=v9 -L. foo.c -o foo -dn -lgmp -dy
You may want to override CC, CFLAGS, and LIBS
Using gmp version: 4.2
Compiling benchmarks
Running benchmarks
   Category base
     Program multiply
       multiply 128 128
       GMPbench.base.multiply.128,128 result: 1766693
       multiply 512 512
       GMPbench.base.multiply.512,512 result: 169259
       multiply 8192 8192
       GMPbench.base.multiply.8192,8192 result: 1477
       multiply 131072 131072
       GMPbench.base.multiply.131072,131072 result: 22.3
       multiply 2097152 2097152
       GMPbench.base.multiply.2097152,2097152 result: 0.926
     GMPbench.base.multiply result: 1556
     Program divide
       divide 8192 32
       GMPbench.base.divide.8192,32 result: 49503
       divide 8192 64
       GMPbench.base.divide.8192,64 result: 49867
       divide 8192 128
       GMPbench.base.divide.8192,128 result: 29040
       divide 8192 4096
       GMPbench.base.divide.8192,4096 result: 2775
       divide 8192 8064
       GMPbench.base.divide.8192,8064 result: 41022
       divide 131072 8192
       GMPbench.base.divide.131072,8192 result: 57.2
       divide 131072 65536
       GMPbench.base.divide.131072,65536 result: 27.9
       divide 8388608 4194304
       GMPbench.base.divide.8388608,4194304 result: 0.0866
     GMPbench.base.divide result: 1015.1
   GMPbench.base result: 1256.8
   Category app
     Program rsa
       rsa 512
       GMPbench.app.rsa.512 result: 511
       rsa 1024
       GMPbench.app.rsa.1024 result: 78.6
       rsa 2048
       GMPbench.app.rsa.2048 result: 11.1
     GMPbench.app.rsa result: 76.393
   GMPbench.app result: 76.393
GMPbench result: 309.86
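
To compare the two generic builds directly, the summary lines can be
divided one by the other. A small Python sketch using only the summary
figures from the two runs above (the dictionary names are illustrative):

```python
# Summary scores copied from the two runs above (higher is better).
gcc_scores = {"multiply": 1678.7, "divide": 1083.8, "rsa": 82.879, "total": 334.35}
cc_scores = {"multiply": 1556.0, "divide": 1015.1, "rsa": 76.393, "total": 309.86}

# Ratio > 1 means the gcc-built library scored higher on that line.
ratios = {name: gcc_scores[name] / cc_scores[name] for name in gcc_scores}

for name, ratio in ratios.items():
    print("%-8s gcc/cc = %.3f" % (name, ratio))
```

On these numbers the gcc build comes out somewhat ahead of the cc build
on every summary line, by well under 10%.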

The generic code is over 5x faster than the assembly!  I guess the T1 
is very different from the earlier UltraSparcs.  With 32 virtual 
processors, perhaps this processor does have potential as a computing 
engine with the proper assembly!  That's assuming all-integer
operations, though, since the T1 has only one FPU shared by all 32
virtual processors.

Greg