GMPBench for Sun Fire T2000
Greg Childers
jgchilders at uk-alumni.org
Fri Apr 28 21:34:58 CEST 2006
At 03:19 AM 4/28/2006, Torbjorn Granlund wrote:
>But these results look just like the results you had without
>any assembly code. I am more interested in results with assembly
>code.
But the original cross-compile I used with the stock gmp-4.2 source
did use the sparc64 assembly code as listed in the message
http://gmplib.org/list-archives/gmp-devel/2006-April/000625.html
So those numbers are _using_ the assembly code.
I did a 64-bit compile using the generic code just to see what the
result would be, and ended up with a surprise! Here are the results:
generic gcc:
***** GMPbench version 0.1 *****
Using default CFLAGS = "-O3 -fomit-frame-pointer -m64 -mptr64
-mcpu=ultrasparc -I. -L."
Using default CC = "gcc"
Using default LIBS = "-lgmp"
Using compilation command: gcc -O3 -fomit-frame-pointer -m64 -mptr64
-mcpu=ultrasparc -I. -L. foo.c -o foo -lgmp
You may want to override CC, CFLAGS, and LIBS
Using gmp version: 4.2
Compiling benchmarks
Running benchmarks
Category base
Program multiply
multiply 128 128
GMPbench.base.multiply.128,128 result: 1838137
multiply 512 512
GMPbench.base.multiply.512,512 result: 182501
multiply 8192 8192
GMPbench.base.multiply.8192,8192 result: 1590
multiply 131072 131072
GMPbench.base.multiply.131072,131072 result: 24.5
multiply 2097152 2097152
GMPbench.base.multiply.2097152,2097152 result: 1.02
GMPbench.base.multiply result: 1678.7
Program divide
divide 8192 32
GMPbench.base.divide.8192,32 result: 50735
divide 8192 64
GMPbench.base.divide.8192,64 result: 52555
divide 8192 128
GMPbench.base.divide.8192,128 result: 30307
divide 8192 4096
GMPbench.base.divide.8192,4096 result: 2974
divide 8192 8064
GMPbench.base.divide.8192,8064 result: 44948
divide 131072 8192
GMPbench.base.divide.131072,8192 result: 61.6
divide 131072 65536
GMPbench.base.divide.131072,65536 result: 30.2
divide 8388608 4194304
GMPbench.base.divide.8388608,4194304 result: 0.0947
GMPbench.base.divide result: 1083.8
GMPbench.base result: 1348.8
Category app
Program rsa
rsa 512
GMPbench.app.rsa.512 result: 549
rsa 1024
GMPbench.app.rsa.1024 result: 85.7
rsa 2048
GMPbench.app.rsa.2048 result: 12.1
GMPbench.app.rsa result: 82.879
GMPbench.app result: 82.879
GMPbench result: 334.35
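As a sanity check on the listing above: the per-program, per-category and overall scores look like plain geometric means of the lines beneath them. That is inferred from the numbers in this run, not taken from the GMPbench scripts, so treat the aggregation rule in this sketch as an assumption. The variable names are made up for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean computed through logs to avoid a huge intermediate product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Per-size results copied from the "generic gcc" run above.
multiply = [1838137, 182501, 1590, 24.5, 1.02]
divide = [50735, 52555, 30307, 2974, 44948, 61.6, 30.2, 0.0947]
rsa = [549, 85.7, 12.1]

# Assumed aggregation: program score = geometric mean of its sizes,
# category score = geometric mean of its programs,
# overall score = geometric mean of the categories.
multiply_score = geometric_mean(multiply)
divide_score = geometric_mean(divide)
base_score = geometric_mean([multiply_score, divide_score])
app_score = geometric_mean(rsa)
total_score = geometric_mean([base_score, app_score])

reported = {
    "base.multiply": (multiply_score, 1678.7),
    "base.divide": (divide_score, 1083.8),
    "base": (base_score, 1348.8),
    "app": (app_score, 82.879),
    "total": (total_score, 334.35),
}
for name, (computed, printed) in reported.items():
    print(f"{name:14s} computed {computed:10.2f}   reported {printed}")
```

The inputs are the rounded figures printed by the benchmark, so the recomputed values can only be expected to agree with the reported ones to within rounding.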
generic cc:
***** GMPbench version 0.1 *****
Using default CFLAGS = "-xO2 -fast -xarch=v9 -L."
Using default CC = "cc"
Using default LIBS = "-dn -lgmp -dy"
Using compilation command: cc -xO2 -fast -xarch=v9 -L. foo.c -o foo
-dn -lgmp -dy
You may want to override CC, CFLAGS, and LIBS
Using gmp version: 4.2
Compiling benchmarks
Running benchmarks
Category base
Program multiply
multiply 128 128
GMPbench.base.multiply.128,128 result: 1766693
multiply 512 512
GMPbench.base.multiply.512,512 result: 169259
multiply 8192 8192
GMPbench.base.multiply.8192,8192 result: 1477
multiply 131072 131072
GMPbench.base.multiply.131072,131072 result: 22.3
multiply 2097152 2097152
GMPbench.base.multiply.2097152,2097152 result: 0.926
GMPbench.base.multiply result: 1556
Program divide
divide 8192 32
GMPbench.base.divide.8192,32 result: 49503
divide 8192 64
GMPbench.base.divide.8192,64 result: 49867
divide 8192 128
GMPbench.base.divide.8192,128 result: 29040
divide 8192 4096
GMPbench.base.divide.8192,4096 result: 2775
divide 8192 8064
GMPbench.base.divide.8192,8064 result: 41022
divide 131072 8192
GMPbench.base.divide.131072,8192 result: 57.2
divide 131072 65536
GMPbench.base.divide.131072,65536 result: 27.9
divide 8388608 4194304
GMPbench.base.divide.8388608,4194304 result: 0.0866
GMPbench.base.divide result: 1015.1
GMPbench.base result: 1256.8
Category app
Program rsa
rsa 512
GMPbench.app.rsa.512 result: 511
rsa 1024
GMPbench.app.rsa.1024 result: 78.6
rsa 2048
GMPbench.app.rsa.2048 result: 11.1
GMPbench.app.rsa result: 76.393
GMPbench.app result: 76.393
GMPbench result: 309.86
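To put the two generic runs side by side, the following sketch divides the gcc aggregates by the cc aggregates. The figures are copied from the two listings above. The dictionary keys and the helper name `speedup` are made up for illustration. Higher GMPbench scores are better, so a ratio above 1 means the gcc run was faster.

```python
# Aggregate scores copied from the "generic gcc" and "generic cc" runs above.
gcc = {
    "base.multiply": 1678.7,
    "base.divide": 1083.8,
    "base": 1348.8,
    "app.rsa": 82.879,
    "total": 334.35,
}
cc = {
    "base.multiply": 1556,
    "base.divide": 1015.1,
    "base": 1256.8,
    "app.rsa": 76.393,
    "total": 309.86,
}

def speedup(a, b):
    """Ratio a/b for every score present in both runs."""
    return {name: a[name] / b[name] for name in a if name in b}

ratios = speedup(gcc, cc)
for name, ratio in ratios.items():
    print(f"{name:14s} gcc/cc = {ratio:.3f}")
```

On these numbers the gcc run is ahead on every aggregate by somewhere between 6 and 9 percent, which is small next to the gap between generic and assembly code described below.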
The generic code is over 5x faster than the assembly! I guess the T1
is very different from the earlier UltraSparcs. With 32 virtual
processors, perhaps this processor does have potential as a computing
engine with the proper assembly! That's assuming all integer
operations, though, since the T1 has only one FPU shared by all 32 virtual processors.
Greg