GMPBench for Sun Fire T2000
Torbjorn Granlund
tege at swox.com
Sat Apr 29 00:45:27 CEST 2006
Greg Childers <jgchilders at uk-alumni.org> writes:
  At 03:19 AM 4/28/2006, Torbjorn Granlund wrote:
  >But these results look just like the results you had without
  >any assembly code. I am more interested in results with assembly
  >code.
  But the original cross-compile I used with the stock gmp-4.2 source
  did use the sparc64 assembly code as listed in the message
  http://gmplib.org/list-archives/gmp-devel/2006-April/000625.html
  So those numbers are _using_ the assembly code.
Oh, sorry.
Now I get it. The sparc64 assembly code uses floating-point arithmetic
for the very basic multiply operations: mpn_mul_1, mpn_addmul_1, and
mpn_submul_1. This is done as a workaround for sparcv9's unique lack
of proper integer multiply support.
But floating-point isn't exactly the T1's strongest side.
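To illustrate the kind of trick involved, here is a minimal C sketch of
an exact integer multiply done through the floating-point unit. It is
not a transcription of the sparc64 assembly; the function name
mul32_via_fp and the 32x16-bit split are assumptions chosen for the
example. The idea is that a product of a 32-bit and a 16-bit value is
below 2^48 and therefore fits exactly in the 53-bit mantissa of a
double.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: 32x32->64 bit multiply using only
   floating-point multiplies.  v is split into two 16-bit pieces so
   that each partial product stays below 2^48 and is exact in a
   double. */
uint64_t mul32_via_fp(uint32_t u, uint32_t v)
{
    double du = (double)u;
    double p0 = du * (double)(v & 0xffffu);  /* u * low 16 bits of v  */
    double p1 = du * (double)(v >> 16);      /* u * high 16 bits of v */

    /* Both partial products are below 2^48, so no rounding occurred. */
    assert(p0 < 281474976710656.0 && p1 < 281474976710656.0);

    /* Convert back to integers and recombine with a shift and add. */
    return (uint64_t)p0 + ((uint64_t)p1 << 16);
}
```

Every limb product needs several integer-to-double conversions,
floating-point multiplies, and conversions back, so this approach
depends heavily on floating-point throughput.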
  GMPbench result: 334.35
  Even without that assembly, GMPbench scores are pretty awful, and I
  don't think we're going to be able to beat that by much with some T1
  friendly assembly.
Clearly we should disable the current assembly for ultrasparct1-*-*.
  With 32 virtual processors, perhaps this processor does have
  potential as a computing engine with the proper assembly!
On the GMPbench results page, we say:
  UltraSPARC 3's terrible scores are a result of its uniquely poor
  integer multiply support (unsuitable architectural support +
  simplistic integer multiply implementation).
The same is very much true for ultrasparct1.
If somebody feels like proving me wrong, write a fast 64x64->128 bit
multiply routine. On Athlon64 there is an instruction for that, and
for GMP's needs it runs in 3 cycles. All non-sparc machines have such
instructions, with performance varying from 2 cycles to about 10
cycles.
I doubt it will be possible to synthesise that operation in much less
than 40 cycles on the T1.
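For anyone who wants to take up the challenge, the operation in
question can be sketched in portable C as below. This is only a
reference version built from four 32x32->64 bit partial products, under
the assumption that nothing wider than a 64-bit integer multiply is
available; the name umul_64x64 is made up for the example, and the code
says nothing about how many cycles a tuned T1 version would take.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference routine: 64x64->128 bit unsigned multiply
   synthesised from four 32x32->64 bit partial products. */
void umul_64x64(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    uint64_t u0 = u & 0xffffffffu, u1 = u >> 32;
    uint64_t v0 = v & 0xffffffffu, v1 = v >> 32;

    uint64_t p00 = u0 * v0;   /* weight 2^0  */
    uint64_t p01 = u0 * v1;   /* weight 2^32 */
    uint64_t p10 = u1 * v0;   /* weight 2^32 */
    uint64_t p11 = u1 * v1;   /* weight 2^64 */

    /* Sum of the three 32-bit quantities that land in bits 32..63.
       Each term is below 2^32, so the sum cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    assert((mid >> 32) <= 2);  /* at most a two-unit carry into hi */

    *lo = (mid << 32) | (p00 & 0xffffffffu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

The four multiplies plus the carry handling between them are what make
this expensive on a machine without a widening multiply instruction.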
--
Torbjörn