proposal for enhancement to configure, re UltraSPARC-T1

Mon Dec 11 11:15:12 CET 2006

Peter Farkas <Peter.Farkas at Sun.COM> writes:

> Since usage of the floating point registers is not recommended on
> UltraSPARC-T1 based systems,

Why is it not recommended? Are floating-point operations in general
very slow on this processor?

> Using the generic integer functions in GMP yields about five times
> better performance than using the .asm functions on UltraSPARC-T1
> based systems (i.e. an improvement of about 400%).

When you say "generic integer functions", does that boil down to using
the sparc v9 instruction mulx? (Which takes two 64 bit integers and
produces the least significant 64 bits of their product).

The problem with mulx is that when you need *all* bits of the product
(which you almost always do in GMP), you have to restrict the
arguments to 32 bits. To get a full 128 bit product, you have to use
mulx four times, shifting around the different halves of the inputs
(or three mulx, theoretically, but with even more overhead).
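
To make the four-multiplication scheme concrete, here is a rough
sketch in C. It is not taken from GMP; the function name
mul_64x64_sketch is made up for illustration. Each input is split
into 32-bit halves, so every partial product fits in 64 bits, and on
sparc v9 each of the four multiplications would presumably compile to
one mulx.

```c
#include <stdint.h>

/* Hypothetical helper, not GMP code: full 64x64 -> 128 bit product
   built from four multiplications of 32-bit halves. */
void mul_64x64_sketch(uint64_t *hi, uint64_t *lo, uint64_t u, uint64_t v)
{
    uint64_t ul = u & 0xffffffffu, uh = u >> 32;
    uint64_t vl = v & 0xffffffffu, vh = v >> 32;

    /* Four partial products; operands are below 2^32, so none overflows. */
    uint64_t ll = ul * vl;
    uint64_t lh = ul * vh;
    uint64_t hl = uh * vl;
    uint64_t hh = uh * vh;

    /* Middle column: three terms below 2^32 each, so the sum fits easily. */
    uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);

    *lo = (mid << 32) | (ll & 0xffffffffu);
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

The shifts, masks and carry handling around the four multiplications
are the overhead referred to above.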

As far as I understand the GMP sparc issues, that's the reason for the
somewhat complicated code that uses floating point; it was the only
way to get a serious performance gain over the older 32-bit code. This
is in contrast to other architectures, where you typically get a
fourfold speedup from using the more powerful 64-bit integer
multiplication.

Have you tried any benchmarks comparing 32-bit and 64-bit bignum
performance on the T1, and on other ultrasparcs?

Does the T1, or other recent ultrasparcs, support any other
instructions that can do a full multiplication (two 64 bit inputs, one
128 bit output) with reasonable performance?

Regards,
/Niels