thresholds
Kevin Ryde
user42 at zip.com.au
Fri Apr 30 02:21:03 CEST 2004
David Newman <david.newman at jesus.ox.ac.uk> writes:
>
> - gcc 3.4 with "-O2 -march=athlon-xp -fomit-frame-pointer"
> - gcc 3.4 with "-O3 -march=athlon-xp -fomit-frame-pointer"
I'm not actually aware of -O3 doing much for us that -O2 doesn't. We
try to inline stuff judiciously already, so in particular
-finline-functions shouldn't do much.
> -fprefetch-loop-arrays
Tuning mostly works on data that fits within L1, so it should be unaffected by caching.
> -ffast-math
Very little floating point in gmp.
> -fforce-addr
This is CSE of fetches, is it? We should use explicit variables to get
this effect always, if we don't already.
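For illustration, a minimal sketch of the "explicit variables" approach. The struct and function names are hypothetical, not GMP code. Because `n` has the same type as the array elements, a store through `data[]` may alias it, so the first version must re-read `s->n` each time round; the second fetches once into locals and doesn't depend on what the compiler's CSE manages.

```c
#include <stddef.h>

/* Hypothetical struct for illustration only. n is a long, the same
   type as the elements, so a store through data[] may alias n. */
struct vec { long *data; long n; };

/* Re-reads s->n on every iteration: the store to data[i] has the same
   type as n, so unless the compiler can prove otherwise it must assume
   the store might change s->n. */
void scale_refetch(struct vec *s, long k)
{
  for (long i = 0; i < s->n; i++)
    s->data[i] *= k;
}

/* Fetches both fields once into locals, so the loop bound and base
   pointer can stay in registers whatever CSE the compiler does.
   Assumes data[] doesn't overlap *s; if it did, the two versions
   would behave differently. */
void scale_locals(struct vec *s, long k)
{
  long *p = s->data;
  long n = s->n;
  for (long i = 0; i < n; i++)
    p[i] *= k;
}
```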
> -maccumulate-outgoing-args
This should be good for k7, despite the code bloat. But I'd expect it
would only be noticeable on functions that were very short anyway.
> The tests seem to show that having a more recent version of gcc and/or
> more aggressive CFLAGS doesn't necessarily mean you get better
> thresholds.
Many are basically comparisons of different bits of asm code, so
should be unaffected by the cc.
> How much work would it take to change the compile process to do a "make
> tune" at compile time, if this is feasible?
The problem is you need a mostly idle machine, and then ought to at
least look to see if the values are sensible. Some of the tuning is a
bit over-sensitive too, or doesn't choose terribly consistently when
there's a range of sizes with very little difference between two algs.
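A sketch of why a near-tie between two algorithms makes the pick inconsistent, assuming a simple picking rule. `pick_threshold` and its margin `tol` are hypothetical; this is not how GMP's tune program actually decides.

```c
#include <stddef.h>

/* Hypothetical helper, not GMP's tune code.
   sizes[i] is an operand size in limbs; time_a[i] and time_b[i] are
   measured times for two algorithms at that size.
   Returns the first size at which algorithm B beats A by more than
   the relative margin tol (e.g. 0.02 for 2%), or 0 if it never does. */
long pick_threshold(const long *sizes, const double *time_a,
                    const double *time_b, size_t count, double tol)
{
  for (size_t i = 0; i < count; i++)
    if (time_b[i] < time_a[i] * (1.0 - tol))
      return sizes[i];
  return 0;
}
```

With `tol` at 0, a 1% timing wobble at a single size can decide the threshold; a margin should make the pick less sensitive to noise, at the cost of biasing it toward larger sizes.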
> MUL_TOOM3_THRESHOLD 202 / 174 / 177 / 173 / 177
> SQR_TOOM3_THRESHOLD 226 / 185 / 186 / 182 / 183
That looks a bit different. Your values are within error tolerance of
each other, but I'm not sure why they differ from what we last
measured.
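For readers unfamiliar with these parameters, a sketch of the general idea: a threshold picks an algorithm by operand size in limbs. The names and the Karatsuba value are hypothetical; 177 is just one of the measured MUL_TOOM3_THRESHOLD values quoted above, and any FFT range is ignored.

```c
#include <stddef.h>

/* Hypothetical values for illustration only. */
#define EXAMPLE_KARATSUBA_THRESHOLD 30
#define EXAMPLE_TOOM3_THRESHOLD     177

enum mul_alg { ALG_BASECASE, ALG_KARATSUBA, ALG_TOOM3 };

/* Choose a multiplication algorithm for two n-limb operands. */
enum mul_alg choose_mul(size_t n)
{
  if (n < EXAMPLE_KARATSUBA_THRESHOLD)
    return ALG_BASECASE;
  if (n < EXAMPLE_TOOM3_THRESHOLD)
    return ALG_KARATSUBA;
  return ALG_TOOM3;
}
```

Moving the Toom3 threshold between 174 and 202 only changes the choice for sizes in that band, where the two algorithms are presumably close in speed anyway.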
> DIV_DC_THRESHOLD 92 / 84 / 88 / 84 / 85
> POWM_THRESHOLD 142 / 128 / 142 / 134 / 128
Very possibly within error limits.
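One way to put numbers on "within error limits" is the relative spread (max - min) / min of each quoted row. `relative_spread` is a hypothetical helper, not part of GMP.

```c
#include <stddef.h>

/* Relative spread (max - min) / min of n > 0 positive measurements. */
double relative_spread(const long *v, size_t n)
{
  long lo = v[0], hi = v[0];
  for (size_t i = 1; i < n; i++) {
    if (v[i] < lo) lo = v[i];
    if (v[i] > hi) hi = v[i];
  }
  return (double)(hi - lo) / (double)lo;
}
```

For the MUL_TOOM3_THRESHOLD row, the last four values span 173 to 177, about 2.3%, while 202 is about 17% above the smallest. The DIV_DC_THRESHOLD and POWM_THRESHOLD rows span about 9.5% and 11% across all five values.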
> GCDEXT_THRESHOLD 46 / 26 / 30 / 28 / 14
This one is mostly C code, so may well be affected by the smartness of
the compiler. A lower threshold might indicate better code for the
"div2" routine. (If that's so then a spot of assembler could take it
out of the hands of the compiler, to get a benefit always.)