Kevin Ryde user42 at
Fri Apr 30 02:21:03 CEST 2004

David Newman <david.newman at> writes:
> - gcc 3.4 with "-O2 -march=athlon-xp -fomit-frame-pointer"
> - gcc 3.4 with "-O3 -march=athlon-xp -fomit-frame-pointer"

I'm not actually aware of -O3 doing much for us that -O2 doesn't.  We
try to inline stuff judiciously already, so in particular
-finline-functions shouldn't do much.

> -fprefetch-loop-arrays

Tuning is usually within the L1 cache and should be unaffected by caching.

> -ffast-math

Very little floating point in gmp.

> -fforce-addr

This is CSE of fetches, is it?  We should use explicit variables to get
this effect always, if we don't already.
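
As a rough illustration of what "explicit variables" could mean here (a
sketch only, not code from gmp; limb_t and both function names are made
up for the example), compare a loop that fetches through a pointer on
every iteration with one that fetches once into a local:

```c
#include <stddef.h>

typedef unsigned long limb_t;   /* hypothetical stand-in for a limb type */

/* Fetches *vp inside the loop.  Since dst might alias vp, the compiler
   may have to reload *vp on every iteration.  */
void
add_scalar_reload (limb_t *dst, const limb_t *src, size_t n,
                   const limb_t *vp)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] = src[i] + *vp;
}

/* Fetches *vp once into an explicit variable, so the value can stay in
   a register whatever the optimizer can or cannot prove.  */
void
add_scalar_hoisted (limb_t *dst, const limb_t *src, size_t n,
                    const limb_t *vp)
{
  size_t i;
  limb_t v = *vp;
  for (i = 0; i < n; i++)
    dst[i] = src[i] + v;
}
```

The two only give the same answer when dst doesn't overlap *vp, which
is exactly the knowledge the explicit variable hands to the compiler.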

> -maccumulate-outgoing-args

This should be good for k7, despite the code bloat.  But I'd expect it
would only be noticeable on functions that were very short anyway.

> The tests seem to show that having a more recent version of gcc and/or
> more aggressive CFLAGS doesn't necessarily mean you get better
> thresholds.

Many are basically comparisons of different bits of asm code, so
should be unaffected by the cc.

> How
> much work would it take to change the compile process to do a "make
> tune" at compile time, if this is feasible?

The problem is you need a mostly idle machine, and then ought to at
least look to see if the values are sensible.  Some of the tuning is a
bit over-sensitive too, or doesn't choose terribly consistently when
there's a range of sizes with very little difference between two algs.
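
To show the sort of near-tie meant here, a minimal sketch (not how the
tune program actually works; near_tie_band, its arguments and the
tolerance are all assumptions for illustration) that reports the first
and last sizes at which two timing curves are within a relative
tolerance of each other:

```c
#include <stddef.h>

/* ta[i] and tb[i] are measured times for algorithms A and B at the
   i-th size.  Sets *lo and *hi to the first and last index where the
   two times differ by at most tol times the smaller one.  Returns 1 if
   any such index exists, 0 otherwise.  */
int
near_tie_band (const double *ta, const double *tb, size_t n,
               double tol, size_t *lo, size_t *hi)
{
  size_t i;
  int found = 0;
  for (i = 0; i < n; i++)
    {
      double diff = ta[i] > tb[i] ? ta[i] - tb[i] : tb[i] - ta[i];
      double base = ta[i] < tb[i] ? ta[i] : tb[i];
      if (diff <= tol * base)
        {
          if (! found)
            *lo = i;
          *hi = i;
          found = 1;
        }
    }
  return found;
}
```

A threshold chosen anywhere between *lo and *hi costs about the same,
so small timing noise can move the chosen value a long way across that
band.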

> MUL_TOOM3_THRESHOLD     202 / 174 / 177 / 173 / 177
> SQR_TOOM3_THRESHOLD     226 / 185 / 186 / 182 / 183

That looks a bit different.  Your values are within error tolerance of
each other, but I'm not sure what's happened to what we had last measured.

> DIV_DC_THRESHOLD        92 / 84 / 88 / 84 / 85
> POWM_THRESHOLD          142 / 128 / 142 / 134 / 128

Very possibly within error limits.
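
One rough way to put a number on the scatter in a row like those (a
sketch; relative_spread is a made-up helper, no particular error limit
is stated above, and treating all five quoted values as comparable
measurements is an assumption):

```c
#include <stddef.h>

/* Returns (max - min) / min over v[0..n-1].  Assumes n >= 1 and that
   every value is nonzero.  For the POWM_THRESHOLD row quoted above
   this is 14/128, roughly 0.109.  */
double
relative_spread (const unsigned long *v, size_t n)
{
  unsigned long lo = v[0], hi = v[0];
  size_t i;
  for (i = 1; i < n; i++)
    {
      if (v[i] < lo)
        lo = v[i];
      if (v[i] > hi)
        hi = v[i];
    }
  return (double) (hi - lo) / (double) lo;
}
```

Whether a given spread counts as noise depends on how repeatable the
tune runs are on the machine in question.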

> GCDEXT_THRESHOLD        46 / 26 / 30 / 28 / 14

This one is mostly C code, so may well be affected by the smartness of
the compiler.  A lower threshold might indicate better code for the
"div2" routine.  (If that's so then a spot of assembler could take it
out of the hands of the compiler, to get a benefit always.)
