gmp_pi1_t

Mon Dec 14 08:21:55 CET 2009

Hi, I indeed missed the pointer-part, anyway:

* Niels Möller <nisse at lysator.liu.se> [Dec 14. 2009 17:53]:
> Joerg Arndt <arndt at jjj.de> writes:
> 
> > The 'inline' keyword is your friend.
> > Inlined functions are optimized across their
> > boundaries which is often more important than
> > just avoiding call overhead.
> 
> Inline is used in some places in gmp. But I think most of the functions
> that take a struct gmp_pi1_t * argument are too large to be usefully
> inlined.

With inlining _nothing_ has to be passed, which can be an advantage.
Also, saving and restoring of registers may be omitted
(optimization across boundaries).
Possibly a few other things as well; people like
Richard Guenther would know (isn't he on this list?).

> 
> > The compiler should remove 'fall through' functions entirely.
> > IIRC there is some optimization switch regarding this issue.
> 
> I'm not sure what you mean. In this case, the difference is between
> 
>   struct gmp_pi1_t { mp_limb_t inv32; } inv;
>   foo (..., &inv);
> 
> and
> 
>   mp_limb_t inv;
>   foo (..., inv);
> 
> Last time I looked (which was several years ago), if you use the
> address-of operator & on a local variable anywhere in a function, gcc
> never ever allocates that variable in a register.

Keeping such a variable in a register might be forbidden by the
C standard, not sure.

> So this is a small
> performance penalty for the "abstracted" version of the code.
> 
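
To make the two variants concrete, here is a minimal sketch. The
typedef for mp_limb_t, the function bodies and the names foo_ptr,
foo_val, call_ptr and call_val are stand-ins invented for this
illustration, not GMP code; only the single-member struct mirrors
the quoted example.

```c
/* Stand-in for GMP's limb type (assumption; the real one is in gmp.h). */
typedef unsigned long mp_limb_t;

/* Single-member wrapper, as in the quoted example. */
struct gmp_pi1_t { mp_limb_t inv32; };

/* "Abstracted" variant: takes a pointer to the wrapper struct. */
mp_limb_t foo_ptr (mp_limb_t x, const struct gmp_pi1_t *inv)
{
  return x * inv->inv32;
}

/* Plain variant: the limb itself is passed by value. */
mp_limb_t foo_val (mp_limb_t x, mp_limb_t inv)
{
  return x * inv;
}

/* Caller of the abstracted variant: must apply & to a local. */
mp_limb_t call_ptr (mp_limb_t x, mp_limb_t d)
{
  struct gmp_pi1_t inv;
  inv.inv32 = d;
  return foo_ptr (x, &inv);
}

/* Caller of the plain variant: no address is ever taken. */
mp_limb_t call_val (mp_limb_t x, mp_limb_t d)
{
  mp_limb_t inv = d;
  return foo_val (x, inv);
}
```

Both callers compute the same result; any difference shows up only
in the generated code, e.g. whether inv is stored to the stack in
call_ptr. Something like "gcc -O2 -S" on such a file lets one check
what a given compiler version actually does.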

Whether inlining is good really depends on several things.
There is a rule of thumb that code up to a certain number
of machine instructions should be inlined (30 instr.?),
see the CPU optimization manuals.

However, if the function is used in very many places,
code bloat may occur.

Also, if the function is rarely called (few calls at run time,
as opposed to few call sites), the call overhead is nothing
to worry about.

If a function is used in just one place, you should
always inline it, regardless of size (well, only if
it is critical for performance).
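
A minimal sketch of the single-call-site case, under the assumption
of a C99 compiler. The names scale_limb and scale_n and the typedef
are hypothetical and not taken from GMP. Note that 'inline' is only
a hint; the compiler may or may not follow it.

```c
/* Stand-in for GMP's limb type (assumption; the real one is in gmp.h). */
typedef unsigned long mp_limb_t;

struct gmp_pi1_t { mp_limb_t inv32; };

/* Hypothetical helper with exactly one call site.  'static inline'
   invites the compiler to merge it into its caller. */
static inline mp_limb_t
scale_limb (mp_limb_t x, const struct gmp_pi1_t *inv)
{
  return x * inv->inv32;
}

/* The only caller: multiplies n limbs by the value stored in inv. */
void scale_n (mp_limb_t *rp, const mp_limb_t *ap, int n, mp_limb_t d)
{
  struct gmp_pi1_t inv = { d };
  for (int i = 0; i < n; i++)
    rp[i] = scale_limb (ap[i], &inv);
}
```

If the helper is indeed inlined, the compiler sees both the struct
and its use in one function body and is free to drop the pointer
altogether. Whether it does so is best checked in the assembly.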

> In principle, if foo is inlined, the compiler could maybe optimize away
> the address-of operation. gcc probably does? But does it redo register
> allocation at that point, so that the struct can be put in a register?
> 

gcc is free to do a lot of things (as long as it sticks to the standard).
I recommend inspecting the generated machine code for performance-critical
code sections.

> In gmp, we don't pass structs by value, and we don't return structs. I
> imagine there are portability reasons for that, possibly historic. If we
> would pass the struct above by value rather than by reference, then the
> compiler should be able to optimize away everything related to the
> struct container.

Yes.
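
For illustration, a by-value version might look like the sketch
below. The names foo_byval and call_byval and the typedef are made
up here and are not GMP interfaces. How a small struct is actually
passed depends on the target ABI, so take this as an assumption to
verify, not as a guarantee.

```c
/* Stand-in for GMP's limb type (assumption; the real one is in gmp.h). */
typedef unsigned long mp_limb_t;

struct gmp_pi1_t { mp_limb_t inv32; };

/* The struct is passed by value: the callee gets its own copy and
   the caller never has to take an address. */
mp_limb_t foo_byval (mp_limb_t x, struct gmp_pi1_t inv)
{
  return x * inv.inv32;
}

/* Hypothetical caller; inv is an ordinary local with no & applied. */
mp_limb_t call_byval (mp_limb_t x, mp_limb_t d)
{
  struct gmp_pi1_t inv = { d };
  return foo_byval (x, inv);
}
```

With a single-limb struct this should cost nothing extra on common
ABIs, but the picture changes as soon as more members are added.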

> But I don't think that's a good alternative in this
> case, since the point of gmp_pi1_t is to make it possible to add various
> other members without changing the interface,

This could be a valid argument for staying away from certain
optimizations (those that might obscure things and make
maintenance hard).

> and I don't think we want
> to pass large structs by value.
> 

Right.

> Regards,
> /Niels
> 
> -- 
> Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
> Internet email is subject to wholesale government surveillance.

cheers,  jj