Rethinking GMP's configure
tg at gmplib.org
Tue Dec 6 10:18:38 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
Makes sense to me. Either --with-cpu=ALL or --with-cpu=HOST would be a
My thinking is to make --with-cpu=ALL become the default.
> First, let's admit there will be some overhead. At some level, there
> will be a need to jump through a table, initiated for the run-time CPU.
> This indirection costs a few cycles each time. But note that in a
> shared library, we typically access data through a GOT (global offset
> table) and call functions indirectly through a PLT (procedure linkage
> table). I think we could stay within the overhead of shared library
> calls, if we do things right. (For ELF, that is, where we can control
> such things.)
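The table indirection described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not GMP code: limb_t, cpuvec, my_add_n, add_n_tuned and cpu_has_tuned_add are hypothetical names, and the CPU probe is a stub that always returns 0. The table entry starts out pointing at an initializer, which selects an implementation on the first call and patches the entry, so later calls cost one indirect jump.

```c
#include <stddef.h>

typedef unsigned long limb_t;
typedef limb_t (*add_n_fn)(limb_t *, const limb_t *, const limb_t *, size_t);

/* Portable fallback: rp[] = ap[] + bp[], returns the carry out (0 or 1). */
static limb_t add_n_generic(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    limb_t a = ap[i];
    limb_t s = a + bp[i];
    limb_t c1 = s < a;       /* carry from a + b */
    limb_t r = s + cy;
    limb_t c2 = r < s;       /* carry from adding the incoming carry */
    rp[i] = r;
    cy = c1 | c2;
  }
  return cy;
}

/* Stand-in for a CPU-specific (e.g. assembly) variant; hypothetical. */
static limb_t add_n_tuned(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  return add_n_generic(rp, ap, bp, n);
}

/* Hypothetical probe; real code would query the CPU (e.g. via CPUID). */
static int cpu_has_tuned_add(void) { return 0; }

static limb_t add_n_init(limb_t *, const limb_t *, const limb_t *, size_t);

/* The jump table: one slot per primitive, initially the initializer. */
static struct { add_n_fn add_n; } cpuvec = { add_n_init };

/* First call lands here: pick an implementation, patch the table, forward. */
static limb_t add_n_init(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  cpuvec.add_n = cpu_has_tuned_add() ? add_n_tuned : add_n_generic;
  return cpuvec.add_n(rp, ap, bp, n);
}

/* Public entry point: a single indirect call through the table. */
limb_t my_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  return cpuvec.add_n(rp, ap, bp, n);
}
```

The same wrapper works unchanged in a static and in a shared library, since the table is ordinary data owned by the library.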
I think I suggested, some time ago, that one could overwrite the table
entries in the PLT, with no *additional* indirections. But that may not
work. If I understand things correctly, there's no *the* PLT, there's a
PLT in each loadable object (executable or shared library file). And
then I have already forgotten how that is supposed to work with function
Ian Lance Taylor has written some excellent texts about all this. (They
should be next to your search engine.)
This is a messy area, even if one ignores the world outside of ELF.
(The PLT is, as you say, part of the process, and not to be manipulated
by the shared library. Perhaps it is *possible* to manipulate it using
some dll calls on *some* systems, but I don't think this would be the
But for gmp-internal calls, we can jump through the table in the same
way for both static and dynamic linking; the PLT need not be involved.
That makes sense.
But there are some requirements from the C standard that get in the
way of sense here. It is for example required that foo == bar holds if
foo and bar are some combination of pointers or symbol names that end up
at the same function when called.
> To decrease overhead, perhaps mainly in the static library, code
> selection can be made not at the calls to the most primitive functions,
> but a bit "higher"; we need to compile such functions several times with
> fixed primitive functions (i.e., with calls to these going directly, not
> through a jump table).
Which functions are candidates for this treatment?
* The toom functions, at least several of them.
* mullo_basecase, mulmid_basecase, mul_basecase, sqr_basecase (the
latter two come as assembly for x86, but I have bigger plans than x86).
* redc_1.c, redc_2.c.
* Similar-level functions
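Selecting code one level above the primitives might look like the sketch below. It is a toy under stated assumptions, not GMP source: limbs are 32 bits, and addmul_1_generic, addmul_1_alt, DEFINE_MUL_BASECASE and select_mul_basecase are hypothetical names. A macro stamps out the same basecase multiply once per primitive, so each copy calls its primitive directly, and only the choice between the copies goes through a function pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

/* Primitive, variant 1: rp[] += up[] * v, returns the carry limb. */
static inline limb_t addmul_1_generic(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;  /* cannot overflow 64 bits */
    rp[i] = (limb_t)t;
    cy = (limb_t)(t >> 32);
  }
  return cy;
}

/* Primitive, variant 2: same result, two limbs per iteration.
   Stands in for a CPU-specific version. */
static inline limb_t addmul_1_alt(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  limb_t cy = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2) {
    uint64_t t0 = (uint64_t)up[i] * v + rp[i] + cy;
    uint64_t t1 = (uint64_t)up[i + 1] * v + rp[i + 1] + (limb_t)(t0 >> 32);
    rp[i] = (limb_t)t0;
    rp[i + 1] = (limb_t)t1;
    cy = (limb_t)(t1 >> 32);
  }
  if (i < n) {
    uint64_t t = (uint64_t)up[i] * v + rp[i] + cy;
    rp[i] = (limb_t)t;
    cy = (limb_t)(t >> 32);
  }
  return cy;
}

/* Stamp out one copy of the higher-level function per primitive.
   rp must have room for un + vn limbs and must not overlap the inputs. */
#define DEFINE_MUL_BASECASE(name, ADDMUL_1)                          \
  static void name(limb_t *rp, const limb_t *up, size_t un,          \
                   const limb_t *vp, size_t vn)                      \
  {                                                                  \
    for (size_t i = 0; i < un; i++)                                  \
      rp[i] = 0;                                                     \
    for (size_t j = 0; j < vn; j++)                                  \
      rp[un + j] = ADDMUL_1(rp + j, up, un, vp[j]);                  \
  }

DEFINE_MUL_BASECASE(mul_basecase_generic, addmul_1_generic)
DEFINE_MUL_BASECASE(mul_basecase_alt, addmul_1_alt)

typedef void (*mul_basecase_fn)(limb_t *, const limb_t *, size_t,
                                const limb_t *, size_t);

/* The only indirect call is at this level; use_alt is a hypothetical
   flag that a CPU probe would supply. */
static mul_basecase_fn select_mul_basecase(int use_alt)
{
  return use_alt ? mul_basecase_alt : mul_basecase_generic;
}
```

In a real build the copies would more likely come from compiling one source file several times with different defines; the macro merely keeps the sketch in a single file.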
> 2. When using an optional assembly primitive, use a run-time test if the
> optional primitive is provided by *any* but not all configured CPUs,
> exclude it if it is never available, and include it without a
> run-time test if it is always available.
Sounds like the most invasive part of this change.
Perhaps. I think the many #ifdefs in the toom code will need to be
changed into some macro form. I suspect the toom code might actually
become more legible...
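One possible shape for such a macro form is sketched below; it is a guess at the idea, not a worked-out design. All names (AVAIL_*, HAVE_addlsh1_n, cpu_has_addlsh1_n, the 32-bit limb_t) are hypothetical, and the AVAIL_addlsh1_n setting is assumed to be generated by configure from the list of configured CPUs. The macro expands to a constant when the primitive is always or never available, so the compiler can drop the dead branch, and to a run-time flag otherwise.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

#define AVAIL_NEVER     0
#define AVAIL_SOMETIMES 1
#define AVAIL_ALWAYS    2

/* Assumed to come from configure; hard-coded here for the sketch. */
#define AVAIL_addlsh1_n AVAIL_SOMETIMES

/* Hypothetical run-time flag, set once by CPU detection at start-up. */
static int cpu_has_addlsh1_n = 0;

#if AVAIL_addlsh1_n == AVAIL_ALWAYS
#define HAVE_addlsh1_n() 1
#elif AVAIL_addlsh1_n == AVAIL_NEVER
#define HAVE_addlsh1_n() 0
#else
#define HAVE_addlsh1_n() (cpu_has_addlsh1_n)
#endif

/* Optional primitive: rp[] = ap[] + 2*bp[], returns the carry (0..2). */
static limb_t addlsh1_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t t = (uint64_t)ap[i] + 2 * (uint64_t)bp[i] + cy;
    rp[i] = (limb_t)t;
    cy = (limb_t)(t >> 32);
  }
  return cy;
}

/* Always-available primitives used by the fallback path. */
static limb_t lshift1(limb_t *rp, const limb_t *up, size_t n)
{
  limb_t out = 0;
  for (size_t i = 0; i < n; i++) {
    limb_t u = up[i];
    rp[i] = (u << 1) | out;
    out = u >> 31;
  }
  return out;
}

static limb_t add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t t = (uint64_t)ap[i] + bp[i] + cy;
    rp[i] = (limb_t)t;
    cy = (limb_t)(t >> 32);
  }
  return cy;
}

/* Caller in the style of the toom code: one "if" instead of an #ifdef.
   Assumes rp does not overlap ap. */
static limb_t add_twice(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  if (HAVE_addlsh1_n())
    return addlsh1_n(rp, ap, bp, n);
  limb_t cy = lshift1(rp, bp, n);
  return cy + add_n(rp, ap, rp, n);
}
```

With AVAIL_addlsh1_n set to AVAIL_ALWAYS or AVAIL_NEVER the condition is a compile-time constant, which is what removes the run-time test in those two cases.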