C implementation of mod_1_1
Torbjorn Granlund
tg at gmplib.org
Wed Mar 2 14:27:50 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund <tg at gmplib.org> writes:
> Your mod_1 family improvements have moved MOD_1N_TO_MOD_1_1_THRESHOLD
> and MOD_1U_TO_MOD_1_1_THRESHOLD down by a few notches, making the
> average values 4 and 3, respectively.
Nice, but which changes do you think do this? I've hacked on the x86 and
x86_64 assembler, and the udiv_(q?)rnnd_preinv macros.
I suspect it was perhaps mainly your cps improvements for x86 and
x86_64.
The new algorithm in mod_1_1.c is, as far as I'm aware, not enabled on
anything (MOD_1_1P_METHOD always 1).
OK.
You're measuring MOD_1_1P_METHOD even when mod_1_1 is natively in
assembly. But it surely will be ignored then, even if inserted in some
gmp-mparam.h?
Perhaps we should suppress the measuring, or at least avoid putting
ignored parameters in the gmp-mparam.h files?
BTW, the mod_1_1 tuning needs some further updates. I think it should be
like this:
* Determine the best MOD_1_1P_METHOD (done by tuneup, but then never
used for anything). Currently uses measurements on 10-limb inputs.
* Choose which mod_1_1p should be used. Always use the native version if
it exists (but maybe measure it and display a warning if it's slower
than the method 1 or method 2 C implementations). If there's no
native implementation, select the best of method 1 and method 2.
Record the selection as a function pointer.
Or record it as MOD_1_1P_METHOD and have 'if (MOD_1_1P_METHOD == 1) blah
else blah' in the code for further tuning.
Function pointers are not ubiquitously branch predicted, and therefore
may cost a full pipeline delay. An if-else statement like the one above
will be 100% predicted.
This matters for measuring mod_1_1 to mod_1_2 (and if mod_1_2 is not to
be used, mod_1_1 to mod_1_4); mod_1_1 called via a function pointer
would look slower during measurements than in the real library.
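To make the two dispatch styles concrete, here is a rough sketch in C. It is
not the GMP code: limb_t, the toy_mod_* functions and the 32-bit limb size are
made up for illustration, and the two "methods" are plain stand-ins rather
than the real method 1 and method 2 algorithms. Only the name MOD_1_1P_METHOD
is taken from the discussion above, and it is assumed here to be a
compile-time constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the tuned parameter; assumed to be a compile-time constant. */
#ifndef MOD_1_1P_METHOD
#define MOD_1_1P_METHOD 1
#endif

typedef uint32_t limb_t;   /* toy 32-bit limb, not GMP's mp_limb_t */

/* Toy "method 1": Horner's rule, one 64-by-32 bit reduction per limb.
   ap[0] is the least significant limb; d must be nonzero. */
static limb_t toy_mod_method1(const limb_t *ap, size_t n, limb_t d)
{
    uint64_t r = 0;
    while (n-- > 0)
        r = ((r << 32) | ap[n]) % d;
    return (limb_t) r;
}

/* Toy "method 2": same residue, using a precomputed B mod d, B = 2^32.
   r and b1 are both below d < 2^32, so r * b1 + ap[n] fits in 64 bits. */
static limb_t toy_mod_method2(const limb_t *ap, size_t n, limb_t d)
{
    uint64_t b1 = (UINT64_C(1) << 32) % d;
    uint64_t r = 0;
    while (n-- > 0)
        r = (r * b1 + ap[n]) % d;
    return (limb_t) r;
}

/* Dispatch on a constant: the condition is known at compile time, so the
   compiler can drop the dead branch and emit a direct call. */
static limb_t toy_mod_static(const limb_t *ap, size_t n, limb_t d)
{
    if (MOD_1_1P_METHOD == 1)
        return toy_mod_method1(ap, n, d);
    else
        return toy_mod_method2(ap, n, d);
}

/* Dispatch through a pointer: every call is an indirect call whose target
   the CPU has to predict. */
typedef limb_t (*toy_mod_fn)(const limb_t *, size_t, limb_t);
static toy_mod_fn toy_mod_ptr = toy_mod_method1;

static limb_t toy_mod_dynamic(const limb_t *ap, size_t n, limb_t d)
{
    return toy_mod_ptr(ap, n, d);
}
```

Both wrappers return the same residue. The difference is only in how the call
is made, which is what would skew a threshold measurement if the tuning
program used the pointer variant while the library used the constant one.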
--
Torbjörn