C implementation of mod_1_1
tg at gmplib.org
Wed Mar 2 14:27:50 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund <tg at gmplib.org> writes:
> Your mod_1 family improvements have moved MOD_1N_TO_MOD_1_1_THRESHOLD
> and MOD_1U_TO_MOD_1_1_THRESHOLD down by a few notches, making the
> average values 4 and 3, respectively.
Nice, but which changes do you think do this? I've hacked on the x86 and
x86_64 assembler, and the udiv_(q?)rnnd_preinv macros.
I suspect it was perhaps mainly your cps improvements for x86 and
The new algorithm in mod_1_1.c is, as far as I'm aware, not enabled on
anything (MOD_1_1P_METHOD always 1).
You're measuring MOD_1_1P_METHOD even when mod_1_1 is natively in
assembly. But it surely will be ignored then, even if inserted in some
Perhaps we should suppress the measuring, or at least avoid putting
ignored parameters in the gmp-mparam.h files?
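A minimal sketch of the second option, emitting nothing for MOD_1_1P_METHOD when the C selection would be ignored. The helper name is hypothetical and this is not GMP's tuneup.c. The have_native flag stands in for whatever tuneup uses to detect an assembly mod_1_1p; I'm assuming this would correspond to HAVE_NATIVE_mpn_mod_1_1p being defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tuneup helper (not GMP's actual tuneup.c code): format the
   MOD_1_1P_METHOD line for gmp-mparam.h, or nothing at all when a native
   (assembly) mod_1_1p exists, since the C method choice is then never
   consulted. Returns the number of characters written to buf. */
static size_t
format_mod_1_1p_method (char *buf, size_t size, int have_native,
                        int best_method)
{
  assert (size > 0);
  buf[0] = '\0';
  if (have_native)
    return 0;                   /* ignored parameter: do not emit it */
  assert (best_method == 1 || best_method == 2);
  int n = snprintf (buf, size, "#define MOD_1_1P_METHOD %d\n", best_method);
  return n < 0 ? 0 : strlen (buf);
}
```

The same have_native test could guard the measurement itself, so no time is spent tuning a parameter that cannot take effect.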
BTW, the mod_1_1 tuning needs some further updates. I think it should be
* Determine the best MOD_1_1P_METHOD (done by tuneup, but then never
used for anything). Currently uses measurements on 10-limb inputs.
* Choose which mod_1_1p should be used. Always use the native version if
it exists (but maybe measure it and display a warning if it's slower
than the method 1 or method 2 C implementations). If there's no
native implementation, select the best of method 1 and method 2.
Record the selection as a function pointer.
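The selection rule in the second item could look roughly like this. It is only a sketch: it assumes tuneup has already timed each candidate on the same input size, uses a negative time to mean "no native version", and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Possible outcomes of the mod_1_1p selection (hypothetical names). */
enum mod_1_1p_choice { USE_NATIVE = 0, USE_METHOD_1 = 1, USE_METHOD_2 = 2 };

/* Prefer a native mod_1_1p whenever one exists, but warn if either C
   method measured faster. Without a native version, pick the faster of
   method 1 and method 2 (ties go to method 1). */
static enum mod_1_1p_choice
select_mod_1_1p (double native_time, double method1_time,
                 double method2_time)
{
  assert (method1_time > 0 && method2_time > 0);
  double best_c = method1_time < method2_time ? method1_time : method2_time;

  if (native_time >= 0)
    {
      if (native_time > best_c)
        fprintf (stderr, "warning: native mod_1_1p slower than C code "
                 "(%.2f vs %.2f)\n", native_time, best_c);
      return USE_NATIVE;
    }
  return method1_time <= method2_time ? USE_METHOD_1 : USE_METHOD_2;
}
```

The returned value could then be stored either as a function pointer, as the list suggests, or written out as a MOD_1_1P_METHOD constant, which is the alternative discussed next.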
Or record it as MOD_1_1P_METHOD and have 'if (MOD_1_1P_METHOD == 1) blah
else blah' in the code for further tuning.
Function pointers are not ubiquitously branch predicted, and therefore
may cost a full pipeline delay. An if-else statement like the above will
This matters when measuring the mod_1_1 to mod_1_2 threshold (and, if
mod_1_2 is not to be used, the mod_1_1 to mod_1_4 threshold); mod_1_1
called via a function pointer would look slower during measurements
than in the real library.
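The difference between the two dispatch styles can be sketched in C. Everything here is hypothetical: limb_t is a 32-bit stand-in for mp_limb_t, the two remainder routines are simple placeholders rather than GMP's method 1 and method 2, and MOD_1_1P_METHOD is defined locally as if tuneup had written it to gmp-mparam.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb for the sketch */

/* Two stand-in routines computing {up, n} mod d, least significant limb
   first as in GMP. They are NOT GMP's method 1 and method 2; they only
   give the dispatch code two interchangeable functions to choose from. */
static limb_t
mod_method1 (const limb_t *up, size_t n, limb_t d)
{
  assert (d != 0);
  uint64_t r = 0;
  while (n-- > 0)
    r = ((r << 32) | up[n]) % d;      /* Horner step with B = 2^32 */
  return (limb_t) r;
}

static limb_t
mod_method2 (const limb_t *up, size_t n, limb_t d)
{
  assert (d != 0);
  uint64_t bmod = ((uint64_t) 1 << 32) % d;   /* precomputed B mod d */
  uint64_t r = 0;
  while (n-- > 0)
    r = (r * bmod + up[n]) % d;       /* r*B + limb, B replaced by B mod d */
  return (limb_t) r;
}

#ifndef MOD_1_1P_METHOD
#define MOD_1_1P_METHOD 2   /* as if written to gmp-mparam.h by tuneup */
#endif

/* Compile-time selection: the condition is a constant, so the compiler
   can drop the dead branch and the call becomes a direct one. */
static limb_t
mod_1_1p_static (const limb_t *up, size_t n, limb_t d)
{
  if (MOD_1_1P_METHOD == 1)
    return mod_method1 (up, n, d);
  else
    return mod_method2 (up, n, d);
}

/* Run-time selection through a pointer: every call is indirect, which on
   some CPUs is not predicted and can distort threshold measurements. */
static limb_t (*mod_1_1p_ptr) (const limb_t *, size_t, limb_t) = mod_method1;
```

Both paths compute the same remainder; the only difference is whether the choice is resolved when the library is compiled or at every call.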