mpn_perfpow: do we need a special GCD for mp_bitcnt_t?
Torbjörn Granlund
tg at gmplib.org
Fri Sep 5 18:07:44 UTC 2014
nisse at lysator.liu.se (Niels Möller) writes:
For a start, let's look at what's needed by gcd_1. It would look
something like (untested):
mp_limb_t
mpn_gcd_1 (mp_srcptr up, mp_size_t size, mp_limb_t vlimb)
{
mp_limb_t ulimb;
ASSERT (size >= 1);
ASSERT (vlimb != 0);
ASSERT_MPN_NONZERO_P (up, size);
ulimb = up[0];
if (size == 1)
return mpn_gcd_11 (ulimb, vlimb);
else
{
unsigned long zero_bits, v_low_zero_bits;
/* Need vlimb odd for modexact. */
count_trailing_zeros (zero_bits, vlimb | ulimb);
count_trailing_zeros (v_low_zero_bits, vlimb);
vlimb >>= v_low_zero_bits;
ulimb = MPN_MOD_OR_MODEXACT_1_ODD (up, size, vlimb);
mpn_gcd_11 (ulimb, vlimb) << zero_bits;
}
}
For the first call to gcd_11, the inputs come direct from the
application. Even numbers must be handled, and one number being a lot
smaller than the other may well be an important use case. Or?
I think we should not ignore that case. I would actually expect quite
different bit counts to be very common.
For the second call to gcd_11, v is always odd, and as far as I can tell
it's pretty unlikely that u and v will differ a lot in size; hence, it's
not worth any extra overhead to check for that case.
Agreed.
Maybe we should have an public mpn_gcd_11 tailored for the first case:
Handling all inputs (and maybe also support gcd(0,0) == 0?), and
checking for very different sizes. And a private (and optional) second
entry point mpn_gcd_11_odd which requires v odd, and is optmized for the
case that u and v are of similar size.
I don't think gcd(0,0) is possible to define. It tends to infinity. We
should in the usual low-level spirit only define gcd(a,0), a > 0 if it
doesn't cost the same as a test in the caller.
Hmm, and MPN_MOD_OR_MODEXACT_1_ODD, is that still useful? For which
machines and ranges is modexact_1 faster than the family of mod_1_*
functions?
It is still very much useful, although the name is odd. The cost of
computing the bdiv inverse is much lower than computing the inverses
needed for mod_1_*. The cutoff points are here:
https://gmplib.org/devel/thres/BMOD_1_TO_MOD_1_THRESHOLD.html
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list