mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

Fri Sep 5 18:07:44 UTC 2014

nisse at lysator.liu.se (Niels Möller) writes:

  For a start, let's look at what's needed by gcd_1. It would look
  something like (untested):

    mp_limb_t
    mpn_gcd_1 (mp_srcptr up, mp_size_t size, mp_limb_t vlimb)
    {
      mp_limb_t ulimb;

      ASSERT (size >= 1);
      ASSERT (vlimb != 0);
      ASSERT_MPN_NONZERO_P (up, size);

      ulimb = up[0];

      if (size == 1)
        return mpn_gcd_11 (ulimb, vlimb);
      else
        {
          unsigned long  zero_bits, v_low_zero_bits;
          /* Need vlimb odd for modexact. */
          count_trailing_zeros (zero_bits, vlimb | ulimb);
          count_trailing_zeros (v_low_zero_bits, vlimb);

          vlimb >>= v_low_zero_bits;

          ulimb = MPN_MOD_OR_MODEXACT_1_ODD (up, size, vlimb);
          mpn_gcd_11 (ulimb, vlimb) << zero_bits;
        }
    }

  For the first call to gcd_11, the inputs come direct from the
  application. Even numbers must be handled, and one number being a lot
  smaller than the other may well be an important use case. Or?

I think we should not ignore that case.  I would actually expect quite
different bit counts to be very common.

  For the second call to gcd_11, v is always odd, and as far as I can tell
  it's pretty unlikely that u and v will differ a lot in size; hence, it's
  not worth any extra overhead to check for that case.

Agreed.

  Maybe we should have an public mpn_gcd_11 tailored for the first case:
  Handling all inputs (and maybe also support gcd(0,0) == 0?), and
  checking for very different sizes. And a private (and optional) second
  entry point mpn_gcd_11_odd which requires v odd, and is optmized for the
  case that u and v are of similar size.

I don't think gcd(0,0) is possible to define.  It tends to infinity.  We
should in the usual low-level spirit only define gcd(a,0), a > 0 if it
doesn't cost the same as a test in the caller.

  Hmm, and MPN_MOD_OR_MODEXACT_1_ODD, is that still useful? For which
  machines and ranges is modexact_1 faster than the family of mod_1_*
  functions?

It is still very much useful, although the name is odd.  The cost of
computing the bdiv inverse is much lower than computing the inverses
needed for mod_1_*.  The cutoff points are here:
https://gmplib.org/devel/thres/BMOD_1_TO_MOD_1_THRESHOLD.html

-- 
Torbjörn
Please encrypt, key id 0xC8601622