Computing A mod d (for small odd d without division and multiplication)

Mon Mar 9 17:58:30 CET 2026

Ciao,
8 mar 2026, 19:59 da tg at gmplib.org:

> This algorithm is useful mainly for quite small d, or more precisely
> when the number of limbs in A >> d.  This is because the rems[] vector
> will else potentially be large and expensive to compute.
>
Do we need all the value in rems[]? I believe we don't.

Instead of accumulating in acc, we can multiply lo by the inverse of B mod d,
and add hi to it, then we can keep on accumulating.
At the end of the cycle, we will have in [hi,lo] a number equivalent to A mod d.

The good thing with this approach is that
for a given d we only need to pre-compute and keep the order and the inverse.

I try to write a code similar to yours:

mp_limb_t
mpn_mod_1s_pi (const mp_limb_t* ap, mp_size_t n, mp_limb_t d, mp_size_t ord, mp_limb_t pi)
{
  mp_limb_t hi, lo = 0;
  // Sum every ord limb into sums, multiplying each sum by the inverse pi.
  for (int i = 0; i < ord; i++) {
    const mp_limb_t* tp = ap + i;

    while (tp < ap + n) {
      add_ssaaaa (hi, lo, hi, lo, 0, *tp);
      tp += ord;
    }

    // add_ssaaaa (hi, lo, 0, hi, 0, lo % d * pi); // alternative code
    mp_limb_t t;
    umul_ppmm(t, lo, pi, lo);
    add_ssaaaa (hi, lo, t, lo, 0, hi);
  }

  lo %= d;

  if (UNLIKELY (hi != 0))
    {
      mp_limb_t r;
      r = ~(mp_limb_t)0 % d + 1; // r = B mod d (B is the limb base)
      lo += r * hi;
      lo %= d;
    }
  // Now lo contains a value which is congruent to A mod d.
  return lo;
}

mp_limb_t
mpn_mod_1s (const mp_limb_t* ap, mp_size_t n, mp_limb_t d)
{
  mp_limb_t pi, r, t;

  r = ~(mp_limb_t)0 % d + 1; // r = B mod d (B is the limb base)

  // Compute B^t mod d for 0 <= t < k where k > 1 is the smallest number for
  // which B^k != 1.
  t = r;
  int nrems = 1;
  do {
    pi = t;
    ++nrems;
    t = t * r % d;
  } while (t != 1);

  return mpn_mod_1s_pi (ap, n, d, nrems, pi);
}

Ĝis,
mb