Division by 9 (and other factors of 2^60 - 1)

Mon Oct 12 16:14:18 CEST 2009

Here's' some code for dividing using multiplication + Hensel-division
by 2^60 - 1. Useful for factors of

  2^60 - 1 = 3^2 * 5^2 * 7 * 11 * 13 * 31 * 41 * 61 * 151 * 331 * 1321

Book-keeping would probably be easier if we computed -q instead of q.
Maybe there's some smart way to handle the negation differently. With
the current code, there are two recurrency variables, one limb and one
borrow flag (and for typical use, when bd < B/2, they should in
principle fit in a single signed word, but I don't see how to get any
advantage out of that).

Seems to work fine, but I haven't timed it. The recurrence on h looks
quite deep, unfortunately. At least 7 cycles or so as written, so I
imagine it'll be a challenge beating repeated div_by3 on machines such
as opteron, where the latter is limited by latency and not
multiplication throughput.

Maybe one can simplify the expression

   (q >> 4) - ((q << 60) < q)

to replace some of the occurences of q by t0; e.g., q << 60 ought to
be the same as (- t0) << 60 = - (t0 << 60), which is availble earlier.

Unlike bdiv_dbmc, this code doesn't handle carry in and out of the
function in a sane way; the return value is somewhat bogus.

The same algorithm could be used for 2^k - 1 for other k in the range
GMP_LIMB_BITS/2 <= k <= GMP_LIMB_BITS -1. I'm not sure if
parametrizing the shift count is going to slow it down (on x86, one
would need to move the values k and 64-k in and out of the %cl
register).

Regards,
/Niels

#include <stdio.h>
#include <stdlib.h>

#include "gmp.h"
#include "gmp-impl.h"
#include "longlong.h"

mp_limb_t
bdiv_t60m1 (mp_ptr qp, mp_srcptr ap, mp_size_t n, mp_limb_t bd)
{
  mp_size_t i;
  mp_limb_t h;
  mp_limb_t cy;

  for (i = 0, h = cy = 0; i < n; i++)
    {
      mp_limb_t a = ap[i];
      mp_limb_t q;
      mp_limb_t t1, t0;

      umul_ppmm (t1, t0, a, bd);

      t0 += h;
      t1 += (t0 < h);

      /* Hensel division by 2^60 - 1, inverse -(2^60 + 1) */
      q = -((t0 << 60) + t0);
      qp[i] = q;

      SUBC_LIMB (cy, h, t1, (q >> 4) - ((q << 60) < q) + cy);
    }

  return h;
}

#define diveby9(qp, ap, n) \
  bdiv_t60m1 ((qp), (ap), (n), ((CNST_LIMB(1) << 60) - 1) / 9)

#define TESTSIZE 100
int
main(int argc, char **argv)
{
  mp_limb_t a[TESTSIZE];
  mp_limb_t b[TESTSIZE+1];
  mp_limb_t q[TESTSIZE+1];
  unsigned i;

  for (i = 0; i < 100; i++)
    {
      mp_limb_t r;
      mpn_random (a, TESTSIZE);
      b[TESTSIZE] = mpn_mul_1 (b, a, TESTSIZE, 9);
      r = mpn_divrem_1 (q, 0, b, TESTSIZE+1, 9);

      if (r || q[TESTSIZE] || mpn_cmp(q, a, TESTSIZE))
	{
	  gmp_printf ("divrem_1 failed, r = %d\n"
		      "  %Nx (expected)\n"
		      "  %Nx (got)\n",
		      r, a, TESTSIZE, q,  TESTSIZE + 1);
	  return EXIT_FAILURE;
	}
      r = diveby9 (q, b, TESTSIZE+1);

      if (q[TESTSIZE] || mpn_cmp(q, a, TESTSIZE))
	{
	  gmp_printf ("diveby9 failed, r = %x\n"
		      "  %Nx (expected)\n"
		      "  %Nx (got)\n",
		      r, a, TESTSIZE, q,  TESTSIZE + 1);
	  return EXIT_FAILURE;
	}
    }
  return EXIT_SUCCESS;
}