GCD project status?

Torbjörn Granlund tg at gmplib.org
Wed Sep 18 21:38:53 UTC 2019


  > * Speed up div1, presumably with a table-based approach.

A quick-and-dirty but working variant below.

The table size with NBITS = 8 is 32KiB.  Far too big.  It avoids
branches in 83% of the cases, which is not that great considering the
huge cost of mispredicted branches.

With some thought, I am sure the table size could come down while the
accuracy could improve.

#define NBITS 8
static unsigned char tab[1 << (2 * NBITS - 1)];
static unsigned char *tabp = tab - (1 << (NBITS - 1) << NBITS);

mp_double_limb_t
div1 (mp_limb_t n0, mp_limb_t d0)
{
  int ncnt;
  size_t ni, di;
  mp_limb_t q0;
  mp_limb_t r0;
  mp_limb_t mask;
  mp_double_limb_t res;

  count_leading_zeros (ncnt, n0);
  ni = n0 >> (64 - NBITS - ncnt);
  di = d0 >> (64 - NBITS - ncnt);

  if (UNLIKELY (di < (1 << NBITS/2)))
    {
      q0 = n0 / d0;
      r0 = n0 - q0 * d0;
    }
  else
    {
      q0 = tabp[(ni << NBITS) + di];
      r0 = n0 - q0 * d0;
      mask = -(mp_limb_t) (r0 >= d0);
      q0 -= mask;
      r0 -= d0 & mask;
    }

  res.d1 = q0;
  res.d0 = r0;
  return res;
}

void
init_tab()
{
  size_t ni, di;

  for (ni = 1 << (NBITS - 1); ni < (1 << NBITS); ni++)
    {
      for (di = 0; di < (1 << NBITS); di++)
	{
	  tabp[(ni << NBITS) + di] = (ni << NBITS) / ((di << NBITS) + ((1 << NBITS)-1));
	}
    }
}


-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list