udiv_qr_3by2 vs divappr

Sun Apr 29 08:52:43 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>
>>  If we can get away with that, it gets a lot
>> simpler,
>>
>> if (r >= q_0)   /* Needs to be branch free */
>>   {
>>     q--;
>>     r += d_1;
>>   } 
>> return q + (r >= d_1);
>
> I've given it a try. Below patch almost works (as in "passes most but
> not all tests")... I don't yet know if the failure is due to lack of
> accuracy, or some less subtle type of bug.

The first failure was for an input with n2 == d1, n1 == d1-1. 

static void test_divappr_2(void)
{
  mp_limb_t q = divappr_2(0x8000000000000001ULL, 0xffffffffbfffffffULL,
			  0x8000000000000001ULL, 0xffffffffc0000000ULL,
			  0xfffffffffffffff8ULL);
  ASSERT_ALWAYS (q != 0);
}

Then quotient should be B-1, but divappr tried to return the one of
quotient B, which then fails due to overflow. The below variant passes
tests:

static inline mp_limb_t
divappr_2 (mp_limb_t n2, mp_limb_t n1,
	   mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
{
  mp_limb_t q1, q0, r1, t1, t0, mask;

  umul_ppmm (q1, q0, n2, dinv);
  add_ssaaaa (q1, q0, q1, q0, n2, n1);

  umul_ppmm (t1, t0, q1, d0);
  q1++; /* May overflow */
  r1 = n1 - d1 * q1 - t1;

  /* FIXME: See if we can get sufficient accuracy without this? */
  t0 += d0;
  r1 -= (t0 < d0);

  mask = - (r1 >= q0);
  q1 += mask;
  r1 += mask & d1; /* Add t0-related carry? */
#if 0
  q1 += (r1 >= d1);
#else
  /* FIXME: Take care of this case more efficiently,
     need analysis of the case n1 == d1. */
  if (r1 >= d1 && q1 != GMP_NUMB_MAX)
    q1++;
#endif
  return q1;
}

This needs more analysis. Since r in this code is also approximate, we
get subtleties when incrementing or decrementing r. Incrementing r may
be needed to ensure that we don't produce a quotient which is too small
in the context of schoolbook division. But in the failure case, that
pushes r1 over the limit and triggers the final r1 >= d1 check, even
though q1 was already maximal B - 1.

And we can't make arbitrary tweaks to r1, since we must at all times
ensure that the r1 >= q0 test is accurate, and that's pretty tight
already.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.