udiv_qr_3by2 vs divappr
Niels Möller
nisse at lysator.liu.se
Sun Apr 29 08:52:43 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
> nisse at lysator.liu.se (Niels Möller) writes:
>
>> If we can get away with that, it gets a lot
>> simpler,
>>
>> if (r >= q_0) /* Needs to be branch free */
>> {
>> q--;
>> r += d_1;
>> }
>> return q + (r >= d_1);
>
> I've given it a try. Below patch almost works (as in "passes most but
> not all tests")... I don't yet know if the failure is due to lack of
> accuracy, or some less subtle type of bug.
The first failure was for an input with n2 == d1, n1 == d1-1.
static void test_divappr_2(void)
{
mp_limb_t q = divappr_2(0x8000000000000001ULL, 0xffffffffbfffffffULL,
0x8000000000000001ULL, 0xffffffffc0000000ULL,
0xfffffffffffffff8ULL);
ASSERT_ALWAYS (q != 0);
}
Then quotient should be B-1, but divappr tried to return the one of
quotient B, which then fails due to overflow. The below variant passes
tests:
static inline mp_limb_t
divappr_2 (mp_limb_t n2, mp_limb_t n1,
mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
{
mp_limb_t q1, q0, r1, t1, t0, mask;
umul_ppmm (q1, q0, n2, dinv);
add_ssaaaa (q1, q0, q1, q0, n2, n1);
umul_ppmm (t1, t0, q1, d0);
q1++; /* May overflow */
r1 = n1 - d1 * q1 - t1;
/* FIXME: See if we can get sufficient accuracy without this? */
t0 += d0;
r1 -= (t0 < d0);
mask = - (r1 >= q0);
q1 += mask;
r1 += mask & d1; /* Add t0-related carry? */
#if 0
q1 += (r1 >= d1);
#else
/* FIXME: Take care of this case more efficiently,
need analysis of the case n1 == d1. */
if (r1 >= d1 && q1 != GMP_NUMB_MAX)
q1++;
#endif
return q1;
}
This needs more analysis. Since r in this code is also approximate, we
get subtleties when incrementing or decrementing r. Incrementing r may
be needed to ensure that we don't produce a quotient which is too small
in the context of schoolbook division. But in the failure case, that
pushes r1 over the limit and triggers the final r1 >= d1 check, even
though q1 was already maximal B - 1.
And we can't make arbitrary tweaks to r1, since we must at all times
ensure that the r1 >= q0 test is accurate, and that's pretty tight
already.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list