divappr interface

Niels Möller nisse at lysator.liu.se
Fri Apr 27 19:25:58 UTC 2018


Hi, 

after the discussion on div_q, I had another look at divappr, or more
precisely, the mpn_sbpi1_divappr_q function.

It's not entirely obvious from the implementation which limbs of the
{np, nn} argument are actually used. But I think it should access only
the qn+2 upper limbs. For accuracy, it should be sufficient with qn+1
limbs, but I suspect qn+2 is more convenient since we lack a variant of
mpn_submul_1 which ignores the low half of the lowest limb product (we'd
need a function for R -= floor (D * q / B)).

For a call to divappr to produce qn quotient limbs (plus possibly a qh
being zero or one), inputs should be a numerator of qn+2 limbs and a
normalized divisor of at most qn+1 limbs.

As long as qn > dn - 1, we loop computing one exact quotient limb and
eliminating one high limb of the numerator per iteration, decrementing qn.
(This phase would be equivalent to a call to sbpi1_div_qr, I guess).

Once we reach qn = dn - 1, keep looping to produce quotient limbs, but
also discard one limb of dp in each interation, until we in the final
iteration have qn = 1, qn+2 = 3 numerator limbs, and qn+1 = 2 divisor
limbs, i.e., a single udiv_qr_3by2 (where we might consider skipping the
adjustment steps). I haven't done the error analysis, but I would expect
that errors are similar to the current code.

This is analogous to the discussion of mpn_bdiv_q, which we discussed
years ago, where N and Q are of the same size, and D possibly smaller.
The structure with two loops is also similar to the old mpn_divrem
function with fractional limbs; it would be cleaner if we could have a
function doing only the fractional part, i.e., require that dn == qn +
1.

Fixing this should provide a small speedup for mpn_div_q, since 

  new_nn = 2 * qn + 1;
  ...
  cy = mpn_lshift (new_np, np + nn - new_nn, new_nn, cnt);
  ...
  qh = mpn_sbpi1_divappr_q (tp, new_np, new_nn, new_dp, qn + 1, dinv.inv32);

implies that new_nn and the shift are twice as large as they need to be
(new_nn should be only qn+2).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.



More information about the gmp-devel mailing list