udiv_qr_3by2 vs divappr

Sat Apr 28 21:19:54 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

> * It might be possible to implement this cheaper than udiv_qr_3by2,
>   omitting most of the book-keeping and carry propagation involving the
>   low remainder word r0.

I've made some progress, not yet clear if it will work out. I think I
can prove the following, analogous to the theorems in the paper:

Let 

  d = {d_1, d_0} be the normalized divisor, B^2 / 2 <= d < B^2.

Let v be the same inverse as for 3/2, 

  v = floor ((B^3-1) / d) - B

Let {u_2, u_1} < d be the top limbs of the numerator.

Compute the candidate quotient + fraction, in the same way as for 3/2:

  {q_1, q_0} = (v+B) u_2 + u_1 = v u_2 + {u_2, u_1}

Let r be the candidate remainder, almost single word

  r = {u_2, u_1} - floor ((q_1 + 1) d / B)

Then r lies in the range

  c - B = r < c, with c = max(B - d_1, q_0)

Hence, r is uniquely determined by r mod B, q_0 and d_1. Still unclear
how well it can work out. To guarantee that we produce a q which never
is too small for schoolbook division, we might need to add some
adjustment term, like

  r = {u_2, u_1} - floor (((q_1 + 1) d + B - 1)/ B)

and then we're almost back to computing the next remainder limb. Another
alternative might be arrange so that the final q corresponds to a
remainder in the range

  -1 <= r < d_1

In this case, maybe it's best to return only q, not r (unless we can
find a nice way to handle the -1 case in the schoolbook division loop).

Starting with q = q_1 + 1, p_0 = q*d_0 (mod B), and r computed
only mod B, the adjustment steps would be something like

if (r >= q_0)   /* Needs to be branch free */
  {
    q--;
    r += d_1 + (d_0 > p_0);
    p_0 -= d_0;
  } 

if (r >= d_1)   /* Unlikely */
  {
    q++;
    p_0 += d_0;
    r -= d_1 + (p_0 < d_0);  /* Might underflow */
  }

assert (p_0 == q * d_0);
return q;  /* And also return r and p_0, when we've went to the trouble
              to update them. */

The comparisons involving d_0 and p_0 in the r update are needed to
update the contribution from mulhi(q*d_0) when we increment and
decrement q. Maybe we don't need them if we only return q, and leave to
the caller to recompute r (e.g, doing a submul_1 on the complete
unnormalized operands). If we can get away with that, it gets a lot
simpler,

if (r >= q_0)   /* Needs to be branch free */
  {
    q--;
    r += d_1;
  } 
return q + (r >= d_1);

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.