udiv_qr_3by2 vs divappr
Niels Möller
nisse at lysator.liu.se
Sat Apr 28 21:19:54 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
> * It might be possible to implement this cheaper than udiv_qr_3by2,
> omitting most of the book-keeping and carry propagation involving the
> low remainder word r0.
I've made some progress, not yet clear if it will work out. I think I
can prove the following, analogous to the theorems in the paper:
Let
d = {d_1, d_0} be the normalized divisor, B^2 / 2 <= d < B^2.
Let v be the same inverse as for 3/2,
v = floor ((B^3-1) / d) - B
Let {u_2, u_1} < d be the top limbs of the numerator.
Compute the candidate quotient + fraction, in the same way as for 3/2:
{q_1, q_0} = (v+B) u_2 + u_1 = v u_2 + {u_2, u_1}
Let r be the candidate remainder, almost single word
r = {u_2, u_1} - floor ((q_1 + 1) d / B)
Then r lies in the range
c - B = r < c, with c = max(B - d_1, q_0)
Hence, r is uniquely determined by r mod B, q_0 and d_1. Still unclear
how well it can work out. To guarantee that we produce a q which never
is too small for schoolbook division, we might need to add some
adjustment term, like
r = {u_2, u_1} - floor (((q_1 + 1) d + B - 1)/ B)
and then we're almost back to computing the next remainder limb. Another
alternative might be arrange so that the final q corresponds to a
remainder in the range
-1 <= r < d_1
In this case, maybe it's best to return only q, not r (unless we can
find a nice way to handle the -1 case in the schoolbook division loop).
Starting with q = q_1 + 1, p_0 = q*d_0 (mod B), and r computed
only mod B, the adjustment steps would be something like
if (r >= q_0) /* Needs to be branch free */
{
q--;
r += d_1 + (d_0 > p_0);
p_0 -= d_0;
}
if (r >= d_1) /* Unlikely */
{
q++;
p_0 += d_0;
r -= d_1 + (p_0 < d_0); /* Might underflow */
}
assert (p_0 == q * d_0);
return q; /* And also return r and p_0, when we've went to the trouble
to update them. */
The comparisons involving d_0 and p_0 in the r update are needed to
update the contribution from mulhi(q*d_0) when we increment and
decrement q. Maybe we don't need them if we only return q, and leave to
the caller to recompute r (e.g, doing a submul_1 on the complete
unnormalized operands). If we can get away with that, it gets a lot
simpler,
if (r >= q_0) /* Needs to be branch free */
{
q--;
r += d_1;
}
return q + (r >= d_1);
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list