Niels Möller nisse at lysator.liu.se
Mon Dec 7 21:52:42 CET 2009

```Torbjorn Granlund <tg at gmplib.org> writes:

>   From the work on single-limb inverses (invert_limb, see Alg. 2 and 3 in
>   http://www.lysator.liu.se/~nisse/archive/draft-division-paper.pdf), I
>   suspect that it is very difficult to get an "exact" reciprocal
>   floor((B^{2n} - 1) / X) without an expensive multiplication at the end.
>
> Define "expensive multiplication".  For the single-limb case, an extra
> limb multiplication is "expensive".  For the n-limb case, the result
> should be possible to adjust in time O(n).

By expensive, I mean O(M(n)), and that this adjustment is needed in the
worst case (there may be tricks like for mulhi to avoid it in the common
case). I think the fundamental problem is that we want to distinguish
between

floor(B^{2n} / A) = X, B^{2n} - A X = R close to 0

and

floor(B^{2n} / A) = X-1, B^{2n} - A (X - 1) = R close to A-1

Almost any kind of errors in the inputs to the final iteration may in
the worst case result in a value on the wrong side of this border. It
doesn't matter have tight error bounds you have, even a very small error
can straddle this boundary for some inputs.

I think that using a extra limbs to increase precision (either inside
the inversion algorithm, or in its caller) will reduce the probability
of error, at an O(n) cost, but not eliminate it.

If we get the "exact" reciprocal, X = floor((B^{2n} - 1) / A), then we have

0 < B^{2n} - A X <= A

This is what we do for invert_limb. As I said, I'm afraid that achieving
this requires an expensive adjustment at the end. But I think we can
achieve the relaxed bound

0 < B^{2n} - AX <= A (1 + epsilon)

for some fairly small epsilon, without needing any expensive adjustment.
The MCA algorithm and proof gives us epsilon = 1, but I expect that one
can get this down closer to 1/B by using an extra limb of precision at
the right places.

> The Barrett-style GMP division code already works with approximate
> inverses, as long as the approximations are not greater than
> (B^{2n}-1)/D.

Excellent. Then one should definitely omit any expensive final