Torbjorn Granlund tg at gmplib.org
Mon Dec 7 20:13:26 CET 2009

```  > The "adjustment step" I'm using now... is needed only for the last
> iteration (the intermediate ones are adjusted by the next iteration), but
> it is _not_ cheap.

From the work on single-limb inverses (invert_limb, see Alg. 2 and 3 in
http://www.lysator.liu.se/~nisse/archive/draft-division-paper.pdf), I
suspect that it is very difficult to get an "exact" reciprocal
floor((B^{2n} - 1) / X) without an expensive multiplication at the end.

Define "expensive multiplication".  For the single-limb case, an extra
limb multiplication is "expensive".  For the n-limb case, the result
should be possible to adjust in time O(n).

I think the value returned by the MCA algorithm equals either this
"exact" reciprocal or is one unit too small. Not sure if it being too
small is common, or if it can happen only in borderline cases where
B^{2n} mod X is close to zero (but I'm fairly sure that with some tweaks
it can be reduced to only such borderline cases). It would be desirable
to make Barrett-style division work with such an approximative
reciprocal, to avoid the expensive final adjustment step.

The Barrett-style GMP division code already works with approximate
inverses, as long as the approximations are not greater than
(B^{2n}-1)/D.

But as was earlier pointed out in this thread, the Barrett-style GMP
code gets itself a more accurate inverse by passing D * B to
mpn_inverse, and then use floor{I/B}.  (D is the [high part of the]
divisor, I is the inverse.)  The division code does this

I think the old mpn_inverse used to compute a too small inverse quite
often.  If we can get mpn_inverse to almost always compute the correct
inverse, there will be no point for the division code to bother with
getting a more accurate value.

Getting mpn_invert to compute a slightly better value should not be
hard.  For example, including another limb from D (called A in MCA)
should help.  Also, one may avoid doubling the number of limbs in each
iteration.  The basecase will surely be rather large, judging from
mpn_binvert it will be over 500 limbs, which means an extra limb will
imply a very little relative cost.

--
Torbjörn
```