udiv_qr_3by2 vs divappr

Fri Sep 21 13:04:13 UTC 2018

       Torbjörn,

> I suppose a "naive" implementation scales the full dividend and the full
> divisor by a factor which fits in a limb.

scaling the dividend is not mandatory. You can first reduce the upper n-1
limbs of the original dividend using the scaled divisor, then reduce the
last limb using the original divisor. This is especially interesting if only
the remainder is needed.

In fact, the multiplier has one limb + 1 bit, if the divisor is normalized,
so k*B = \beta^{n+1} + B' with B' < \beta^n (see Algorithm 1.7 in [1]).

> One could in theory save some
> time by scaling just their O(log(Q)) most significant limb.

I don't see how you achieve this, since for the submul part you need the full
scaled divisor.

> How likely does Svoboda overesimate a quotient limb?

good question. Unfortunately the authors of [1] do not answer that question.
If at some point of the loop in Algorithm 1.7 we have
A = a_{n+j} \beta^{n+j} + R and B' = \beta^{n+1} + S, with
0 \leq R < \beta^{n+j} and 0 \leq S < \beta^n, then the next remainder
will be T = A - a_{n+j} \beta^{j-1} B' = R - a_{n+j} \beta^{j-1} S.
We thus have a_{n+j} \beta^{n+j-1} < T < \beta^{n+j}. Writing
R = X \beta^{n+j}, S = Y \beta^n, a_{n+j} = Z \beta, with 0 <= X, Y, Z < 1,
we get: T = (X  - Y Z) \beta^{n+j}. If X, Y, Z are uniform in [0,1] and
independent, then the probability that X  - Y Z >= 0 is 3/4, thus Svoboda's
algorithm will overestimate the quotient with probability 1/4 (and in that
case, only one correction is needed).

Of course you can use a 2-limb variant of Svoboda: multiply the divisor by
2 limbs + 1 bit, so that the scaled divisor is now k*B = \beta^{n+2} + B'
with B' < \beta^n. Then the quotient will be overestimated with probability
less than 1/\beta.

Paul

[1] https://members.loria.fr/PZimmermann/mca/mca-cup-0.5.9.pdf