# New mpn_divexact code

Paul Zimmermann Paul.Zimmermann at loria.fr
Thu Aug 3 12:04:45 CEST 2006

>       mpn_mul (tp, dp, dn, qp, in);
>
> Here, {qp,in} is much smaller than {dp,dn} and your FFT code cleverly
> transforms {qp,in} only once.

> Further speedups should be possible if {ip,in} where pre-transformed,
> and then used in the mpn_mullow_n calls.  Perhaps pre-transformation
> would be possible also for the comparatively large {dp,dn}?

Yes of course.

>   When the quotient size (qn) is smaller than the divisor size (dn),
>   it suffices to consider dn limbs of the dividend and divisor (either
>   low or high bits, or dn/2 low bits and dn/2 high bits), thus a cutoff
>   point is enough.
>
> I think you mean qn instead of dn here the last three times...

Right.

>   The tricky thing here is to glue the high and low parts of the quotient,
>   and ensure the middle bits are correct.
>
> I can imagine.  Are you aware of any free implementation, or perhaps
> pseudo code, that one could start from?

Here is something I wrote for the book I am writing with Richard Brent.
Basically it says that an overlap of 2 bits is enough. Comments are welcome.

\begin{exercise} \label{exo_exact_div}
Design an algorithm that performs an exact division of a $4n$-bit integer by a
$2n$-bit integer, with a quotient of $2n$ bits, using the idea from the last
paragraph of \textsection{exactdiv}. Prove that your algorithm is correct.
\end{exercise}
Let $A$ be the divisor, $B$ the dividend, and $Q$ the quotient.
Let $\beta=2^n$, and $Q = Q_1 \beta + Q_0$.
$Q_0$ is given by $Q_0 = A_0/B_0 (\bmod \beta)$, where $A_0 = A (\bmod \beta)$,
and $B_0 = B (\bmod \beta)$.
Now let $Q'_1$ be the rounding to $-\infty$ of $A/B$ to $n$ bits
(using the algorithms of Chapter~\ref{float}).
Since $Q$ is assumed to have $2n$ bits, we have $2^{2n-1} \leq A/B < 2^{2n}$,
thus $2^{2n-1} \leq Q'_1 < 2^{2n}$.
Since $Q'_1$ has (at most)
$n$ significant bits, necessarily $Q'_1$ can be written $R 2^m$.
It is easy to prove that $R=Q_1$.

The above solution is cheating, since determining the correct rounding
of $A/B$ to $n$ bits may need to consider all the bits of $A$ and $B$.
Now assume we divide the upper $2n$ bits of $A$ by the $n$ upper bits of
$B$: $A_3 \beta + A_2 = Q'_1 B_1 + R$.
By an analysis similar to that of Theorem~\ref{theoBasecaseDivRem},
we can prove that $Q'_1$ is never too small, and can be at most too large
by $2$. Thus $Q'_1 - 2 \leq Q_1 \leq Q'_1$. Thus computing two more
bits of $Q_0$ --- i.e.~$n+2$ bits instead of $n$ --- will be enough to
determine the value of $Q_1$, since possible candidates will differ by $4$,
thus only one candidate lies in $[Q'_1 - 2, Q'_1]$.
}

> How much time does Jebelean's method need in terms of M(n),
> compared to the traditional Newton inversion?

If D(qn) is the time needed to divide exactly the least qn words,
Jebelean's method needs 2*D(qn/2). This should save a factor of 2
for the basecase, and asymptotically nothing (but the memory usage
will decrease).

> My Newton inversion code could surely be improved without involving
> Jebelean's method.  There are comments in the file (see
> http://swox.com/gmp/devel/mpn/generic/invert_2exp.c) from you ("Paul