# udiv_qr_3by2 vs divappr

Niels Möller nisse at lysator.liu.se
Mon Aug 27 19:46:30 UTC 2018

paul zimmermann <Paul.Zimmermann at inria.fr> writes:

> page 1: the division instruction is now much faster than before on modern
>      processors

According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
Intel skylake. And in addition, multiplication is piplined, division
likely isn't.

Do you have any suggestion for better wording than "vastly slower" ?

> page 2: sigificant -> significant

Fixed.

> page 3, line 2 of Section 3, remove "The output"

Fixed.

> page 3, line 5 of Section 3: <d_1, d_0 0> -> <d_1, d_0, 0>

I think <d_1, d_0> is right.

>         line 6: correesponds -> corresponds

Fixed.

> page 3, The algorithm: <d_1, d_0 0> -> <d_1, d_0, 0>

Should be <d_1, d_0>, as above.

>         step 3: u_2 -> u_1, u_1 -> u_0

Fixed.

> page 4, line 5: d_1 - d_0 -> d_0 - d_1

Fixed.

> page 4, line 7: inputes -> inputs

Fixed.

> page 4, line 13: the the -> the

Fixed.

> page 4, line 20: u_1 -> u_0

Fixed.

> page 4, line 21: The last terms -> term

Fixed.

> page 4, line 27: for the upper bound, I agree you can subtract the maximal
> value of the u_0 term in the proof of Theorem 3 from , but this gives
> \beta-1, not \beta.

This is a bit subtle, maybe I should explicitly repeat the proof form
 with needed modifications. My argument is that the proof in  is
of the form

\tilde r = ... + u0 < ... + \beta <= c = max (\beta^2 - D, q_0 \beta)

But if u_0 < \beta is the only thing that lets us conclude that \tilde r
< c, with strict inequality, then there's an off-by-one error here.

Now there are multiple bounds involved; to get \tilde r = c we'd also
need something like u_1, u_0 and q_0 maximal and d_0 = 0.

> page 4: "it follows that r_1 + d_1 + 1 > \beta". I do not agree, take for
> example u_1 = u_0 = 0, d_1 = 128, d_0 = 1 with \beta = 256. You get r_1 = 127,
> and r_1 + d_1 + 1 = 256 = \beta. But r_1 + d_1 + 1 >= \beta is enough for the
> rest of the proof.

Good catch, your example seems to be correct.

> page 4, line -2: on line 8 -> on line 9

Fixed.

> page 5, line 6: \beta r_1 + d_0 -> \beta r'_1 + d_0

Fixed.

> For the case R' < 0, you did not prove the lower bound for r'_1 <= d_1 - 2,
> and did not prove the upper bound for r'_1 >= d_1 - 1.

I probably thought they were "trivial", but I have to double check, and
say something about it.

> Case R' >= 0: if the -\beta term is replaced by -(\beta-1), then all I can get
> is r_1 <= \beta - d_1 - 1.

That is problematic, for this case, we really need that r_1 + (d_1 + 1)
doesn't overflow. I think we need to strengthen the proof to get a
-\beta term, and possibly single out any problematic special cases.

> I wonder whether we can get rid of that term, or always subtract -1.
> In the latter case (assuming the proof with the p_0 > 0 extra term is correct),
> the only problem would be when p_0 = 0.

I have looked at that, so far without success. If I remember correctly,
always subtracting -1 can be treated in the analysis as extending the
single-word range for r_0 to 0 <= r_0 <= \beta. But that breaks the
analysis of the  R' < 0 case. We have

{r_1, r_0}  >=  q_0 \beta

We want to conclude r_1 >= q_0, but that no longer holds if we allow r_0
= \beta. To make it work, we'd need to streighten the lower bound

R' >= q_0 \beta - \beta^2

Hmm... but from another look at the proof from , seems to give us
strict inequality

R' > q_0 \beta - \beta^2

with no extra effort. So maybe we can do that.

So we really should redo the proof from  in the current setting, and
prove the bounds we need. Which seems to be

R' < max (\beta^2 - D, q_0 \beta) - \beta
R' >= - D
R' > q_0 \beta - \beta^2

While the current arguments don't rule out equality.

Thanks for the careful review! Updated version (with all the things that
were easy to fix) at
https://www.lysator.liu.se/~nisse/misc/schoolbook-divappr.pdf

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.