udiv_qr_3by2 vs divappr

Tue Aug 28 08:25:11 UTC 2018

       Dear Niels,

> > page 1: the division instruction is now much faster than before on modern
> >      processors
> 
> According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
> order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
> Intel skylake. And in addition, multiplication is piplined, division
> likely isn't.
> 
> Do you have any suggestion for better wording than "vastly slower" ?

maybe give the number of cycles and refer to x86-timing.pdf?

> > page 3, line 5 of Section 3: <d_1, d_0 0> -> <d_1, d_0, 0>
> 
> I think <d_1, d_0> is right.

yes sorry the extra 0 should have been removed (same in the algorithm)

> > page 4, line 27: for the upper bound, I agree you can subtract the maximal
> > value of the u_0 term in the proof of Theorem 3 from [4], but this gives
> > \beta-1, not \beta.
> 
> This is a bit subtle, maybe I should explicitly repeat the proof form
> [4] with needed modifications. My argument is that the proof in [4] is
> of the form
> 
>   \tilde r = ... + u0 < ... + \beta <= c = max (\beta^2 - D, q_0 \beta)
> 
> But if u_0 < \beta is the only thing that lets us conclude that \tilde r
> < c, with strict inequality, then there's an off-by-one error here.
> 
> Now there are multiple bounds involved; to get \tilde r = c we'd also
> need something like u_1, u_0 and q_0 maximal and d_0 = 0.

if you need to save \beta with respect to the proof of [4], yes maybe you need
to repeat that proof to explain how you save the extra +1.

> > I wonder whether we can get rid of that term, or always subtract -1.
> > In the latter case (assuming the proof with the p_0 > 0 extra term is correct),
> > the only problem would be when p_0 = 0.
> 
> I have looked at that, so far without success. If I remember correctly,
> always subtracting -1 can be treated in the analysis as extending the
> single-word range for r_0 to 0 <= r_0 <= \beta. But that breaks the
> analysis of the  R' < 0 case. We have
> 
>   {r_1, r_0}  >=  q_0 \beta
> 
> We want to conclude r_1 >= q_0, but that no longer holds if we allow r_0
> = \beta. To make it work, we'd need to streighten the lower bound 
> 
>   R' >= q_0 \beta - \beta^2
> 
> Hmm... but from another look at the proof from [4], seems to give us
> strict inequality
> 
>   R' > q_0 \beta - \beta^2
> 
> with no extra effort. So maybe we can do that.

I'll have a look. The exhaustive search up to \beta=2^7 says that always
subtracting 1 seems to work, and the lower bound even improves to -2\beta < R'
instead of -2\beta <= R'.

Paul

PS: an extra typo in the new version: page 5, lines 4 and 5, d_o should be d_0