udiv_qr_3by2 vs divappr
paul zimmermann
Paul.Zimmermann at inria.fr
Tue Aug 28 08:25:11 UTC 2018
Dear Niels,
> > page 1: the division instruction is now much faster than before on modern
> > processors
>
> According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
> order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
> Intel skylake. And in addition, multiplication is piplined, division
> likely isn't.
>
> Do you have any suggestion for better wording than "vastly slower" ?
maybe give the number of cycles and refer to x86-timing.pdf?
> > page 3, line 5 of Section 3: <d_1, d_0 0> -> <d_1, d_0, 0>
>
> I think <d_1, d_0> is right.
yes sorry the extra 0 should have been removed (same in the algorithm)
> > page 4, line 27: for the upper bound, I agree you can subtract the maximal
> > value of the u_0 term in the proof of Theorem 3 from [4], but this gives
> > \beta-1, not \beta.
>
> This is a bit subtle, maybe I should explicitly repeat the proof form
> [4] with needed modifications. My argument is that the proof in [4] is
> of the form
>
> \tilde r = ... + u0 < ... + \beta <= c = max (\beta^2 - D, q_0 \beta)
>
> But if u_0 < \beta is the only thing that lets us conclude that \tilde r
> < c, with strict inequality, then there's an off-by-one error here.
>
> Now there are multiple bounds involved; to get \tilde r = c we'd also
> need something like u_1, u_0 and q_0 maximal and d_0 = 0.
if you need to save \beta with respect to the proof of [4], yes maybe you need
to repeat that proof to explain how you save the extra +1.
> > I wonder whether we can get rid of that term, or always subtract -1.
> > In the latter case (assuming the proof with the p_0 > 0 extra term is correct),
> > the only problem would be when p_0 = 0.
>
> I have looked at that, so far without success. If I remember correctly,
> always subtracting -1 can be treated in the analysis as extending the
> single-word range for r_0 to 0 <= r_0 <= \beta. But that breaks the
> analysis of the R' < 0 case. We have
>
> {r_1, r_0} >= q_0 \beta
>
> We want to conclude r_1 >= q_0, but that no longer holds if we allow r_0
> = \beta. To make it work, we'd need to streighten the lower bound
>
> R' >= q_0 \beta - \beta^2
>
> Hmm... but from another look at the proof from [4], seems to give us
> strict inequality
>
> R' > q_0 \beta - \beta^2
>
> with no extra effort. So maybe we can do that.
I'll have a look. The exhaustive search up to \beta=2^7 says that always
subtracting 1 seems to work, and the lower bound even improves to -2\beta < R'
instead of -2\beta <= R'.
Paul
PS: an extra typo in the new version: page 5, lines 4 and 5, d_o should be d_0
More information about the gmp-devel
mailing list