udiv_qr_3by2 vs divappr
paul zimmermann
Paul.Zimmermann at inria.fr
Mon Aug 27 14:40:58 UTC 2018
Dear Niels,
> From: nisse at lysator.liu.se (Niels Möller)
> Date: Sun, 05 Aug 2018 09:59:46 +0200
>
> nisse at lysator.liu.se (Niels Möller) writes:
>
> > static inline mp_limb_t
> > divappr_2 (mp_limb_t n2, mp_limb_t n1,
> > mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
> > {
> > mp_limb_t q1, q0, r1, t1, t0, mask;
> >
> > umul_ppmm (q1, q0, n2, dinv);
> > add_ssaaaa (q1, q0, q1, q0, n2, n1);
> >
> > q1++; /* Can't overflow */
> > umul_ppmm (t1, t0, q1, d0); /* t0 isn't used */
> > r1 = n1 - d1 * q1 - t1;
>
> To be able to prove correctness, I'm afraid this needs to be
>
> r1 = n1 - d1 * q1 - t1 - (t0 > 0);
>
> > mask = - (mp_limb_t) (r1 >= q0);
> > q1 += mask;
> > r1 += mask & (d1 + 1);
> > q1 += (r1 >= d1 - 1);
> >
> > return q1;
> > }
>
> I'm attaching a draft paper with the analysis. As for performance, I've
> measurered a few percent speedup on shell.
>
> Regards,
> /Niels
>
> [2:application/pdf Show Save:schoolbook-divappr.pdf (200kB)]
quite interesting! I have a few comments:
page 1: the division instruction is now much faster than before on modern
processors
page 2: sigificant -> significant
page 3, line 2 of Section 3, remove "The output"
page 3, line 5 of Section 3: <d_1, d_0 0> -> <d_1, d_0, 0>
line 6: correesponds -> corresponds
page 3, The algorithm: <d_1, d_0 0> -> <d_1, d_0, 0>
step 3: u_2 -> u_1, u_1 -> u_0
page 4, line 5: d_1 - d_0 -> d_0 - d_1
page 4, line 7: inputes -> inputs
page 4, line 13: the the -> the
page 4, line 20: u_1 -> u_0
page 4, line 21: The last terms -> term
page 4, line 27: for the upper bound, I agree you can subtract the maximal
value of the u_0 term in the proof of Theorem 3 from [4], but this gives
\beta-1, not \beta.
page 4: "it follows that r_1 + d_1 + 1 > \beta". I do not agree, take for
example u_1 = u_0 = 0, d_1 = 128, d_0 = 1 with \beta = 256. You get r_1 = 127,
and r_1 + d_1 + 1 = 256 = \beta. But r_1 + d_1 + 1 >= \beta is enough for the
rest of the proof.
page 4, line -2: on line 8 -> on line 9
page 5, line 6: \beta r_1 + d_0 -> \beta r'_1 + d_0
For the case R' < 0, you did not prove the lower bound for r'_1 <= d_1 - 2,
and did not prove the upper bound for r'_1 >= d_1 - 1.
Case R' >= 0: if the -\beta term is replaced by -(\beta-1), then all I can get
is r_1 <= \beta - d_1 - 1.
Remark: on the other hand, an exhaustive search with \beta=2^k with
1 <= k <= 7 did not find any error, even without the p_0 > 0 extra term.
I wonder whether we can get rid of that term, or always subtract -1.
In the latter case (assuming the proof with the p_0 > 0 extra term is correct),
the only problem would be when p_0 = 0.
Paul
More information about the gmp-devel
mailing list