udiv_qr_3by2 vs divappr
Niels Möller
nisse at lysator.liu.se
Sun Aug 5 07:59:46 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
> static inline mp_limb_t
> divappr_2 (mp_limb_t n2, mp_limb_t n1,
> mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
> {
> mp_limb_t q1, q0, r1, t1, t0, mask;
>
> umul_ppmm (q1, q0, n2, dinv);
> add_ssaaaa (q1, q0, q1, q0, n2, n1);
>
> q1++; /* Can't overflow */
> umul_ppmm (t1, t0, q1, d0); /* t0 isn't used */
> r1 = n1 - d1 * q1 - t1;
To be able to prove correctness, I'm afraid this needs to be
r1 = n1 - d1 * q1 - t1 - (t0 > 0);
> mask = - (mp_limb_t) (r1 >= q0);
> q1 += mask;
> r1 += mask & (d1 + 1);
> q1 += (r1 >= d1 - 1);
>
> return q1;
> }
I'm attaching a draft paper with the analysis. As for performance, I've
measurered a few percent speedup on shell.
Regards,
/Niels
-------------- next part --------------
A non-text attachment was scrubbed...
Name: schoolbook-divappr.pdf
Type: application/pdf
Size: 202387 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180805/6475fbdb/attachment-0001.pdf>
-------------- next part --------------
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list