udiv_qr_3by2 vs divappr

Niels Möller nisse at lysator.liu.se
Sun Aug 5 07:59:46 UTC 2018


nisse at lysator.liu.se (Niels Möller) writes:

> static inline mp_limb_t
> divappr_2 (mp_limb_t n2, mp_limb_t n1,
> 	   mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
> {
>   mp_limb_t q1, q0, r1, t1, t0, mask;
>
>   umul_ppmm (q1, q0, n2, dinv);
>   add_ssaaaa (q1, q0, q1, q0, n2, n1);
>
>   q1++;  /* Can't overflow */
>   umul_ppmm (t1, t0, q1, d0);  /* t0 isn't used */
>   r1 = n1 - d1 * q1 - t1;

To be able to prove correctness, I'm afraid this needs to be

  r1 = n1 - d1 * q1 - t1 - (t0 > 0);

>   mask = - (mp_limb_t) (r1 >= q0);
>   q1 += mask;
>   r1 += mask & (d1 + 1);
>   q1 += (r1 >= d1 - 1);
>
>   return q1;
> }

I'm attaching a draft paper with the analysis. As for performance, I've
measurered a few percent speedup on shell.

Regards,
/Niels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: schoolbook-divappr.pdf
Type: application/pdf
Size: 202387 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20180805/6475fbdb/attachment-0001.pdf>
-------------- next part --------------

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list