udiv_qr_3by2 vs divappr

Niels Möller nisse at lysator.liu.se
Wed May 2 14:34:36 UTC 2018

tg at gmplib.org (Torbjörn Granlund) writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>   Yes. By one if divappr_2 can produce a useful single-limb
>   remainder, otherwise by two.
> I am very curious about the result of this work.  A larger submul_1
> tripcount is not for free, but the current 3/2 quotient approximation is
> also not for free.

The main case of the current schoolbook iteration looks like

	  udiv_qr_3by2 (q, n1, n0, n1, np[1], np[0], d1, d0, dinv);

	  cy = mpn_submul_1 (np - dn, dp, dn, q);

	  cy1 = n0 < cy;
	  n0 = (n0 - cy) & GMP_NUMB_MASK;
	  cy = n1 < cy1;
	  n1 = (n1 - cy1) & GMP_NUMB_MASK;
	  np[0] = n0;

	  if (UNLIKELY (cy != 0))

So I would hope that increased submul_1 tripcount would be reasonably
cheap compared to these n0 and n1 updates, even if submul_1 might redo
one or two multiplications already done inside the quotient

> I suspect that significant speedup would come from early, speculative
> quotient approximation.  We do its equivalent for some asm redc_1 and
> its sibling mpn_sbpi1_bdiv_r (see e.g. mpn/x86_64/zen/sbpi1_bdiv_r.asm).
> This was a significant speedup.

I'll try to keep that in mind, but it's going to be a bit tricky.


Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list