udiv_qr_3by2 vs divappr
nisse at lysator.liu.se
Wed May 2 14:34:36 UTC 2018
tg at gmplib.org (Torbjörn Granlund) writes:
> nisse at lysator.liu.se (Niels Möller) writes:
> Yes. By one if divappr_2 can produce a useful single-limb
> remainder, otherwise by two.
> I am very curious about the result of this work. A larger submul_1
> tripcount is not for free, but the current 3/2 quotient approximation is
> also not for free.
The main case of the current schoolbook iteration looks like
udiv_qr_3by2 (q, n1, n0, n1, np, np, d1, d0, dinv);
cy = mpn_submul_1 (np - dn, dp, dn, q);
cy1 = n0 < cy;
n0 = (n0 - cy) & GMP_NUMB_MASK;
cy = n1 < cy1;
n1 = (n1 - cy1) & GMP_NUMB_MASK;
np = n0;
if (UNLIKELY (cy != 0))
So I would hope that increased submul_1 tripcount would be reasonably
cheap compared to these n0 and n1 updates, even if submul_1 might redo
one or two multiplications already done inside the quotient
> I suspect that significant speedup would come from early, speculative
> quotient approximation. We do its equivalent for some asm redc_1 and
> its sibling mpn_sbpi1_bdiv_r (see e.g. mpn/x86_64/zen/sbpi1_bdiv_r.asm).
> This was a significant speedup.
I'll try to keep that in mind, but it's going to be a bit tricky.
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel