udiv_qr_3by2 vs divappr
tg at gmplib.org
Wed May 2 13:01:17 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
Yes. By one if divappr_2 can produce a useful single-limb
remainder, otherwise by two.
I am very curious about the result of this work. A larger submul_1
tripcount is not for free, but the current 3/2 quotient approximation is
also not for free.
I suspect that significant speedup would come from early, speculative
quotient approximation. We do its equivalent for some asm redc_1 and
its sibling mpn_sbpi1_bdiv_r (see e.g. mpn/x86_64/zen/sbpi1_bdiv_r.asm).
This was a significant speedup.
Of course, for bdiv the quotient is not approximated and therefore the
next one is not speculative. That makes bdiv easier to write and to get
right, but I think the speedup potential is similar.
Please encrypt, key id 0xC8601622
More information about the gmp-devel