udiv_qr_3by2 vs divappr

Torbjörn Granlund tg at gmplib.org
Wed May 2 13:01:17 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

  Yes. By one if divappr_2 can produce a useful single-limb
  remainder, otherwise by two.

I am very curious about the result of this work.  A larger submul_1
tripcount is not for free, but the current 3/2 quotient approximation is
also not for free.

I suspect that significant speedup would come from early, speculative
quotient approximation.  We do its equivalent for some asm redc_1 and
its sibling mpn_sbpi1_bdiv_r (see e.g. mpn/x86_64/zen/sbpi1_bdiv_r.asm).
This was a significant speedup.

Of course, for bdiv the quotient is not approximated and therefore the
next one is not speculative.  That makes bdiv easier to write and to get
right, but I think the speedup potential is similar.

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list