udiv_qr_3by2 vs divappr

Torbjörn Granlund tg at gmplib.org
Tue Sep 4 21:41:54 UTC 2018

Why is (the obsolete function) mpn_tdiv_qr's interface relevant
here?  You time just the lower-level sbpi1 functions, right?

It might be clean to have divappr_2 as a separate function, but perhaps
expanding its code in the mpn_sbpi1_div_qr loop would expose the
possibility for decreasing the submul_1 size argument.  If the measured
speedup is less than you expected, perhaps the old code's dn-2 size
argument explains some of it?

I believe we could find CPUs (mainly low-end and obsolete high-end ones)
where the old code will beat the new code because of the old code's
lower submul_1 size argument.

Microoptimisation: Replace q1++ with n2+1 in add_ssaaaa.
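
A minimal sketch of that folding, not taken from the patch under
discussion.  It assumes the quotient estimate is formed by a two-limb add
of (n2,n1) to (q1,q0) followed by a separate q1++, and that limbs are 64
bits.  The names limb_t, add_2limb, q1_with_increment and q1_folded are
hypothetical; add_2limb is a portable stand-in for the add_ssaaaa macro.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t, 64-bit limbs assumed */

/* Portable stand-in for add_ssaaaa:
   (*sh,*sl) = (ah,al) + (bh,bl), computed modulo 2^128. */
static void
add_2limb (limb_t *sh, limb_t *sl, limb_t ah, limb_t al, limb_t bh, limb_t bl)
{
  limb_t lo = al + bl;
  *sh = ah + bh + (lo < al);   /* (lo < al) is the carry out of the low limb */
  *sl = lo;
}

/* Assumed current shape: two-limb add, then a separate increment of q1. */
static limb_t
q1_with_increment (limb_t q1, limb_t q0, limb_t n2, limb_t n1, limb_t *q0_out)
{
  add_2limb (&q1, &q0, q1, q0, n2, n1);
  q1++;
  *q0_out = q0;
  return q1;
}

/* Suggested shape: the increment is folded into the high addend. */
static limb_t
q1_folded (limb_t q1, limb_t q0, limb_t n2, limb_t n1, limb_t *q0_out)
{
  add_2limb (&q1, &q0, q1, q0, n2 + 1, n1);
  *q0_out = q0;
  return q1;
}
```

Both variants give the same (q1,q0) modulo 2^64 per limb, since the +1
only touches the high limb.  Whether the folded form actually saves an
instruction depends on the compiler and on how add_ssaaaa is implemented
for the target CPU.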

Could the most significant limb of the partial remainder be kept in a
scalar between iterations?

Please encrypt, key id 0xC8601622
