udiv_qr_3by2 vs divappr
tg at gmplib.org
Tue Sep 4 21:41:54 UTC 2018
Why is (the obsolete function) mpn_tdiv_qr's interface relevant
here? You time just the lower-level sbpi1 functions, right?
It might be clean to have divappr_2 as a separate function, but perhaps
expanding its code in the mpn_sbpi1_div_qr loop would expose the
possibility for decreasing the submul_1 size argument. If the measured
speedup is less than you expected, perhaps the old code's dn-2 size
argument explains some of it?
I believe we could find CPUs (mainly low-end and obsolete high-end ones)
where the old code will beat the new code because of the old code's
lower submul_1 size argument.
Micro-optimisation: Replace q1++ with n2+1 in add_ssaaaa.
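
To illustrate what I mean, here is a rough sketch, not the actual
patch. It assumes 64-bit limbs and that the q1++ comes directly after
the add_ssaaaa with no use of q1 in between. add_ssaaaa_sketch is a
hypothetical portable stand-in for the longlong.h macro, and the other
function names are made up for this example. Since the high limb is
computed mod 2^64, adding 1 to the high addend gives the same q1 as
incrementing afterwards, even when n2+1 wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for add_ssaaaa:
   (sh,sl) = (ah,al) + (bh,bl) mod 2^128, with 64-bit limbs. */
static void
add_ssaaaa_sketch (uint64_t *sh, uint64_t *sl,
                   uint64_t ah, uint64_t al,
                   uint64_t bh, uint64_t bl)
{
  uint64_t lo = al + bl;
  *sh = ah + bh + (lo < al);    /* (lo < al) is the carry from the low limb */
  *sl = lo;
}

/* Variant A: add (n2,n1) to (q1,q0), then increment q1 separately. */
static uint64_t
q1_with_increment (uint64_t q1, uint64_t q0, uint64_t n2, uint64_t n1)
{
  add_ssaaaa_sketch (&q1, &q0, q1, q0, n2, n1);
  q1++;
  return q1;
}

/* Variant B: fold the increment into the high addend. */
static uint64_t
q1_folded (uint64_t q1, uint64_t q0, uint64_t n2, uint64_t n1)
{
  add_ssaaaa_sketch (&q1, &q0, q1, q0, n2 + 1, n1);
  return q1;
}

/* Spot check that the two variants agree, including wraparound cases. */
static void
self_check (void)
{
  static const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  int n = (int) (sizeof v / sizeof v[0]);
  for (int a = 0; a < n; a++)
    for (int b = 0; b < n; b++)
      for (int c = 0; c < n; c++)
        for (int d = 0; d < n; d++)
          assert (q1_with_increment (v[a], v[b], v[c], v[d])
                  == q1_folded (v[a], v[b], v[c], v[d]));
}
```

Whether this saves anything depends on whether n2+1 is cheaper to form
than the separate increment on a given CPU; the sketch only shows that
the two forms compute the same q1.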
Could the most significant limb of the partial remainder be kept in a
scalar between iterations?
Please encrypt, key id 0xC8601622