udiv_qr_3by2 vs divappr
Torbjörn Granlund
tg at gmplib.org
Thu Sep 20 10:50:39 UTC 2018
paul zimmermann <Paul.Zimmermann at inria.fr> writes:
1) those with only O(1) preprocessing time, like the precomputation of an
inverse of the 1 or 2 uppers limbs of the divisor
2) those with O(n) preprocessing time, like Svoboda division
And (2) is perhaps OK earlier since it saves O(n) time in the main
quotient limb development loop.
I suppose a "naive" implementation scales the full dividend and the full
divisor by a factor which fits in a limb. One could in theory save some
time by scaling just their O(log(Q)) most significant limb.
> Are you thinking of Svoboda division here? That's another possibility
> for speeding up schoolbook division.
yes, in Svoboda division the quotient selection is for free. But you still
have to wait the end of the previous submul_1 (or submul_2) call to access
the next quotient, whereas in mpn_mul_basecase all multipliers are known in
advance. Does this makes a difference?
How likely does Svoboda overesimate a quotient limb?
The trick I'm looking into now does not fully serialise consecutive
submul_1 calls except in the unlikely case that a quotient limb is
overestimated. It will be overestimated more often than in our present
code, but I haven't analysed how much more often. It needs to be quite
rare for the algorithm to be good. To keep it low, one could move
slightly more work for the initial submul_1, not just 2 as in my example
code.
Note that a real implementation should be able to use the intermediate
results of udiv_qr_3by2 (or divappr) to make the initial submul_1 very
simple; I assume just one or two extra mul + some sub instructions will
be needed.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list