For divisors larger than DC_DIV_QR_THRESHOLD
, division is done by dividing.
Or to be precise by a recursive divide and conquer algorithm based on work by
Moenck and Borodin, Jebelean, and Burnikel and Ziegler (see References).
The algorithm consists essentially of recognising that a 2NxN division can be done with the basecase division algorithm (see Basecase Division), but using N/2 limbs as a base, not just a single limb. This way the multiplications that arise are (N/2)x(N/2) and can take advantage of Karatsuba and higher multiplication algorithms (see Multiplication). The two “digits” of the quotient are formed by recursive Nx(N/2) divisions.
If the (N/2)x(N/2) multiplies are done with a basecase multiplication
then the work is about the same as a basecase division, but with more function
call overheads and with some subtractions separated from the multiplies.
These overheads mean that it’s only when N/2 is above
MUL_TOOM22_THRESHOLD
that divide and conquer is of use.
DC_DIV_QR_THRESHOLD
is based on the divisor size N, so it will be somewhere
above twice MUL_TOOM22_THRESHOLD
, but how much above depends on the
CPU. An optimized mpn_mul_basecase
can lower DC_DIV_QR_THRESHOLD
a
little by offering a ready-made advantage over repeated mpn_submul_1
calls.
Divide and conquer is asymptotically O(M(N)*log(N)) where M(N) is the time for an NxN multiplication done with FFTs. The actual time is a sum over multiplications of the recursed sizes, as can be seen near the end of section 2.2 of Burnikel and Ziegler. For example, within the Toom-3 range, divide and conquer is 2.63*M(N). With higher algorithms the M(N) term improves and the multiplier tends to log(N). In practice, at moderate to large sizes, a 2NxN division is about 2 to 4 times slower than an NxN multiplication.