15.2.3 Divide and Conquer Division

For divisors larger than DC_DIV_QR_THRESHOLD, division is done by dividing. Or to be precise by a recursive divide and conquer algorithm based on work by Moenck and Borodin, Jebelean, and Burnikel and Ziegler (see References).

The algorithm consists essentially of recognising that a 2NxN division can be done with the basecase division algorithm (see Basecase Division), but using N/2 limbs as a base, not just a single limb. This way the multiplications that arise are (N/2)x(N/2) and can take advantage of Karatsuba and higher multiplication algorithms (see Multiplication). The two “digits” of the quotient are formed by recursive Nx(N/2) divisions.

If the (N/2)x(N/2) multiplies are done with a basecase multiplication then the work is about the same as a basecase division, but with more function call overheads and with some subtractions separated from the multiplies. These overheads mean that it’s only when N/2 is above MUL_TOOM22_THRESHOLD that divide and conquer is of use.

DC_DIV_QR_THRESHOLD is based on the divisor size N, so it will be somewhere above twice MUL_TOOM22_THRESHOLD, but how much above depends on the CPU. An optimized mpn_mul_basecase can lower DC_DIV_QR_THRESHOLD a little by offering a ready-made advantage over repeated mpn_submul_1 calls.

Divide and conquer is asymptotically O(M(N)*log(N)) where M(N) is the time for an NxN multiplication done with FFTs. The actual time is a sum over multiplications of the recursed sizes, as can be seen near the end of section 2.2 of Burnikel and Ziegler. For example, within the Toom-3 range, divide and conquer is 2.63*M(N). With higher algorithms the M(N) term improves and the multiplier tends to log(N). In practice, at moderate to large sizes, a 2NxN division is about 2 to 4 times slower than an NxN multiplication.