#### 15.2.3 Divide and Conquer Division

For divisors larger than `DC_DIV_QR_THRESHOLD`

, division is done by dividing.
Or to be precise by a recursive divide and conquer algorithm based on work by
Moenck and Borodin, Jebelean, and Burnikel and Ziegler (see References).

The algorithm consists essentially of recognising that a 2NxN division
can be done with the basecase division algorithm (see Basecase Division),
but using N/2 limbs as a base, not just a single limb. This way the
multiplications that arise are (N/2)x(N/2) and can take advantage of
Karatsuba and higher multiplication algorithms (see Multiplication). The two “digits” of the quotient are formed by recursive
Nx(N/2) divisions.

If the (N/2)x(N/2) multiplies are done with a basecase multiplication
then the work is about the same as a basecase division, but with more function
call overheads and with some subtractions separated from the multiplies.
These overheads mean that it’s only when N/2 is above
`MUL_TOOM22_THRESHOLD`

that divide and conquer is of use.

`DC_DIV_QR_THRESHOLD`

is based on the divisor size N, so it will be somewhere
above twice `MUL_TOOM22_THRESHOLD`

, but how much above depends on the
CPU. An optimized `mpn_mul_basecase`

can lower `DC_DIV_QR_THRESHOLD`

a
little by offering a ready-made advantage over repeated `mpn_submul_1`

calls.

Divide and conquer is asymptotically *O(M(N)*log(N))* where
*M(N)* is the time for an NxN multiplication done with FFTs. The
actual time is a sum over multiplications of the recursed sizes, as can be
seen near the end of section 2.2 of Burnikel and Ziegler. For example, within
the Toom-3 range, divide and conquer is *2.63*M(N)*. With higher
algorithms the *M(N)* term improves and the multiplier tends to *log(N)*. In practice, at moderate to large sizes, a 2NxN division
is about 2 to 4 times slower than an NxN multiplication.