A so-called exact division is when the dividend is known to be an exact multiple of the divisor. Jebelean’s exact division algorithm uses this knowledge to make some significant optimizations (see References).
The idea can be illustrated in decimal for example with 368154 divided by 543. Because the low digit of the dividend is 4, the low digit of the quotient must be 8. This is arrived at from 4*7 mod 10, using the fact 7 is the modular inverse of 3 (the low digit of the divisor), since 3*7 ≡ 1 mod 10. So 8*543=4344 can be subtracted from the dividend leaving 363810. Notice the low digit has become zero.
The procedure is repeated at the second digit, with the next quotient digit 7 (7 ≡ 1*7 mod 10), subtracting 7*543=3801, leaving 325800. And finally at the third digit with quotient digit 6 (8*7 mod 10), subtracting 6*543=3258 leaving 0. So the quotient is 678.
Notice however that the multiplies and subtractions don’t need to extend past the low three digits of the dividend, since that’s enough to determine the three quotient digits. For the last quotient digit no subtraction is needed at all. On a 2NxN division like this one, only about half the work of a normal basecase division is necessary.
For an NxM exact division producing Q=N-M quotient limbs, the saving over a normal basecase division is in two parts. Firstly, each of the Q quotient limbs needs only one multiply, not a 2x1 divide and multiply. Secondly, the crossproducts are reduced when Q>M to Q*M-M*(M+1)/2, or when Q<=M to Q*(Q-1)/2. Notice the savings are complementary. If Q is big then many divisions are saved, or if Q is small then the crossproducts reduce to a small number.
The modular inverse used is calculated efficiently by
gmp-impl.h. This does four multiplies for a 32-bit limb, or six for a
64-bit limb. tune/modlinv.c has some alternate implementations that
might suit processors better at bit twiddling than multiplying.
The sub-quadratic exact division described by Jebelean in “Exact Division with Karatsuba Complexity” is not currently implemented. It uses a rearrangement similar to the divide and conquer for normal division (see Divide and Conquer Division), but operating from low to high. A further possibility not currently implemented is “Bidirectional Exact Integer Division” by Krandick and Jebelean which forms quotient limbs from both the high and low ends of the dividend, and can halve once more the number of crossproducts needed in a 2NxN division.
A special case exact division by 3 exists in
supporting Toom-3 multiplication and
mpq canonicalizations. It forms
quotient digits with a multiply by the modular inverse of 3 (which is
0xAA..AAB) and uses two comparisons to determine a borrow for the next
limb. The multiplications don’t need to be on the dependent chain, as long as
the effect of the borrows is applied, which can help chips with pipelined