The Karatsuba multiplication algorithm is described in Knuth section 4.3.3 part A, and various other textbooks. A brief description is given here.

The inputs *x* and *y* are treated as each split into two parts of
equal length (or the most significant part one limb shorter if N is odd).

high low +----------+----------+ | x1 | x0 | +----------+----------+ +----------+----------+ | y1 | y0 | +----------+----------+

Let *b* be the power of 2 where the split occurs, i.e. if x0 is
*k* limbs (y0 the same) then
*b=2^(k*mp_bits_per_limb)*.
With that *x=x1*b+x0* and *y=y1*b+y0*, and the
following holds,

x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0

This formula means doing only three multiplies of (N/2)x(N/2) limbs,
whereas a basecase multiply of NxN limbs is equivalent to four
multiplies of (N/2)x(N/2). The factors *(b^2+b)* etc represent
the positions where the three products must be added.

high low +--------+--------+ +--------+--------+ | x1*y1 | | x0*y0 | +--------+--------+ +--------+--------+ +--------+--------+ add | x1*y1 | +--------+--------+ +--------+--------+ add | x0*y0 | +--------+--------+ +--------+--------+ sub | (x1-x0)*(y1-y0) | +--------+--------+

The term *(x1-x0)*(y1-y0)* is best calculated as an
absolute value, and the sign used to choose to add or subtract. Notice the
sum *high(x0*y0)+low(x1*y1)* occurs twice, so it’s possible to do *5*k* limb
additions, rather than *6*k*, but in GMP extra function call overheads
outweigh the saving.

Squaring is similar to multiplying, but with *x=y* the formula reduces to
an equivalent with three squares,

x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2

The final result is accumulated from those three squares the same way as for
the three multiplies above. The middle term *(x1-x0)^2* is now
always positive.

A similar formula for both multiplying and squaring can be constructed with a
middle term *(x1+x0)*(y1+y0)*. But those sums can exceed
*k* limbs, leading to more carry handling and additions than the form
above.

Karatsuba multiplication is asymptotically an *O(N^1.585)* algorithm,
the exponent being *log(3)/log(2)*, representing 3 multiplies
each *1/2* the size of the inputs. This is a big improvement over the
basecase multiply at *O(N^2)* and the advantage soon overcomes the extra
additions Karatsuba performs. `MUL_TOOM22_THRESHOLD`

can be as little
as 10 limbs. The `SQR`

threshold is usually about twice the `MUL`

.

The basecase algorithm will take a time of the form *M(N) = a*N^2 + b*N + c* and the Karatsuba algorithm *K(N) = 3*M(N/2) + d*N + e*, which expands to *K(N) = 3/4*a*N^2 + 3/2*b*N + 3*c + d*N + e*. The
factor *3/4* for *a* means per-crossproduct speedups in the
basecase code will increase the threshold since they benefit *M(N)* more
than *K(N)*. And conversely the *3/2* for *b* means
linear style speedups of *b* will increase the threshold since they
benefit *K(N)* more than *M(N)*. The latter can be seen for
instance when adding an optimized `mpn_sqr_diagonal`

to
`mpn_sqr_basecase`

. Of course all speedups reduce total time, and in
that sense the algorithm thresholds are merely of academic interest.