#### 15.3.2 Lehmer’s algorithm

Lehmer’s improvement of the Euclidean algorithms is based on the observation
that the initial part of the quotient sequence depends only on the most
significant parts of the inputs. The variant of Lehmer’s algorithm used in GMP
splits off the most significant two limbs, as suggested, e.g., in “A
Double-Digit Lehmer-Euclid Algorithm” by Jebelean (see References). The
quotients of two double-limb inputs are collected as a 2 by 2 matrix with
single-limb elements. This is done by the function `mpn_hgcd2`

. The
resulting matrix is applied to the inputs using `mpn_mul_1`

and
`mpn_submul_1`

. Each iteration usually reduces the inputs by almost one
limb. In the rare case of a large quotient, no progress can be made by
examining just the most significant two limbs, and the quotient is computed
using plain division.

The resulting algorithm is asymptotically *O(N^2)*, just as the Euclidean
algorithm and the binary algorithm. The quadratic part of the work are
the calls to `mpn_mul_1`

and `mpn_submul_1`

. For small sizes, the
linear work is also significant. There are roughly *N* calls to the
`mpn_hgcd2`

function. This function uses a couple of important
optimizations:

- It uses the same relaxed notion of correctness as
`mpn_hgcd`

(see next
section). This means that when called with the most significant two limbs of
two large numbers, the returned matrix does not always correspond exactly to
the initial quotient sequence for the two large numbers; the final quotient
may sometimes be one off.
- It takes advantage of the fact that the quotients are usually small. The division
operator is not used, since the corresponding assembler instruction is very
slow on most architectures. (This code could probably be improved further, it
uses many branches that are unfriendly to prediction.)
- It switches from double-limb calculations to single-limb calculations half-way
through, when the input numbers have been reduced in size from two limbs to
one and a half.