The extended GCD function, or GCDEXT, calculates *gcd(a,b)* and also
cofactors *x* and *y* satisfying *a*x+b*y=gcd(a,b)*. All the algorithms used for plain GCD are extended to
handle this case. The binary algorithm is used only for single-limb GCDEXT.
Lehmer’s algorithm is used for sizes up to `GCDEXT_DC_THRESHOLD`

. Above
this threshold, GCDEXT is implemented as a loop around HGCD, but with more
book-keeping to keep track of the cofactors. This gives the same asymptotic
running time as for GCD and HGCD, *O(M(N)*log(N))*.

One difference to plain GCD is that while the inputs *a* and *b* are
reduced as the algorithm proceeds, the cofactors *x* and *y* grow in
size. This makes the tuning of the chopping-point more difficult. The current
code chops off the most significant half of the inputs for the call to HGCD in
the first iteration, and the most significant two thirds for the remaining
calls. This strategy could surely be improved. Also the stop condition for the
loop, where Lehmer’s algorithm is invoked once the inputs are reduced below
`GCDEXT_DC_THRESHOLD`

, could maybe be improved by taking into account the
current size of the cofactors.