# gmp-devel Digest, Vol 1, Issue 89

Niels Möller nisse at lysator.liu.se
Wed Sep 17 13:04:16 CEST 2003

Paul.Zimmermann at loria.fr (Paul Zimmermann) writes:

> With a colleague, we are working on a binary variant of Schönhage's
> algorithm that works in the general case. First results are encouraging
> (a threshold of about 6500 limbs with mpz_gcd, and about 1800 limbs with
> mpz_gcdext, with mpz-level implementation). The nice thing is that no
> correction is needed: the "quotients" are always exact.

Can you explain how that works? As far as I understand, to do an
"exact" Schönhage-like gcd, one needs a function which given A, B
computes

a = u A + v B     (*)
b = u'A + v'B

such that

1. a, b, u, v, u', v' are approximately half the size of A
2. gcd(a,b) = gcd(A, B)
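
As a concrete illustration of that contract, here is a minimal Python sketch in which plain extended Euclid stands in for the recursive half-gcd step; `half_reduce` and all its names are mine, not taken from any GMP code:

```python
from math import gcd

def half_reduce(A, B):
    """Reduce (A, B) while tracking cofactors, stopping once the
    smaller value is at most half the bit size of A.  Plain extended
    Euclid stands in for the recursive half-gcd step; the function
    and its names are illustrative only."""
    a, b = A, B
    u, v, u2, v2 = 1, 0, 0, 1          # a = u*A + v*B, b = u2*A + v2*B
    half = A.bit_length() // 2
    while b and b.bit_length() > half:
        q = a // b
        a, b = b, a - q * b
        u, u2 = u2, u - q * u2
        v, v2 = v2, v - q * v2
    return a, b, u, v, u2, v2

A = 2**200 + 2**101 + 45               # arbitrary test operands
B = 2**153 + 2**77 + 9
a, b, u, v, u2, v2 = half_reduce(A, B)
assert a == u * A + v * B and b == u2 * A + v2 * B   # relation (*)
assert gcd(a, b) == gcd(A, B)                        # condition (2)
assert b.bit_length() <= A.bit_length() // 2
```

The invariants a = u·A + v·B and gcd(a, b) = gcd(A, B) hold at every Euclidean step, which is what makes the loop a valid (if slow) stand-in; the subquadratic algorithms replace the step-by-step loop with recursion plus matrix multiplication.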

I use Jebelean's criterion for partial quotient sequence matches to
guarantee (2), and it's not obvious to me how to do that with a binary
base case. (And then things are complicated further, because from
Jebelean I get additional requirements saying that a must not be too
small, and I need to introduce "backup" steps to ensure that).

A non-exact variant could work like this: Given A, B, compute one
linear combination r = u A + v B with the only requirement that r is
about half the size of A. Then use

gcd(A, B) = gcd(gcd(A,r), gcd(B,r)) = gcd(A mod r, B mod r, r)
          = gcd(A mod r, B mod r) / d

where we hope that the spurious factor d is reasonably small and can
be discarded at the end of the gcd algorithm.
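
The identity is easy to check numerically. In this sketch the operands and cofactors are invented for illustration (r is produced by a single Euclidean step, so u = 1, v = -q):

```python
from math import gcd

# Hypothetical operands sharing the factor 91; all names here are
# illustrative, not from any GMP code.
A, B = 91 * (2**64 + 13), 91 * (2**40 + 7)

q = A // B
u, v = 1, -q                 # one Euclidean step: r = A - q*B = A mod B
r = u * A + v * B
assert 0 < r < B

# r is a combination of A and B, hence a multiple of gcd(A, B),
# so no common factor is lost by reducing mod r:
assert gcd(gcd(A, r), gcd(B, r)) == gcd(A, B)
assert gcd(gcd(A % r, B % r), r) == gcd(A, B)

# Dropping r from the final gcd may leave a spurious factor d:
d = gcd(A % r, B % r) // gcd(A, B)
assert gcd(A % r, B % r) == d * gcd(A, B)
```

The division defining d is exact because gcd(A, B) divides A, B, and r, hence also A mod r and B mod r.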

> Niels, in your gcd-0.4 code, you have a comment somewhere saying that
> you may keep only the first column of the 2x2 cofactor matrix (i.e. u0 and
> u1). Can you do that in the recursive part of Schönhage's algorithm?

I need the full matrices, at least sometimes. But it is possible to
compute the right half lazily (given A, B, a, b, u, and u' in (*),
one can recover v and v'), though I'm not sure whether that saves any
real computation time. The interesting step is that we have the matrix
multiplication

S = R R'

Say we're given only the left hand halves of the matrices R and R',
and want to compute the left hand half of S. We can do that by first
recovering the right hand half of R (from the left hand half and the
other information we have around), and then applying the full R
matrix to the left half of R',

S1 = [R1, R2] R1'

where S1, R1, R2, R1' are column vectors.
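
The following sketch plays this through with two small transformation matrices produced by Euclidean steps. All names are mine; note that in this setup the later transformation ends up on the left of the product, so it is the later matrix whose right column gets reconstructed:

```python
def euclid_matrix(A, B, steps):
    """Run `steps` Euclidean steps and return (a, b) together with
    M = [[u, v], [u2, v2]] such that a = u*A + v*B, b = u2*A + v2*B.
    Illustrative stand-in for one recursive gcd call."""
    a, b = A, B
    M = [[1, 0], [0, 1]]
    for _ in range(steps):
        q = a // b
        a, b = b, a - q * b
        M = [M[1], [M[0][0] - q * M[1][0], M[0][1] - q * M[1][1]]]
    return (a, b), M

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A, B = 102334155, 63245986             # consecutive Fibonacci numbers
(a, b), R = euclid_matrix(A, B, 4)     # earlier call: (A, B) -> (a, b)
(a2, b2), Rp = euclid_matrix(a, b, 4)  # later call:   (a, b) -> (a2, b2)

# Suppose only the left columns were kept:
R1  = [R[0][0],  R[1][0]]
Rp1 = [Rp[0][0], Rp[1][0]]

# Recover Rp's right column; the divisions are exact because
# a2 = up*a + vp*b and b2 = up2*a + vp2*b:
Rp_full = [[Rp1[0], (a2 - Rp1[0] * a) // b],
           [Rp1[1], (b2 - Rp1[1] * a) // b]]
assert Rp_full == Rp

# The left column of the product needs only the reconstructed full
# matrix and the other matrix's left column:
S  = matmul(Rp, R)
S1 = matvec(Rp_full, R1)
assert S1 == [S[0][0], S[1][0]]
```

Whether the reconstruction pays off is exactly the cost question below: the exact divisions are cheap, but they require keeping the intermediate values (a, b, a2, b2) around.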

And if it turns out that the caller needs the full matrix S, what is
cheaper? To return only the left half of S and let the caller
reconstruct the rest, or to reconstruct the full R and R' matrices and
return the full product?

I haven't explored the lazy approach, but my current code keeps the
original values of A, B around, which should make this trick more
practical than it was with the earlier version.

Regards,
/Niels