fast gcd

Tue Oct 13 16:32:41 CEST 2009

       Hi,

since everybody on this list speaks currently only about multiplication,
let's speak about gcd.

Among the feedback we got on our book [1], someone asked us to provide a proof
of the HalfGcd algorithm. We thus will give such a proof in the next version,
for the binary/2-adic/LSB/right-to-left variant.

I also wrote an implementation of that algorithm in GMP, which is available
from the "Software" section of [1]. Here are some timings on a 2.83Ghz Core 2
with GMP 4.3.1 (N is the number of 64-bit limbs of the two random inputs):

   N          mpz_gcd        mpz_bgcd
100000         2544ms          3123ms
200000         6152ms          7170ms
500000        19367ms         21296ms
1000000       45690ms         48480ms
2000000      106876ms        112167ms
5000000      323279ms        333767ms
10000000     738496ms        725144ms

The differences are as follows:

(a) mpz_gcd uses the classical MSB division, while mpz_bgcd uses the 2-adic
    division, which leads to a size increase of about 2.5%, as analyzed by
    Daireaux, Maume-Deschamps and Vallée in [2].
(b) mpz_bgcd (or more exactly the inner hgcd routine) only computes the 2x2
    matrix, not the corresponding terms of the remainder sequence, which are
    recomputed if needed from that matrix
(c) for the matrix-vector multiplication in (b), mpz_bgcd uses the wrap-around
    trick together with FFT mod 2^N+1

Since (b) and (c) can be done in the classical case too, I believe the gcd code
in GMP can still be improved.

Paul Zimmermann

[1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
    version 0.3, June 2009, http://www.loria.fr/~zimmerma/mca/mca-0.3.pdf. 
[2] users.info.unicaen.fr/~brigitte/Publications/daireaux-maume-vallee-long.pdf