# fast gcd

Paul Zimmermann Paul.Zimmermann at loria.fr
Tue Oct 13 18:58:57 CEST 2009

```       Niels,

> One question I've had is what's best for medium size numbers (when we
> don't want to use quadratic gcd, but also are not in the range where
> FFT is used for the multiplications in the subquadratic algorithm). Is
> it good to use the (b) variant all the way, or should one return the
> corresponding remainders when input size is below some threshold?

since the (b) variant has some overhead, my guess is that for medium sizes
it would be best to return the corresponding remainders. (I have to say I did
not fully optimize the bgcd code for medium sizes.)

> What's needed to gain from this is a primitive that computes a b - c d
> when the result is known to be non-negative and of a given size, with
> significant cancellation at the most significant words. This primitive
> can then use mullo, or fft wraparound, or any other useful tricks
> depending on the input size. Better algorithms for multiplication mod
> 2^n ± 1 for sizes below the fft range (as was discussed a few days
> ago) would also help.

indeed. for the 2-adic gcd I wrote a similar routine which computes
a b + c d when the j low bits of that product are known (to be 0).

> One other questions: In the outer gcd loop, how large fraction do you
> operate on with hgcd in each iteration? In current gmp, the fraction
> for gcd is 1/3, which seemed close to optimal when I benchmarked it.

I also noticed experimentally that 1/3 was close to optimal, i.e.,
in the outer gcd loop, reducing the numbers by 1/3 each time. However
I have no theoretical justification of that.

> For gcdext, its a more difficult to get this right. The problem is
> that during the loop, the remainders naturally shrink, but the
> cofactors grow. To take cost of updating the cofactors into account,
> the optimal fraction is most likely a function both on the current
> remainder size and the current cofactor size. Current method for
> selecting the cutting point for each outer iteration of gcdext can
> surely be improved.

I did not compare to gcdext, since bgcdext returns different cofactors.

> (If anybody wants to see the full gcd todo file, let me know).

I am interested.

Paul
```