Update: Schönhage 's algorithm for fast integer GCD

Mon Sep 15 22:34:59 CEST 2003

nisse at lysator.liu.se (Niels Möller) writes:

> I've been looking at how to get a fast gcd algorithm. The one best
> known seems to be Schönhage's algorithm, first described in an article
> in Acta Informatica 1971, in German.

Ok, now I have rewritten the code yet one more time, and I've also
done some more analysis of the sizes to expect and the number of
recursive calls. The current code is still memory hungry (linear in
the size of the input, with a fairly large constant), but I think that
it can be cut down to reasonable sizes. The new code is a little
faster than the previous version, even if it hasn't undergone careful
tuning yet (the basecase thresholds are at 20 limbs, compared to 128
when benchmarking the previous version).

Timings, relative to gmp-4.1.2 mpz_gcd, on sparc, are included below.

The new code it wins over gmp-4.1.2 at around 1500 limbs on fibonacci
numbers, and at around 500 limbs for "random" numbers. For comparison,
what are typical thresholds for FFT multiplication?

$ ./nhgcd-bench # HGCD_LEHMER_THRESHOLD = 20, FGCD_LEHMER_THRESHOLD = 20
Gcd of fibonacci numbers.

limbs    gcd     fgcd    gcd/fgcd
    1  12.52us  18.11us  0.69             *
    1  12.64us  17.40us  0.73             *
    1  13.44us  18.36us  0.73             *
    1  14.10us  18.52us  0.76             *
    2  29.52us  35.40us  0.83              *
    2  27.84us 135.69us  0.21        *
    3  68.17us 250.00us  0.27         *
    4 102.99us 392.16us  0.26         *
    6 131.41us 621.12us  0.21        *
    8 204.50us 970.87us  0.21        *
   12 326.80us   1.47ms  0.22        *
   18 512.82us   2.33ms  0.22        *
   27 909.09us   5.26ms  0.17       *
   41   1.54ms   7.69ms  0.20        *
   61   2.86ms  14.29ms  0.20        *
   91   5.56ms  25.00ms  0.22        *
  136  11.11ms  43.33ms  0.26         *
  204  24.00ms  70.00ms  0.34          *
  305  55.00ms 120.00ms  0.46           *
  458 120.00ms 210.00ms  0.57            *
  686 240.00ms 380.00ms  0.63             *
 1029 530.00ms 680.00ms  0.78             *
 1544   1.26s    1.21s   1.04               *
 2316   3.09s    2.17s   1.42                *
 3473   7.20s    3.93s   1.83                 *
 5209  16.52s    7.12s   2.32                  *
 7814  37.89s   12.93s   2.93                   *
Gcd of pseudorandom numbers.

limbs    gcd     fgcd    gcd/fgcd
    2  25.25us  98.33us  0.26         *
    2  41.14us 113.25us  0.36          *
    2  38.62us 129.37us  0.30         *
    3  73.86us 214.13us  0.34          *
    4  94.52us 280.11us  0.34          *
    6 134.77us 300.30us  0.45           *
    8 182.15us 478.47us  0.38          *
   11 267.38us 581.40us  0.46           *
   16 381.68us 934.58us  0.41           *
   24 636.94us   1.92ms  0.33          *
   35   1.15ms   3.57ms  0.32          *
   52   2.00ms   6.25ms  0.32          *
   78   3.85ms  11.11ms  0.35          *
  117   8.33ms  18.33ms  0.45           *
  175  16.67ms  33.33ms  0.50            *
  261  40.00ms  55.00ms  0.73             *
  391  80.00ms 120.00ms  0.67             *
  587 190.00ms 180.00ms  1.06               *
  879 420.00ms 330.00ms  1.27                *
 1318 940.00ms 590.00ms  1.59                 *
 1977   2.28s    1.09s   2.09                  *
 2965   5.44s    2.03s   2.68                   *
 4447  12.68s    3.60s   3.52                    *
 6670  29.33s    6.57s   4.46                     *
10005  65.41s   12.18s   5.37                      *

gprof top-ten for the above parameters

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 35.86      7.37     7.37  1367273     0.01     0.01  div2
 14.65     10.38     3.01    41397     0.07     0.32  nhgcd2
 11.63     12.77     2.39  1229933     0.00     0.00  nhgcd_quotient_stack_push_1
  4.33     13.66     0.89    73967     0.01     0.01  nhgcd_update_uv
  4.04     14.49     0.83    26016     0.03     0.04  nhgcd_fix
  3.16     15.14     0.65    13150     0.05     0.05  nhgcd_start
  2.68     15.69     0.55    82076     0.01     0.01  nhgcd2_fix
  2.48     16.20     0.51     6671     0.08     2.41  nhgcd_lehmer
  1.95     16.60     0.40    41038     0.01     0.01  compute_vplus
  1.95     17.00     0.40    38425     0.01     0.01  nhgcd2_mul

Most of the time is spent in computations on two-limb numbers (that's
div2 and nhgcd2) and book-keeping of one-limb quotients
(nhgcd_quotient_stack_push_1), and the number of such operations is
*linear* in the size of the input. nhgcd_update_uv and nhgcd_fix are
the top-most bignum operations on the list.

Regards,
/Niels