Schönhage's algorithm for fast integer GCD
Niels Möller
nisse@lysator.liu.se
03 Jul 2003 14:38:27 +0200
Now I've rewritten the code to uses mpn. It seems to works, and it's
quite clear that it's asymptotically faster. I hope it is possible to
increase the speed at least by a factor of two by optimizing two or
three well chosen functions. Below is a benchmark of the current code.
It was run with thresholds in the range 32-64 limbs, running on x86,
and using fibonacci numbers (which are worst case for Euclid's
algorithm with maximum number of euclid steps, and all quotients = 1).
$ ./bench
Gcd of fibonacci numbers.
limbs gcd fgcd gcd/fgcd
1 0.71us 0.82us 0.87 *
1 0.68us 0.81us 0.84 *
1 0.91us 0.99us 0.92 *
1 0.76us 0.93us 0.82 *
2 1.02us 1.05us 0.97 *
2 1.28us 1.39us 0.92 *
3 2.06us 2.00us 1.03 *
4 3.81us 3.81us 1.00 *
6 7.16us 7.26us 0.99 *
8 7.97us 8.98us 0.89 *
12 14.99us 13.66us 1.10 *
18 20.85us 21.49us 0.97 *
27 35.24us 34.20us 1.03 *
41 59.59us 59.00us 1.01 *
61 97.75us 96.71us 1.01 *
91 186.22us 591.72us 0.31 *
136 321.54us 1.35ms 0.24 *
204 584.80us 2.00ms 0.29 *
305 1.11ms 4.00ms 0.28 *
458 2.22ms 5.88ms 0.38 *
686 4.00ms 11.11ms 0.36 *
1029 8.33ms 20.00ms 0.42 *
1544 18.33ms 33.33ms 0.55 *
2316 36.67ms 60.00ms 0.61 *
3473 90.00ms 100.00ms 0.90 *
5209 190.00ms 180.00ms 1.06 *
7814 420.00ms 300.00ms 1.40 *
11720 950.00ms 530.00ms 1.79 *
17580 2.12s 950.00ms 2.23 *
26371 4.80s 1.67s 2.87 *
39556 10.82s 2.98s 3.63 *
59333 28.40s 5.16s 5.50 *
gprof output:
43.37 0.85 0.85 195586 0.00 0.00 hgcd_2
13.52 1.11 0.27 2038248 0.00 0.00 hgcd_update_1_ui
12.24 1.35 0.24 2075205 0.00 0.00 euclid_step
7.40 1.50 0.14 195586 0.00 0.00 hgcd_2_update
3.57 1.57 0.07 32 2.19 60.78 bench_fgcd
2.55 1.62 0.05 782344 0.00 0.00 hgcd_2_abs
[...]
: 0.00 1.96 0.00 32 0.00 0.00 display
Here, hgcd_2 is the work horse for Lehmer steps, and euclid_step
together with hgcd_update_1_ui is the work horse for euclid steps. Two
thirds of the cpu time is spent in those three functions.
The code is available at
http://www.lysator.liu.se/~nisse/misc/gcd-0.3.tar.gz. It needs some
more of both cleanup and optimization.
Regards,
/Niels