Update: Schönhage 's algorithm for fast integer GCD
Niels Möller
nisse at lysator.liu.se
Mon Sep 15 22:34:59 CEST 2003
nisse at lysator.liu.se (Niels Möller) writes:
> I've been looking at how to get a fast gcd algorithm. The one best
> known seems to be Schönhage's algorithm, first described in an article
> in Acta Informatica 1971, in German.
Ok, now I have rewritten the code yet one more time, and I've also
done some more analysis of the sizes to expect and the number of
recursive calls. The current code is still memory hungry (linear in
the size of the input, with a fairly large constant), but I think that
it can be cut down to reasonable sizes. The new code is a little
faster than the previous version, even if it hasn't undergone careful
tuning yet (the basecase thresholds are at 20 limbs, compared to 128
when benchmarking the previous version).
Timings, relative to gmp-4.1.2 mpz_gcd, on sparc, are included below.
The new code it wins over gmp-4.1.2 at around 1500 limbs on fibonacci
numbers, and at around 500 limbs for "random" numbers. For comparison,
what are typical thresholds for FFT multiplication?
$ ./nhgcd-bench # HGCD_LEHMER_THRESHOLD = 20, FGCD_LEHMER_THRESHOLD = 20
Gcd of fibonacci numbers.
limbs gcd fgcd gcd/fgcd
1 12.52us 18.11us 0.69 *
1 12.64us 17.40us 0.73 *
1 13.44us 18.36us 0.73 *
1 14.10us 18.52us 0.76 *
2 29.52us 35.40us 0.83 *
2 27.84us 135.69us 0.21 *
3 68.17us 250.00us 0.27 *
4 102.99us 392.16us 0.26 *
6 131.41us 621.12us 0.21 *
8 204.50us 970.87us 0.21 *
12 326.80us 1.47ms 0.22 *
18 512.82us 2.33ms 0.22 *
27 909.09us 5.26ms 0.17 *
41 1.54ms 7.69ms 0.20 *
61 2.86ms 14.29ms 0.20 *
91 5.56ms 25.00ms 0.22 *
136 11.11ms 43.33ms 0.26 *
204 24.00ms 70.00ms 0.34 *
305 55.00ms 120.00ms 0.46 *
458 120.00ms 210.00ms 0.57 *
686 240.00ms 380.00ms 0.63 *
1029 530.00ms 680.00ms 0.78 *
1544 1.26s 1.21s 1.04 *
2316 3.09s 2.17s 1.42 *
3473 7.20s 3.93s 1.83 *
5209 16.52s 7.12s 2.32 *
7814 37.89s 12.93s 2.93 *
Gcd of pseudorandom numbers.
limbs gcd fgcd gcd/fgcd
2 25.25us 98.33us 0.26 *
2 41.14us 113.25us 0.36 *
2 38.62us 129.37us 0.30 *
3 73.86us 214.13us 0.34 *
4 94.52us 280.11us 0.34 *
6 134.77us 300.30us 0.45 *
8 182.15us 478.47us 0.38 *
11 267.38us 581.40us 0.46 *
16 381.68us 934.58us 0.41 *
24 636.94us 1.92ms 0.33 *
35 1.15ms 3.57ms 0.32 *
52 2.00ms 6.25ms 0.32 *
78 3.85ms 11.11ms 0.35 *
117 8.33ms 18.33ms 0.45 *
175 16.67ms 33.33ms 0.50 *
261 40.00ms 55.00ms 0.73 *
391 80.00ms 120.00ms 0.67 *
587 190.00ms 180.00ms 1.06 *
879 420.00ms 330.00ms 1.27 *
1318 940.00ms 590.00ms 1.59 *
1977 2.28s 1.09s 2.09 *
2965 5.44s 2.03s 2.68 *
4447 12.68s 3.60s 3.52 *
6670 29.33s 6.57s 4.46 *
10005 65.41s 12.18s 5.37 *
gprof top-ten for the above parameters
% cumulative self self total
time seconds seconds calls ms/call ms/call name
35.86 7.37 7.37 1367273 0.01 0.01 div2
14.65 10.38 3.01 41397 0.07 0.32 nhgcd2
11.63 12.77 2.39 1229933 0.00 0.00 nhgcd_quotient_stack_push_1
4.33 13.66 0.89 73967 0.01 0.01 nhgcd_update_uv
4.04 14.49 0.83 26016 0.03 0.04 nhgcd_fix
3.16 15.14 0.65 13150 0.05 0.05 nhgcd_start
2.68 15.69 0.55 82076 0.01 0.01 nhgcd2_fix
2.48 16.20 0.51 6671 0.08 2.41 nhgcd_lehmer
1.95 16.60 0.40 41038 0.01 0.01 compute_vplus
1.95 17.00 0.40 38425 0.01 0.01 nhgcd2_mul
Most of the time is spent in computations on two-limb numbers (that's
div2 and nhgcd2) and book-keeping of one-limb quotients
(nhgcd_quotient_stack_push_1), and the number of such operations is
*linear* in the size of the input. nhgcd_update_uv and nhgcd_fix are
the top-most bignum operations on the list.
Regards,
/Niels
More information about the gmp-devel
mailing list