Revisiting gcd and gcdext for small operands

Torbjörn Granlund tg at gmplib.org
Fri Dec 1 13:25:22 UTC 2017


Nisse and other great common dividers!

Please consider these measurements:

ibr32$ ./tune/speed -p10000000 -c -s1-60 -f1.2 -r mpn_mul mpn_gcd mpn_gcdext
              mpn_mul       mpn_gcd    mpn_gcdext
1              #39.14        6.5512       17.1607
2              #50.77       12.4422       51.4993
3              #66.27       36.0216       57.9561
4              #87.17       40.7439       58.6398
5             #110.95       41.7779       57.3457
6             #136.56       42.4274       54.8383
7             #165.67       42.7125       51.9271
8             #205.14       39.7861       47.5393
9             #249.47       37.0144       42.8386
10            #289.77       36.0182       40.6772
12            #398.58       32.2387       34.2015
14            #516.54       28.4706       28.4355
16            #653.70       25.3546       25.7443
19            #887.40       21.4907       20.0516
22           #1196.18       17.9833       17.4368
26           #1617.71       15.1357       15.3959
31           #2196.36       12.3736       13.3466
37           #2963.45        9.1705       12.3230
44           #3970.82        9.4049       11.9397
52           #5309.94        7.8460       11.6024

ibr64# tune/speed -p10000000 -c -s1-60 -f1.2 -r mpn_mul mpn_gcd mpn_gcdext
              mpn_mul       mpn_gcd    mpn_gcdext
1              #34.51        9.5965       43.4851
2              #41.46       24.7985       91.1526
3              #63.37       60.3494       89.9912
4              #80.39       69.5938       92.7650
5             #105.76       69.8544       88.4005
6             #134.70       68.6191       83.3540
7             #173.62       64.7691       74.9246
8             #208.92       62.9363       72.0671
9             #262.49       56.4922       63.4748
10            #314.04       52.3732       59.2200
12            #437.50       46.1083       50.0230
14            #578.39       41.1333       42.6137
16            #714.52       38.2636       38.8416
19           #1021.12       32.3526       30.0309
22           #1274.16       29.6524       28.3373
26           #1653.53       26.4316       25.1063
31           #2283.67       22.3448       20.5088
37           #3185.51       16.2611       14.3919
44           #4003.10       16.8047       15.6203
52           #5287.91       12.4089       16.1354

Without detail knowledge of the current implementation, these numbers
suggest to me that we could speed small operand gcd and gcdext.  But I
might be wrong.

The relative slowness of gcd/gcdext vs mul is worse for 64-bit limbs
(2nd table) than for 32-bit limbs (1st table).

I assume that assembly basecase code using the "binary algorithm" would
help at least for n = 2 and up to some limit.  Such code exists now just
for n = 1, IIRC (and that code, mpn_gcd_1, allows one operand of
arbitrary size, for historical reasons).

Would asm "binary algorithm" code also make sense for, say, n = 10?  I
have some doubt, as C code around the current (asm) primitives add_n,
sub_n, rshift probably is close to optimal.  Sure, one could pwerhaps
use straight-line code for add_n/sub_n/rshift for some values of n, and
that could perhaps give a 2x speedup, at the expense of lot of asm code.

I had an idea now for using a Hensel norm Euclidean algorithm.  Finding
a^(-1) mod 2^b where b is the limb size in the inner loop would be
expensive, but how about letting b = 8 or thereabout?  Then a table
lookup would work for the inverses.  Does that make sense?


-- 
Torbjörn
Please encrypt, key id 0xC8601622


More information about the gmp-devel mailing list