Small operands gcd improvements
Niels Möller
nisse at lysator.liu.se
Thu Aug 8 05:11:19 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> I'm having problems with timing of the gcd_11 code. Unfortunately, the
> nested macros of speed.h make things hard to read. Could yo
> double-check that operands to gcd_11 are odd and full limbs?
I'm fairly sure they are odd; I've ran speed on a --disable-assembly
--enable-assert build. But they're full limbs only for -s 64. This is
what I see, after your asm changes:
$ ./tune/speed -p 100000 -c -s 1-64 mpn_gcd_1 mpn_gcd_11
overhead 5.01 cycles, precision 100000 units of 1.14e-09 secs, CPU freq
876.11 MHz
mpn_gcd_1 mpn_gcd_11
1 12.05 #8.05
2 13.56 #8.69
3 16.23 #8.88
4 16.67 #9.37
5 20.04 #12.55
6 22.88 #16.06
7 28.21 #21.21
8 32.47 #27.34
9 34.53 #32.17
10 41.24 #38.47
11 43.65 #42.90
12 #49.50 50.26
13 #55.92 58.40
14 #57.88 61.12
15 #66.06 66.68
16 #68.26 71.43
17 #71.36 75.56
18 #77.95 81.72
19 #83.09 86.64
20 #84.53 89.62
21 #86.88 93.23
22 #97.32 98.35
23 101.75 #101.46
24 #107.63 108.32
25 #110.05 110.30
26 118.25 #114.15
27 #115.00 119.02
28 #118.93 123.26
29 #123.22 124.74
30 #127.43 131.10
31 #134.56 139.14
32 139.34 #139.21
33 142.63 #141.76
34 #145.42 147.85
35 #147.92 150.54
36 #151.31 154.07
37 #152.37 159.75
38 #155.48 162.92
39 168.28 #164.72
40 #163.92 169.57
41 177.38 #171.68
42 #183.22 185.21
43 #188.94 189.92
44 #195.21 196.79
45 198.06 #197.09
46 #200.67 201.34
47 #203.37 205.09
48 208.80 #208.16
49 #212.83 216.31
50 #215.69 217.80
51 #219.62 228.48
52 237.99 #228.64
53 222.52 #219.11
54 #219.05 222.15
55 #221.39 227.44
56 #225.41 230.09
57 #226.36 235.82
58 #231.66 242.04
59 #239.70 246.53
60 #239.07 246.82
61 #242.48 249.31
62 #244.55 254.03
63 #249.94 257.11
64 #254.20 262.47
I.e., differences of just a few cycles, but disappointingly not always
in favor of the new gcd_11. If we fix the size of one of the operands,
the difference due to initial reduction in gcd_1 is very visible:
$ ./tune/speed -p 100000 -c -s 1-64 mpn_gcd_1.10 mpn_gcd_11.10
overhead 5.01 cycles, precision 100000 units of 9.00e-10 secs, CPU freq
1111.62 MHz
mpn_gcd_1.10 mpn_gcd_11.10
1 48.56 #31.68
2 47.65 #30.24
3 48.78 #33.06
4 49.02 #33.15
5 50.00 #31.07
6 48.74 #34.44
7 52.14 #35.24
8 52.51 #37.61
9 46.81 #39.10
10 56.40 #41.97
11 59.66 #42.12
12 62.68 #47.73
13 71.72 #49.63
14 84.10 #54.56
15 102.05 #57.25
16 117.86 #63.12
17 125.77 #58.76
18 120.16 #63.45
19 123.74 #66.41
20 145.42 #82.43
21 143.24 #83.48
22 147.30 #86.79
23 139.39 #81.41
24 126.56 #84.67
25 124.91 #86.97
26 126.57 #88.07
27 149.59 #102.50
28 146.80 #109.59
29 146.96 #103.42
30 140.75 #101.46
31 127.92 #103.95
32 126.41 #108.93
33 126.68 #112.32
34 142.09 #126.03
35 143.65 #130.04
36 142.99 #130.96
37 143.80 #131.02
38 127.91 #126.18
39 #127.76 129.48
40 #128.44 130.90
41 151.79 #142.34
42 #142.43 145.96
43 #142.62 146.64
44 #128.56 145.52
45 #129.30 150.46
46 #129.68 155.30
47 #129.07 159.03
48 #132.37 158.59
49 #126.59 158.04
50 #128.14 161.57
51 #129.74 164.20
52 #127.63 166.14
53 #130.12 170.52
54 #130.41 175.19
55 #133.91 173.54
56 #129.01 176.41
57 #127.91 178.71
58 #126.03 181.28
59 #129.15 185.43
60 #130.57 188.86
61 #130.14 190.29
62 #127.46 194.79
63 #129.22 196.25
64 #127.74 202.10
> Speaking of gcd_22. We need to determine this function's interface.
I've been considering
void
mpn_gcd_22 (mp_ptr rp,
mp_limb_t u1, mp_limb_t u0,
mp_limb_t v1, mp_limb_t v0);
with the requirement that u0 and v0 are odd. Do you prefer something
different? I'm thinking that it should not be a public function.
> The last loop will be 11. We can simply inline a copy here as it is
> tiny. (A tail call won't work as the functions will have different
> return types.)
Except if we have lots of tuned variants of gcd_11 and fewer gcd_22;
then gcd_22 ought to call the separately selected gcd_11.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list