64-bit vs 32-bit GMP code
Torbjörn Granlund
tg at gmplib.org
Fri Feb 16 16:48:38 UTC 2018
GMP's 64-bit inner loops for 64-bit x86 processors are well-tuned in most
cases, but what's the state of the code for the 32-bit ABI? Let's check
the key function mpn_mul_basecase.
The tables below are for some smallish sizes (1-24 limbs), with cycle
counts for the 32-bit ABI on the left and for the 64-bit ABI on the
right. It should be reasonably straightforward to make the 32-bit
numbers <= the 64-bit numbers. (Note that the 64-bit code for size k
works with twice as many bits, and basecase multiplication is quadratic
in the operand size; for performance parity the 32-bit numbers should be
1/4 of the 64-bit numbers.)
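The parity rule in the note above can be written as a small helper for
reading the tables. This is only a sketch: parity_factor is a
hypothetical name, not part of GMP, and it assumes both cycle counts
were measured at the same limb count.

```c
#include <assert.h>

/* Hypothetical helper, not part of GMP.
   c32 and c64 are cycle counts for the same limb count k.
   Parity means c32 == c64 / 4, so the factor is 4 * c32 / c64:
   a result of 1.0 is parity, larger values mean the 32-bit code
   is that many times slower than parity. */
double parity_factor(double c32, double c64)
{
    assert(c64 > 0.0);
    return 4.0 * c32 / c64;
}
```

For the sky table at size 8, parity_factor(149.42, 122.40) is roughly
4.9, so the 32-bit code there is about five times slower than parity.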
The lack of registers might make it hard in some cases when writing
32-bit code, but then we also have 32x32->64 SIMD instructions which
could allow 32-bit code to perform quite well.
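For reference, the operation being timed is a schoolbook product built
from 32x32->64 multiplies. The following is a portable C sketch of that
operation on 32-bit limbs. It is not GMP's code (the real
mpn_mul_basecase for x86 is hand-written assembly), and the name
mul_basecase32 and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GMP's implementation.
   Computes {rp, un+vn} = {up, un} * {vp, vn}, least significant
   limb first. rp must not overlap up or vp. */
void mul_basecase32(uint32_t *rp, const uint32_t *up, size_t un,
                    const uint32_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);

    /* Clear the result so every row can accumulate into it. */
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;

    for (size_t j = 0; j < vn; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < un; i++) {
            /* One 32x32->64 multiply plus two 32-bit additions.
               This cannot overflow 64 bits:
               (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)up[i] * vp[j] + rp[i + j] + carry;
            rp[i + j] = (uint32_t)t;   /* low half stays in place */
            carry = t >> 32;           /* high half moves up one limb */
        }
        /* rp[j + un] has not been written by earlier rows. */
        rp[j + un] = (uint32_t)carry;
    }
}
```

The inner loop does un * vn multiplies in total, which is where the
quadratic cost comes from. A SIMD version would compute several of
these 32x32->64 products per instruction, but the carry propagation
would have to be restructured.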
sky
overhead 3.83 cycles, precision 100 overhead 3.87 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 17.73 1 5.76
2 21.24 2 8.64
3 30.03 3 31.77
4 52.08 4 42.13
5 71.18 5 58.29
6 91.29 6 74.64
7 118.18 7 99.14
8 149.42 8 122.40
9 185.41 9 154.45
10 228.36 10 193.09
11 269.21 11 228.16
12 317.49 12 270.28
13 372.15 13 310.56
14 430.09 14 350.19
15 486.97 15 401.69
16 552.88 16 443.84
17 623.82 17 497.54
18 692.53 18 551.11
19 774.26 19 650.64
20 871.09 20 705.41
21 944.87 21 768.52
22 1048.24 22 858.98
23 1140.10 23 944.86
24 1245.76 24 996.47
bwl
overhead 3.61 cycles, precision 100 overhead 4.50 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 23.04 1 5.41
2 25.45 2 8.14
3 37.11 3 32.26
4 59.40 4 42.35
5 76.94 5 58.28
6 98.92 6 75.38
7 124.27 7 101.34
8 156.98 8 128.51
9 193.90 9 156.83
10 231.27 10 186.24
11 273.86 11 226.21
12 321.92 12 263.49
13 373.56 13 304.67
14 428.50 14 345.56
15 485.08 15 391.01
16 563.34 16 458.92
17 617.93 17 506.08
18 683.42 18 550.50
19 759.70 19 645.58
20 852.90 20 704.35
21 949.14 21 774.62
22 1017.56 22 843.40
23 1102.98 23 911.01
24 1179.95 24 1005.32
hwl
overhead 3.64 cycles, precision 100 overhead 4.79 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 24.07 1 9.08
2 26.46 2 15.61
3 38.34 3 38.99
4 59.90 4 49.90
5 78.41 5 68.69
6 100.12 6 89.16
7 125.18 7 117.71
8 158.85 8 145.89
9 194.14 9 186.32
10 230.63 10 224.55
11 273.01 11 265.96
12 323.90 12 310.32
13 374.37 13 370.06
14 426.48 14 420.78
15 483.05 15 478.04
16 547.90 16 538.98
17 614.29 17 615.87
18 677.35 18 680.80
19 752.81 19 752.47
20 843.95 20 828.32
21 932.43 21 947.38
22 1014.50 22 1004.41
23 1094.76 23 1091.09
24 1178.47 24 1201.34
sbr
overhead 5.46 cycles, precision 100 overhead 5.45 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 22.83 1 10.05
2 28.41 2 18.29
3 41.36 3 43.78
4 66.51 4 58.41
5 86.38 5 83.44
6 112.78 6 113.80
7 143.03 7 154.55
8 177.99 8 190.80
9 217.68 9 249.61
10 256.29 10 298.84
11 306.04 11 356.44
12 357.01 12 411.06
13 408.58 13 500.64
14 469.01 14 576.88
15 531.21 15 649.64
16 596.16 16 718.06
17 674.41 17 832.25
18 749.42 18 953.98
19 827.11 19 1025.89
20 925.26 20 1114.01
21 1014.86 21 1256.97
22 1102.14 22 1376.01
23 1204.53 23 1488.91
24 1299.31 24 1592.39
nhm
overhead 5.53 cycles, precision 100 overhead 5.53 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 19.35 1 15.39
2 26.92 2 24.97
3 41.33 3 49.90
4 74.74 4 72.99
5 107.22 5 111.46
6 141.13 6 152.27
7 178.93 7 203.33
8 249.93 8 260.52
9 291.99 9 325.47
10 347.75 10 399.40
11 407.40 11 486.28
12 471.57 12 568.98
13 590.96 13 673.04
14 617.92 14 770.43
15 745.61 15 883.54
16 813.53 16 1010.44
17 867.56 17 1132.38
18 974.72 18 1268.15
19 1179.54 19 1407.04
20 1311.75 20 1559.93
21 1427.09 21 1714.40
22 1508.94 22 1887.66
23 1529.68 23 2045.38
24 1803.39 24 2237.08
pnr
overhead 6.06 cycles, precision 100 overhead 6.06 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 21.23 1 14.16
2 36.16 2 25.29
3 53.57 3 53.60
4 92.92 4 83.72
5 133.46 5 125.40
6 162.15 6 172.76
7 223.63 7 234.54
8 266.18 8 295.01
9 351.44 9 374.13
10 397.70 10 450.58
11 480.81 11 544.37
12 573.12 12 639.29
13 657.97 13 752.05
14 762.29 14 869.50
15 861.01 15 992.66
16 968.78 16 1124.44
17 1095.23 17 1269.71
18 1221.56 18 1415.05
19 1351.80 19 1581.17
20 1527.94 20 1716.28
21 1647.99 21 1921.19
22 1808.22 22 2095.04
23 2002.19 23 2298.09
24 2144.02 24 2484.15
cnr
overhead 6.09 cycles, precision 100 overhead 6.09 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 21.34 1 14.22
2 36.25 2 25.63
3 53.88 3 53.58
4 88.84 4 84.40
5 131.04 5 127.57
6 164.41 6 172.38
7 211.69 7 233.73
8 272.97 8 296.48
9 352.80 9 374.42
10 402.66 10 457.72
11 483.17 11 548.49
12 585.57 12 643.79
13 661.58 13 760.53
14 775.12 14 861.97
15 876.44 15 997.16
16 985.18 16 1132.71
17 1108.62 17 1275.92
18 1236.09 18 1419.61
19 1387.57 19 1591.41
20 1547.17 20 1732.12
21 1656.25 21 1930.03
22 1825.50 22 2107.50
23 2013.27 23 2310.75
24 2162.06 24 2494.36
bay
overhead 3.03 cycles, precision 100 overhead 3.03 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 40.30 1 29.28
2 72.72 2 54.53
3 119.20 3 103.99
4 169.68 4 155.51
5 248.45 5 277.48
6 315.08 6 373.64
7 464.57 7 488.52
8 564.60 8 605.87
9 678.73 9 756.33
10 810.46 10 899.51
11 969.54 11 1085.65
12 1097.85 12 1254.57
13 1254.41 13 1488.69
14 1446.08 14 1684.82
15 1642.15 15 1950.22
16 1823.07 16 2177.77
17 2023.97 17 2484.44
18 2267.26 18 2737.17
19 2510.57 19 3072.79
20 2734.67 20 3359.35
21 2979.46 21 3738.52
22 3274.07 22 4048.76
23 3564.82 23 4454.06
24 3832.79 24 4798.70
zen
overhead 4.67 cycles, precision 100 overhead 4.65 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 10.65 1 4.65
2 20.28 2 12.17
3 48.77 3 29.74
4 68.65 4 40.62
5 95.49 5 57.68
6 123.14 6 77.52
7 161.53 7 99.88
8 204.41 8 126.19
9 246.08 9 165.96
10 293.37 10 200.30
11 421.92 11 239.01
12 469.59 12 311.81
13 539.07 13 339.11
14 598.63 14 390.01
15 593.66 15 464.13
16 665.57 16 552.27
17 752.32 17 584.10
18 838.21 18 645.89
19 929.49 19 686.08
20 1039.34 20 760.87
21 1115.45 21 852.36
22 1224.76 22 921.91
23 1353.20 23 1014.00
24 1468.26 24 1085.29
exca
overhead 5.87 cycles, precision 100 overhead 4.01 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 13.19 1 12.01
2 23.01 2 30.82
3 56.16 3 71.70
4 82.38 4 91.54
5 136.22 5 138.29
6 186.74 6 193.43
7 239.87 7 256.03
8 302.59 8 337.80
9 357.96 9 402.94
10 408.52 10 493.09
11 496.63 11 604.43
12 574.63 12 677.93
13 662.49 13 778.87
14 754.49 14 891.70
15 868.56 15 1013.03
16 904.22 16 1231.66
17 1020.40 17 1397.35
18 1179.57 18 1482.60
19 1341.73 19 1669.62
20 1347.79 20 1833.88
21 1501.84 21 2155.88
22 1613.86 22 2361.74
23 1770.83 23 2420.84
24 1873.31 24 2800.91
pile
overhead 5.79 cycles, precision 100 overhead 5.79 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 14.03 1 13.12
2 29.62 2 27.99
3 69.50 3 63.21
4 100.87 4 97.62
5 149.58 5 136.08
6 195.80 6 185.62
7 248.57 7 247.54
8 306.08 8 307.29
9 370.54 9 401.54
10 446.01 10 485.52
11 514.41 11 564.89
12 600.85 12 662.96
13 690.32 13 797.47
14 799.69 14 908.52
15 909.04 15 1034.96
16 1043.78 16 1174.45
17 1144.01 17 1343.91
18 1275.35 18 1512.24
19 1423.85 19 1656.83
20 1536.26 20 1819.50
21 1691.20 21 2078.82
22 1860.72 22 2201.40
23 1993.33 23 2415.29
24 2163.74 24 2566.52
tutu
overhead 5.79 cycles, precision 100 overhead 5.81 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 16.48 1 13.61
2 33.34 2 31.89
3 77.74 3 73.65
4 119.47 4 111.27
5 166.26 5 152.44
6 214.44 6 191.01
7 262.03 7 261.30
8 312.70 8 311.57
9 414.53 9 413.66
10 465.66 10 492.64
11 547.32 11 585.00
12 632.07 12 699.12
13 784.90 13 834.63
14 850.57 14 960.98
15 976.03 15 1062.95
16 1074.15 16 1252.71
17 1241.83 17 1398.59
18 1304.93 18 1522.47
19 1465.31 19 1677.83
20 1581.26 20 1878.54
21 1776.59 21 2100.60
22 1893.79 22 2268.86
23 2079.75 23 2416.21
24 2281.34 24 2665.11
king
overhead 5.45 cycles, precision 100 overhead 5.44 cycles, precision 100
mpn_mul_basecase mpn_mul_basecase
1 13.61 1 15.38
2 26.33 2 23.50
3 60.92 3 41.71
4 91.85 4 56.04
5 124.10 5 80.80
6 171.83 6 108.21
7 223.15 7 140.16
8 277.71 8 194.98
9 332.43 9 240.07
10 403.34 10 274.03
11 480.73 11 323.44
12 549.97 12 368.85
13 657.86 13 446.64
14 752.77 14 493.81
15 845.39 15 560.80
16 955.62 16 623.37
17 1075.40 17 727.88
18 1185.40 18 779.03
19 1313.77 19 859.10
20 1451.28 20 941.07
21 1576.94 21 1077.84
22 1727.44 22 1134.27
23 1889.27 23 1232.65
24 2036.82 24 1332.18
--
Torbjörn
Please encrypt, key id 0xC8601622