What's a reasonable size ratio for toom32?
Niels Möller
nisse at lysator.liu.se
Wed Oct 11 14:48:47 CEST 2023
Niels Möller <nisse at lysator.liu.se> writes:
> See patch below. Makes both toom22 and toom32 recurse only to themselves
> (or to schoolbook), and use provided scratch space, without any
> additional allocations. Passes make check (with asserts enabled) on my
> machine, but I probably don't hit the case marked with a FIXME in
> mpn_mul.
Now I've run some new benchmarks, using the improved speed program.
"speed.old" is current gmp head, "speed" is with the toom32 changes.
When I look at these numbers, I see a modest speedup, about 1%. As I
understand it, the reason we have a speedup is that we (i) avoid
recursive calls to mpn_mul, and (ii) allocate all temporary storage up
front, and I think it's nice that that's at all measurable.
Regards,
/Niels
$ ./speed.old -p 10000000 -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
35 3278.95 #2807.57 n/a
36 3232.10 #2809.00 3074.14
37 3253.73 #2914.72 3093.68
38 3243.21 2907.72 #2885.04
39 3231.25 n/a #2924.85
40 3163.23 n/a #2853.63
41 3155.46 n/a #2802.04
42 3123.59 n/a #2718.50
43 3106.55 n/a #2862.91
44 3007.33 n/a #2808.01
45 2975.00 n/a #2832.13
46 #2920.29 n/a 2942.15
47 #2872.80 n/a 2883.77
48 #2775.10 n/a 2868.21
49 #2726.65 n/a 3150.99
50 #2639.82 n/a 3118.55
51 #2551.59 n/a 3069.26
52 #2446.09 n/a n/a
$ ./speed -p 10000000 -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
35 3278.95 #2807.24 n/a
36 3229.47 #2806.18 3049.43
37 3251.88 #2921.26 3076.06
38 3243.21 2908.94 #2844.26
39 3225.47 n/a #2889.04
40 3161.26 n/a #2832.61
41 3166.19 n/a #2795.00
42 3125.16 n/a #2690.38
43 3090.94 n/a #2828.44
44 3008.59 n/a #2789.46
45 2975.01 n/a #2804.18
46 2920.30 n/a #2909.20
47 2872.79 n/a #2851.71
48 #2775.11 n/a 2829.49
49 #2727.72 n/a 3092.84
50 #2641.13 n/a 3063.21
51 #2553.35 n/a 3023.80
52 #2454.06 n/a n/a
$ ./speed.old -p 10000000 -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
50 6601.16 #4972.21 n/a
51 6596.33 #5035.81 5434.71
52 6539.30 #5097.57 5259.36
53 6539.55 5316.81 #5245.75
54 6547.79 5300.64 #5228.22
55 6533.80 #5251.83 5304.58
56 6463.24 n/a #5149.18
57 6447.48 n/a #5094.77
58 6406.94 n/a #5033.60
59 6388.55 n/a #5095.43
60 6262.57 n/a #4966.43
61 6222.77 n/a #5121.12
62 6173.48 n/a #5091.35
63 6137.99 n/a #5111.19
64 6010.73 n/a #5167.84
65 5968.52 n/a #5120.25
66 5891.06 n/a #5097.39
67 5822.02 n/a #5217.12
68 5677.20 n/a #5275.71
69 5606.83 n/a #5308.16
70 5513.04 n/a #5325.46
71 5406.09 n/a #5292.37
72 5238.58 n/a #5181.20
73 #5145.21 n/a 5479.67
74 #5045.51 n/a n/a
$ ./speed -p 10000000 -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
50 6610.22 #4979.09 n/a
51 6600.22 #5028.20 5427.28
52 6531.08 #5081.27 5149.80
53 6540.07 5382.98 #5211.13
54 6540.78 5272.30 #5268.33
55 6526.27 #5259.12 5261.60
56 6443.64 n/a #5164.87
57 6448.62 n/a #5095.84
58 6402.06 n/a #5185.73
59 6385.01 n/a #5070.71
60 6269.03 n/a #4970.77
61 6226.69 n/a #5067.00
62 6177.27 n/a #5084.48
63 6135.12 n/a #5082.11
64 6011.92 n/a #5149.51
65 5965.79 n/a #5126.55
66 5872.83 n/a #5065.02
67 5822.76 n/a #5199.71
68 5679.58 n/a #5247.53
69 5605.92 n/a #5284.30
70 5502.04 n/a #5299.45
71 5407.54 n/a #5154.35
72 5234.21 n/a #5088.36
73 #5150.88 n/a 5423.00
74 #5074.65 n/a n/a
$ ./speed.old -p 10000000 -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
75 14750.47 #10053.81 n/a
76 14641.86 #10059.59 10762.58
77 15032.23 #10094.08 10836.42
78 14649.34 #10139.68 10609.55
79 14706.27 #10228.62 10607.27
80 14558.66 #10087.28 10422.71
81 14899.93 #10425.90 10445.55
82 14561.86 10458.30 #10373.80
83 14571.35 10699.90 #10201.05
84 14432.08 n/a #10393.73
85 14396.36 n/a #10143.03
86 14331.65 n/a #9722.61
87 14307.18 n/a #9772.08
88 14185.09 n/a #9974.81
89 14144.53 n/a #9641.15
90 14090.81 n/a #9836.56
91 14048.55 n/a #9672.05
92 13864.52 n/a #9613.13
93 13796.34 n/a #9755.47
94 13741.17 n/a #9754.20
95 13649.14 n/a #9869.11
96 13455.40 n/a #9730.26
97 13377.38 n/a #10239.68
98 13291.98 n/a #10126.57
99 13172.15 n/a #10204.89
100 12980.91 n/a #10222.67
101 12879.49 n/a #10230.03
102 12771.23 n/a #10366.77
103 12632.98 n/a #10552.21
104 12413.33 n/a #10699.01
105 12282.08 n/a #10563.21
106 12132.77 n/a #10579.57
107 11968.49 n/a #10730.71
108 11760.53 n/a #10685.40
109 11626.50 n/a #10899.59
110 11426.35 n/a #10910.19
111 11315.81 n/a #10787.85
112 #11047.85 n/a n/a
$ ./speed -p 10000000 -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
overhead 5.83 cycles, precision 10000000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
75 14722.42 #10045.48 n/a
76 14681.34 #10059.76 10623.85
77 14688.72 #10108.38 10724.26
78 14665.32 #10128.87 10420.29
79 14711.07 #10193.76 10505.84
80 14555.44 #10088.39 10311.07
81 14575.39 10426.33 #10356.28
82 14566.95 10398.25 #10047.75
83 14523.21 10600.77 #10152.63
84 14421.79 n/a #10126.45
85 14411.42 n/a #10102.81
86 14345.01 n/a #9629.07
87 14331.22 n/a #9589.82
88 14177.51 n/a #9695.10
89 14135.56 n/a #9580.52
90 14085.55 n/a #9403.96
91 13987.50 n/a #9585.55
92 13839.73 n/a #9534.73
93 13800.50 n/a #9606.04
94 13732.90 n/a #9741.92
95 13646.09 n/a #9729.62
96 13445.50 n/a #9624.99
97 13390.72 n/a #10149.86
98 13271.28 n/a #10037.47
99 13158.93 n/a #10170.95
100 12995.47 n/a #10140.38
101 12884.30 n/a #10169.73
102 12771.51 n/a #10100.37
103 12618.64 n/a #10445.72
104 12414.24 n/a #10428.90
105 12280.04 n/a #10340.37
106 12144.41 n/a #10619.04
107 11977.16 n/a #10446.06
108 11761.08 n/a #10389.61
109 11646.30 n/a #10809.78
110 11433.62 n/a #10787.13
111 11278.82 n/a #10589.46
112 #11028.89 n/a n/a
--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list