What's a reasonable size ratio for toom32?
Niels Möller
nisse at lysator.liu.se
Wed Sep 27 21:56:14 CEST 2023
Niels Möller <nisse at lysator.liu.se> writes:
> Niels Möller <nisse at lysator.liu.se> writes:
>
>> And now I found that speed's handling of toom22, toom32 and friends
>> always use fixed ratios, so I have to extend that to be able to get any
>> interesting benchmarks when varying the ratio.
>
> Below patch seems to work. Most subtle part was to specify reasonable
> ratio limits for the various toom functions. Tested by running
Pushed now. I've done some benchmarks on shell, on tip-of-tree GMP (no
local changes). See numbers at the end of this message, comparing
mpn_mul_basecase, mpn_toom22_mul and mpn_toom32_mul.
To recall, the *_mul.-N notation to speed means that product size is N
limbs. The speed s parameter is the size of the larger factor, so it
must be >= N/2, and as s increases, there's less work as the multiply
becomes more and mroe unbalanced.
Some observations:
1. On this machine, MUL_TOOM22_THRESHOLD is 20. Toom-32 could use its
own, higher, threshold. E.g, it doesn't beat basecase for product
size 50, e.g., 28x22. And it's barely able to beat basecase for a few
sizes at product size 70.
2. The current cutoff ratio of 0.8 between Toom-22 and Toom-32 looks
pretty good.
3. For larger sizes, Toom-32 beats basecase also for ratios well below
0.5. Perhaps not so surprising, since the main work will be recursive
balanced calls to Toom-22 (we're still below MUL_TOOM33_THRESHOLD,
which is 65 on this machine), so perhaps top unbalanced product
doesn't matter much. So beating basecase is perhaps a too low bar;
for ratios around 0.5, Toom-32 should compete against either repeated
Toom-22 or Toom-42, not included in these benchmarks.
So next, I'd like to repeat benchmarks including my Toom-32, changes to
see if I get more interesting results after fixing speed to do what I
need it to do.
I've also been thinking a bit more about the graph of which Toom
functions should call each other.
I currently believe that Toom-33 should recurse to Toom-32 for some size
ratios (it would also need to recurse to itself, Toom-22, and we need to
have either Toom-42 or Toom-43 in there as well). However, I wonder if
we have any machines with thresholds such that Toom-33 can recurse to
Toom-32 and with the recursive calls from Toom-32 still being in Toom-33
range? I would hope that it's fine to hardcode Toom-32 to recurse only
to itself, Toom-22 and basecase, even for this case.
Regards,
/Niels
$ ./speed -c -s 25-37 mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50
25 2018.83 #1884.67 n/a
26 2080.80 #1894.83 2543.40
27 2124.00 #1970.67 2226.60
28 #2036.40 n/a 2143.80 28x22 0.786
29 #2034.60 n/a 2137.80
30 #2002.60 n/a 2352.83
31 #1905.50 n/a 2161.20
32 #1851.17 n/a 2168.80
33 #1831.67 n/a 2164.80
34 #1774.33 n/a 2652.40
35 #1733.50 n/a 2641.40
36 #1605.86 n/a 2587.00
37 #1559.14 n/a n/a
$ ./speed -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
overhead 7.15 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
35 3960.00 #3499.00 n/a
36 3903.33 #3512.67 3928.67
37 3937.33 #3623.33 3947.33
38 3923.00 #3611.67 4232.00 38x32 0.842
39 3907.33 n/a #3713.67 39x31 0.818
40 #3824.67 n/a 6855.50
41 #3854.67 n/a 4274.00
42 #3802.33 n/a 4258.33
43 3768.33 n/a #3744.00 43x27 0.628
44 #3686.67 n/a 3755.67
45 #3648.00 n/a 3767.33
46 #3599.00 n/a 4350.67
47 #3548.67 n/a 4509.33
48 #3450.33 n/a 4277.00
49 #3415.33 n/a 4655.00
50 #3149.25 n/a 4011.33
51 #3064.00 n/a 3943.33
52 #2974.25 n/a n/a
$ ./speed -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
50 7625.50 #6052.00 n/a
51 7630.00 #6103.00 7214.50
52 7567.00 #6174.50 6376.00
53 7582.00 #6556.50 6640.00
54 7572.00 #6403.50 6584.50
55 7576.00 #6424.00 7046.50 55x45 0.818
56 7475.50 n/a #6357.00 56x44 0.786
57 7446.00 n/a #6465.00
58 7420.00 n/a #6408.00
59 7379.50 n/a #6438.50
60 7281.50 n/a #6260.50
61 7262.50 n/a #6790.00
62 7213.00 n/a #6715.50
63 7144.50 n/a #6399.50
64 7036.50 n/a #6478.00
65 7007.50 n/a #6440.00
66 6893.50 n/a #6489.50
67 6828.00 n/a #6696.50 67x33 0.493
68 #6685.00 n/a 7061.00
69 #6622.00 n/a 6755.00
70 6529.00 n/a #6514.50 70x30 0.429
71 #6419.50 n/a 6772.50
72 #6298.50 n/a 6399.00
73 #6179.00 n/a 6944.50
74 #6080.00 n/a n/a
$ ./speed -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
75 16739.00 #12183.00 n/a
76 16683.00 #12148.00 15689.00
77 16669.00 #12241.00 15704.00
78 16689.00 #12195.00 12930.00
79 16669.00 #13586.00 14219.00
80 16616.00 #12641.00 13504.00
81 16628.00 #13429.00 13586.00
82 16611.00 13262.00 #12425.00 82x68 0.829
83 16561.00 13464.00 #13250.00
84 16785.00 n/a #13192.00
85 16403.00 n/a #13790.00
86 16421.00 n/a #12685.00
87 16319.00 n/a #14747.00
88 16219.00 n/a #12119.00
89 16132.00 n/a #14117.00
90 16118.00 n/a #12393.00
91 16039.00 n/a #14166.00
92 15919.00 n/a #14102.00
93 15788.00 n/a #12594.00
94 15770.00 n/a #12148.00
95 15645.00 n/a #13309.00
96 15476.00 n/a #12448.00
97 15377.00 n/a #13387.00
98 15330.00 n/a #13842.00
99 15173.00 n/a #13312.00
100 15018.00 n/a #13898.00
101 14875.00 n/a #13338.00
102 14767.00 n/a #13318.00
103 14601.00 n/a #14577.00
104 #14467.00 n/a 15572.00
105 14297.00 n/a #13664.00
106 14201.00 n/a #13154.00
107 13965.00 n/a #13766.00
108 13810.00 n/a #13020.00 108x42 0.389
109 #13627.00 n/a 15919.00
110 #13475.00 n/a 13983.00
111 #13259.00 n/a 14079.00
112 #13046.00 n/a n/a
--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list