What's a reasonable size ratio for toom32?

Niels Möller nisse at lysator.liu.se
Wed Sep 27 21:56:14 CEST 2023


Niels Möller <nisse at lysator.liu.se> writes:

> Niels Möller <nisse at lysator.liu.se> writes:
>
>> And now I found that speed's handling of toom22, toom32 and friends
>> always use fixed ratios, so I have to extend that to be able to get any
>> interesting benchmarks when varying the ratio.
>
> Below patch seems to work. Most subtle part was to specify reasonable
> ratio limits for the various toom functions. Tested by running

Pushed now. I've done some benchmarks on shell, on tip-of-tree GMP (no
local changes). See numbers at the end of this message, comparing
mpn_mul_basecase, mpn_toom22_mul and mpn_toom32_mul. 

To recall, the *_mul.-N notation to speed means that product size is N
limbs. The speed s parameter is the size of the larger factor, so it
must be >= N/2, and as s increases, there's less work as the multiply
becomes more and mroe unbalanced.

Some observations:

1. On this machine, MUL_TOOM22_THRESHOLD is 20. Toom-32 could use its
   own, higher, threshold. E.g, it doesn't beat basecase for product
   size 50, e.g., 28x22. And it's barely able to beat basecase for a few
   sizes at product size 70.

2. The current cutoff ratio of 0.8 between Toom-22 and Toom-32 looks
   pretty good.

3. For larger sizes, Toom-32 beats basecase also for ratios well below
   0.5. Perhaps not so surprising, since the main work will be recursive
   balanced calls to Toom-22 (we're still below MUL_TOOM33_THRESHOLD,
   which is 65 on this machine), so perhaps top unbalanced product
   doesn't matter much. So beating basecase is perhaps a too low bar;
   for ratios around 0.5, Toom-32 should compete against either repeated
   Toom-22 or Toom-42, not included in these benchmarks.

So next, I'd like to repeat benchmarks including my Toom-32, changes to
see if I get more interesting results after fixing speed to do what I
need it to do.

I've also been thinking a bit more about the graph of which Toom
functions should call each other.

I currently believe that Toom-33 should recurse to Toom-32 for some size
ratios (it would also need to recurse to itself, Toom-22, and we need to
have either Toom-42 or Toom-43 in there as well). However, I wonder if
we have any machines with thresholds such that Toom-33 can recurse to
Toom-32 and with the recursive calls from Toom-32 still being in Toom-33
range? I would hope that it's fine to hardcode Toom-32 to recurse only
to itself, Toom-22 and basecase, even for this case.

Regards,
/Niels

$ ./speed -c -s 25-37 mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50
25            2018.83      #1884.67           n/a
26            2080.80      #1894.83       2543.40
27            2124.00      #1970.67       2226.60
28           #2036.40           n/a       2143.80 28x22 0.786
29           #2034.60           n/a       2137.80
30           #2002.60           n/a       2352.83
31           #1905.50           n/a       2161.20
32           #1851.17           n/a       2168.80
33           #1831.67           n/a       2164.80
34           #1774.33           n/a       2652.40
35           #1733.50           n/a       2641.40
36           #1605.86           n/a       2587.00
37           #1559.14           n/a           n/a

$ ./speed -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70  mpn_toom32_mul.-70
overhead 7.15 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70
35            3960.00      #3499.00           n/a
36            3903.33      #3512.67       3928.67
37            3937.33      #3623.33       3947.33
38            3923.00      #3611.67       4232.00 38x32 0.842
39            3907.33           n/a      #3713.67 39x31 0.818
40           #3824.67           n/a       6855.50
41           #3854.67           n/a       4274.00
42           #3802.33           n/a       4258.33
43            3768.33           n/a      #3744.00 43x27 0.628
44           #3686.67           n/a       3755.67
45           #3648.00           n/a       3767.33
46           #3599.00           n/a       4350.67
47           #3548.67           n/a       4509.33
48           #3450.33           n/a       4277.00
49           #3415.33           n/a       4655.00
50           #3149.25           n/a       4011.33
51           #3064.00           n/a       3943.33
52           #2974.25           n/a           n/a

$ ./speed -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100  mpn_toom32_mul.-100
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100
50            7625.50      #6052.00           n/a
51            7630.00      #6103.00       7214.50
52            7567.00      #6174.50       6376.00
53            7582.00      #6556.50       6640.00
54            7572.00      #6403.50       6584.50
55            7576.00      #6424.00       7046.50 55x45 0.818
56            7475.50           n/a      #6357.00 56x44 0.786
57            7446.00           n/a      #6465.00
58            7420.00           n/a      #6408.00
59            7379.50           n/a      #6438.50
60            7281.50           n/a      #6260.50
61            7262.50           n/a      #6790.00
62            7213.00           n/a      #6715.50
63            7144.50           n/a      #6399.50
64            7036.50           n/a      #6478.00
65            7007.50           n/a      #6440.00
66            6893.50           n/a      #6489.50
67            6828.00           n/a      #6696.50 67x33 0.493
68           #6685.00           n/a       7061.00
69           #6622.00           n/a       6755.00
70            6529.00           n/a      #6514.50 70x30 0.429
71           #6419.50           n/a       6772.50
72           #6298.50           n/a       6399.00
73           #6179.00           n/a       6944.50
74           #6080.00           n/a           n/a

$ ./speed -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150  mpn_toom32_mul.-150
overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
        mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150
75           16739.00     #12183.00           n/a
76           16683.00     #12148.00      15689.00
77           16669.00     #12241.00      15704.00
78           16689.00     #12195.00      12930.00
79           16669.00     #13586.00      14219.00
80           16616.00     #12641.00      13504.00
81           16628.00     #13429.00      13586.00
82           16611.00      13262.00     #12425.00 82x68 0.829
83           16561.00      13464.00     #13250.00
84           16785.00           n/a     #13192.00
85           16403.00           n/a     #13790.00
86           16421.00           n/a     #12685.00
87           16319.00           n/a     #14747.00
88           16219.00           n/a     #12119.00
89           16132.00           n/a     #14117.00
90           16118.00           n/a     #12393.00
91           16039.00           n/a     #14166.00
92           15919.00           n/a     #14102.00
93           15788.00           n/a     #12594.00
94           15770.00           n/a     #12148.00
95           15645.00           n/a     #13309.00
96           15476.00           n/a     #12448.00
97           15377.00           n/a     #13387.00
98           15330.00           n/a     #13842.00
99           15173.00           n/a     #13312.00
100          15018.00           n/a     #13898.00
101          14875.00           n/a     #13338.00
102          14767.00           n/a     #13318.00
103          14601.00           n/a     #14577.00
104         #14467.00           n/a      15572.00
105          14297.00           n/a     #13664.00
106          14201.00           n/a     #13154.00
107          13965.00           n/a     #13766.00
108          13810.00           n/a     #13020.00 108x42 0.389
109         #13627.00           n/a      15919.00
110         #13475.00           n/a      13983.00
111         #13259.00           n/a      14079.00
112         #13046.00           n/a           n/a

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list