Improvements for power64/mode64
Mark Rodenkirch
mgrogue at wi.rr.com
Sun Mar 26 18:54:32 CEST 2006
On Mar 26, 2006, at 9:19 AM, Torbjorn Granlund wrote:
> Mark Rodenkirch <mgrogue at wi.rr.com> writes:
>
> Assuming I understand how to use speed correctly, I am getting between
> 8.6 and 8.7 cycles per limb for both addmul and submul.
> sqr_diagonal is between 7.8 and 7.9 cycles per limb. If you have a
> single speed command that we can both run, so that we know we are
> comparing apples to apples, that would be great.
>
> A typical use would be
>
> speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1
Great.
Here is the before:
/Distributed/gmp-4.1.99/tune > ./speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
overhead 6.63 cycles, precision 10000 units of 3.00e-08 secs, CPU freq 2500.00 MHz
mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
1 15.4082 17.2361 #10.2528
2 18.0057 16.7547 #10.3513
3 15.1702 15.4417 #9.7935
4 12.7514 14.2524 #9.3448
5 12.8009 14.4684 #9.0765
6 12.3352 15.0858 #8.8974
7 12.4303 15.2321 #8.7687
8 11.9185 14.3786 #8.6726
9 11.5017 14.4105 #8.5988
10 11.2769 14.3019 #8.5396
11 11.0930 14.2829 #8.4897
12 11.0227 14.2237 #8.4489
13 10.8239 14.1139 #8.4153
14 10.8381 14.1186 #8.4207
15 10.7659 14.1063 #8.4279
16 10.7210 13.9623 #8.3689
17 10.8150 13.9222 #8.3472
18 10.6317 13.9684 #8.3282
19 10.6249 13.9406 #8.3109
20 10.5023 13.9311 #8.2951
22 10.5930 13.8897 #8.2687
24 10.5025 13.8082 #8.2455
26 10.4509 13.8353 #8.2278
28 10.4311 13.7324 #8.2113
30 10.3696 13.7873 #8.1974
33 10.3360 13.7463 #8.1793
36 10.3543 13.6697 #8.1649
39 10.2853 13.6714 #8.1522
42 10.2617 13.6430 #8.1414
46 10.2200 13.6115 #8.1294
50 10.2158 13.5951 #8.1189
55 10.2321 13.5808 #8.1089
60 10.1987 13.5524 #8.0990
66 10.1755 13.5995 #8.0907
72 10.1504 13.5974 #8.0828
79 10.1378 13.5407 #8.0756
86 10.1369 13.5720 #8.0699
94 10.1213 13.5554 #8.0633
103 10.1119 13.5245 #8.0590
113 10.0991 13.5125 #8.0532
124 10.0956 13.5070 #8.0488
136 10.0817 13.5161 #8.0447
149 10.0700 13.5036 #8.0407
163 10.0669 13.5205 #8.0377
179 10.0652 13.5297 #8.0340
196 10.0615 13.4985 #8.0315
215 10.0534 13.5041 #8.0285
236 10.0482 13.4858 #8.0267
259 10.0467 13.5109 #8.0246
284 10.0463 13.4550 #8.0223
312 10.0362 13.4654 #8.0201
343 10.0330 13.4850 #8.0191
377 10.0379 13.4853 #8.0167
414 10.0267 13.4832 #8.0155
455 10.0264 13.4581 #8.0139
500 10.0258 13.4813 #8.0135
550 10.0238 13.4868 #8.0122
605 10.0207 13.4713 #8.0109
665 10.0186 13.4481 #8.0106
731 10.0176 13.4773 #8.0094
804 10.0168 13.4861 #8.0088
884 10.0379 13.4456 #8.0079
972 10.0347 13.4910 #8.0073
Here is the after:
/Distributed/gmp-4.1.99/tune > ./speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
overhead 6.75 cycles, precision 10000 units of 3.00e-08 secs, CPU freq 2500.00 MHz
mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
1 14.9862 16.2515 #12.8770
2 11.3779 12.3810 #10.8758
3 11.3483 12.0028 #9.8353
4 12.1296 11.4489 #9.1886
5 10.5530 12.4029 #8.8770
6 9.3084 11.5968 #8.6266
7 9.4014 11.6220 #8.4836
8 9.4974 12.1069 #8.3450
9 9.3424 11.5795 #8.2654
10 9.2108 9.5605 #8.0760
11 9.1978 9.7187 #8.2292
12 9.2183 9.8350 #8.0017
13 8.8908 9.1650 #8.1164
14 8.9274 9.2755 #7.8940
15 8.9943 9.4167 #8.0191
16 9.0167 9.5257 #7.9305
17 8.8835 9.0001 #7.9872
18 8.8792 9.1152 #7.8007
19 8.8339 9.2433 #7.9361
20 8.8704 9.3378 #7.8021
22 8.7390 9.0205 #7.7743
24 8.8047 9.2152 #7.7515
26 8.6991 8.9594 #7.7166
28 8.8300 9.1265 #7.7160
30 8.8336 9.1563 #7.7011
33 8.9409 9.1630 #7.7518
36 8.8542 9.1050 #7.6680
39 9.3730 8.9518 #7.7065
42 8.6479 8.8859 #7.6301
46 8.5845 8.8609 #7.6318
50 8.6029 8.8306 #7.6096
55 9.0716 8.8473 #7.6468
60 8.8317 8.8711 #7.6017
66 8.5455 8.7807 #7.5923
72 8.7378 8.8211 #7.5847
79 8.8690 8.7635 #7.6055
86 8.4915 8.7330 #7.5709
94 8.4812 8.7197 #7.5648
103 8.7636 8.7262 #7.5793
113 8.5314 8.7342 #7.5739
124 8.6080 8.7318 #7.5500
136 8.5803 8.7152 #7.5452
149 8.4786 8.7044 #7.5567
163 8.6187 8.6754 #7.5523
179 8.6071 8.6683 #7.5469
196 8.5216 8.6765 #7.5321
215 8.5672 8.6552 #7.5389
236 8.4982 8.6649 #7.5275
259 8.5322 8.6481 #7.5325
284 8.4849 8.6537 #7.5262
312 8.4730 8.6446 #7.5204
343 8.5008 8.6309 #7.5254
377 8.4207 8.6405 #7.5229
414 8.4099 8.6193 #7.5156
455 8.4738 8.6232 #7.5188
500 8.4438 8.6252 #7.5138
550 8.4039 8.6123 #7.5124
605 8.4109 8.6202 #7.5151
665 8.4078 8.6178 #7.5138
731 8.4447 8.6100 #7.5130
804 8.4217 8.6118 #7.5091
884 8.4240 8.6183 #7.5083
972 8.4175 8.6162 #7.5073
I'm still working on sqr_diagonal. I've made some changes compared
to the code I put on the list. It appears to work fine, but I would
like to test it more.
--Mark