Possible new T3-T5 mul_1

David Miller davem at davemloft.net
Tue Apr 2 17:37:00 CEST 2013


From: Torbjorn Granlund <tg at gmplib.org>
Date: Tue, 02 Apr 2013 09:38:42 +0200

> Torbjorn Granlund <tg at gmplib.org> writes:
> 
>   This version probably overschedules loads, I'll try another variant some
>   day which fixes that.
>   
> Two variants.  The 1st is just the previous 3 c/l one, with a bug fix,
> and renamed.  The 2nd is a version which I hope still runs at 3 c/l, but
> with a shallower sw pipeline.

See attached; it looks like mul1b isn't able to reach 3 c/l the way mul1a can.
-------------- next part --------------
overhead 6.00 cycles, precision 10000000 units of 3.51e-10 secs, CPU freq 2847.41 MHz
          mpn_mul_1.3
1             12.0002
2              9.0001
3              6.7778
4              5.2501
5              5.2001
6              5.3334
7              4.9286
8              4.3750
9              4.2223
10             4.4000
11             4.2728
12             3.9167
13             3.9231
14             4.0000
15             3.9778
16             3.7044
17             3.7166
18             3.8334
19             3.8421
20             3.5900
22             3.6819
24             3.4792
26             3.5846
28             3.4286
30             3.5000
33             3.3864
36             3.3334
39             3.4103
42             3.3572
46             3.3261
50             3.3000
55             3.2909
60             3.2001
66             3.2273
72             3.1598
79             3.2026
86             3.1744
94             3.1596
103            3.1554
113            3.1062
124            3.3468
136            3.3162
149            3.3020
163            3.2945
179            3.2682
196            3.2194
215            3.2233
236            3.1822
259            3.1854
284            3.1514
312            3.1378
343            3.1400
377            3.1194
414            3.1136
455            3.1055
500            3.0860
550            3.0855
605            3.0744
665            3.0677
731            3.0657
804            3.0535
884            3.0487
972            3.0443
-------------- next part --------------
overhead 6.00 cycles, precision 10000000 units of 3.51e-10 secs, CPU freq 2847.59 MHz
          mpn_mul_1.3
1             12.0002
2              8.5001
3              6.5001
4              5.5001
5              5.3601
6              5.0556
7              4.7143
8              4.5626
9              4.6667
10             4.3000
11             4.2728
12             4.0715
13             4.2308
14             4.1429
15             4.0667
16             3.9063
17             4.0000
18             3.9445
19             3.8948
20             3.7750
22             3.8637
24             3.6667
26             3.7693
28             3.6429
30             3.7000
33             3.6667
36             3.5556
39             3.6411
42             3.5953
46             3.5653
50             3.5400
55             3.5455
60             3.4667
66             3.4849
72             3.4514
79             3.4557
86             3.4535
94             3.4362
103            3.4319
113            3.4160
124            3.6533
136            3.6219
149            3.5926
163            3.5861
179            3.5435
196            3.5120
215            3.5256
236            3.4916
259            3.4904
284            3.4435
312            3.4308
343            3.4299
377            3.4210
414            3.3979
455            3.3930
500            3.3817
550            3.3917
605            3.3290
665            3.3486
731            3.3695
804            3.3657
884            3.3609
972            3.2943
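
For anyone wanting to compare the two attachments side by side, here is a minimal sketch in Python. It assumes each attachment is plain two-column output (size, cycles per limb) and copies only three rows from each listing above. The names parse_speed and compare are made up for illustration and are not part of GMP. The attachments are not labelled, so "first" and "second" refer only to their order in this message.

```python
def parse_speed(text):
    """Return {size: cycles_per_limb} from two-column benchmark output.

    Header lines (the "overhead ..." line and the routine name) do not
    split into exactly two fields starting with an integer, so they are
    skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            rows[int(parts[0])] = float(parts[1])
    return rows


def compare(a, b):
    """Per-size difference b - a in c/l, for sizes present in both runs."""
    return {n: round(b[n] - a[n], 4) for n in sorted(a) if n in b}


# Three rows copied from each attachment above (not the full listings).
FIRST = """\
          mpn_mul_1.3
113            3.1062
500            3.0860
972            3.0443
"""

SECOND = """\
          mpn_mul_1.3
113            3.4160
500            3.3817
972            3.2943
"""

first = parse_speed(FIRST)
second = parse_speed(SECOND)
gap = compare(first, second)

for n, d in gap.items():
    print(f"{n:4d}  first {first[n]:.4f}  second {second[n]:.4f}  gap {d:+.4f}")
```

Feeding the complete attachments through parse_speed works the same way, since every data row has the same two-column shape.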

