# GMP used during 3 and a half years to solve MIT's LCS35

Marco Bodrato bodrato at mail.dm.unipi.it
Sat Dec 14 16:15:10 UTC 2019

```Ciao,

Il 2019-05-02 10:09 paul zimmermann ha scritto (on gmp-discuss):
>> That's why a few weeks ago on this same list I suggested:
>> "What about even sparser exponents? Should we use 2*popcount(n)
>> size_in_bits(n) to estimate the best window size? They should be
>> almost
>> equivalent on average."
>>
>> [ https://gmplib.org/list-archives/gmp-discuss/2019-April/006310.html
>> ]

> I support the (clever) idea of using 2*popcount(exponent).
> Could you benchmark this against the current version, on
> dense and sparse exponents?

I tried, with the attached code (replace tune/speed-ext.c, then cd
tune;make speed-ext). It contains a copy of the current
mpn/generic/powm.c code (and mpz/powm.c) with an additional flag and a
branch:

if (flag != 0)
windowsize = win_size (mpn_popcount (ep, en) * 2 - 1);
else
windowsize = win_size (ebi);

This way we can compare both ways to estimate the window size.
Unfortunately, there are large fluctuations while measuring this
function, for each size I measure three _bs and three _pc (bitsize vs.
popcount).

I used the command-line

for i in 0 1 2 4 16 256; do tune/speed-ext -rs1-260 -f2 mpz_powm_bs.\$i
mpz_powm_pc.\$i mpz_powm_bs.\$i mpz_powm_pc.\$i mpz_powm_bs.\$i
mpz_powm_pc.\$i; done

Because of the "-r", the first column is the time used, the other
columns contain the ratio.

With r-flag=0 (random exponent) the speed are comparable. Half the times
the winner is _bs, half the times the winner is _pc:

overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.0       _pc.0     _bs.0     _pc.0     _bs.0     _pc.0
1     0.000000849    0.9973    0.9954    0.9968   #0.9929    0.9951
2     0.000002583    1.1107    0.9936    0.9959    0.9868   #0.9862
4     0.000013473    0.9951    0.9952   #0.9947    0.9961    0.9947
8     0.000070850    0.9983    0.9987    0.9978    0.9981   #0.9977
16    0.000481797    0.9624    0.9554   #0.9161    0.9602    0.9537
32    0.003307878    0.9979    0.9971    0.9946    0.9998   #0.9780
64    0.021502000    1.0013    0.9979    0.9972   #0.9960    0.9973
128  #0.126749000    1.0036    1.0104    1.0131    1.0059    1.0032
256   0.738998000    1.0001    0.9961    0.9995   #0.9953    1.0008

With r-flag=1 (exponent is set to binary 111...111), popcount may
suggest a larger window-size, but again no clear winner:

overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.1       _pc.1     _bs.1     _pc.1     _bs.1     _pc.1
1     0.000000878    0.9967    1.0014   #0.9921    1.0016    0.9978
2     0.000002970    1.0053    0.9846    0.9871   #0.9829    0.9949
4     0.000013400    0.9995    0.9977    0.9973   #0.9973    0.9990
8     0.000071776    0.9948    0.9989    0.9950    0.9998   #0.9947
16    0.000475079    0.9701    0.9993   #0.9605    0.9627    0.9691
32    0.003303542   #0.9968    1.0012    1.0083    1.0123    1.0168
64    0.021803000    1.0093    1.0117    0.9971    1.0008   #0.9928
128  #0.128225000    1.0047    1.0042    1.0129    1.0018    1.0083
256   0.746284000    1.0051   #0.9977    1.0006    0.9988    1.0058

With sparser exponents (with r-flag=2, half of the exponent is random
data, half is 0; with r-flag=4, one fourth of the exponent is random
data; and so on...), popcount suggests a smaller window-size, and seems
effective:

overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.4       _pc.4     _bs.4     _pc.4     _bs.4     _pc.4
1     0.000000765    0.9845    0.9753    0.9818   #0.9736    0.9842
2     0.000002285    0.9839    1.0016    0.9814    1.0022   #0.9810
4     0.000011773    0.9941    1.0026   #0.9820    1.0001    0.9988
8     0.000061283   #0.9884    0.9975    0.9889    1.9493    1.9487
16    0.000394527    1.0343    1.0565   #0.9918    0.9996    1.0378
32    0.002986944    0.9758    0.9840   #0.9663    0.9794    0.9669
64   #0.019095000    1.0025    1.0098    1.0096    1.0194    1.0104
128   0.116289000    0.9958    0.9983    0.9941    0.9994   #0.9933
256   0.681213000    0.9912    0.9948   #0.9902    1.0015    0.9905
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.16      _pc.16    _bs.16    _pc.16    _bs.16    _pc.16
1     0.000000729    0.9875    0.9652   #0.9515    0.9536    0.9849
2     0.000002228   #0.9450    0.9953    0.9475    0.9948    0.9477
4     0.000011343   #0.9477    0.9952    0.9480    0.9943    0.9518
8     0.000058805    0.9784    0.9984    0.9760    0.9976   #0.9759
16    0.000381547   #0.9784    1.0186    0.9912    1.0007    0.9927
32    0.002858413   #0.9695    0.9948    0.9793    1.0029    0.9706
64    0.019123000    0.9794    0.9918   #0.9716    0.9963    0.9876
128   0.113480000    0.9892    0.9974   #0.9878    1.0043    0.9907
256   0.668013000    0.9845    0.9979    0.9848    0.9980   #0.9814
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.256     _pc.256   _bs.256   _pc.256   _bs.256   _pc.256
1     0.000000730    0.9886    0.9601    0.9521    0.9610   #0.9442
2     0.000002242    0.9355    0.9983   #0.9352    0.9946    0.9369
4     0.000011123    0.9416    1.0002    0.9401    0.9970   #0.9390
8     0.000057803    0.9670    1.0013   #0.9665    1.0005    0.9669
16    0.000378739   #0.9648    0.9993    0.9671    1.0213    0.9670
32    0.002830238   #0.9640    1.0034    0.9669    1.0006    0.9765
64    0.018543000    1.0168    1.0207    1.0020    1.0169   #0.9857
128   0.112874000    0.9867    1.0010   #0.9861    1.0119    0.9893
256   0.664346000   #0.9749    0.9956    0.9766    0.9942    0.9789

I mean, the gain is detectable only for very sparse exponents.

Ĝis,
m
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed-ext.c
Type: text/x-c
Size: 21609 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20191214/8da1d39b/attachment-0001.bin>
```