GMP used during 3 and a half years to solve MIT's LCS35
Marco Bodrato
bodrato at mail.dm.unipi.it
Sat Dec 14 16:15:10 UTC 2019
Hi,
On 2019-05-02 10:09, paul zimmermann wrote (on gmp-discuss):
>> That's why a few weeks ago on this same list I suggested:
>> "What about even sparser exponents? Should we use 2*popcount(n)
>> instead of
>> size_in_bits(n) to estimate the best window size? They should be
>> almost
>> equivalent on average."
>>
>> [ https://gmplib.org/list-archives/gmp-discuss/2019-April/006310.html
>> ]
> I support the (clever) idea of using 2*popcount(exponent).
> Could you benchmark this against the current version, on
> dense and sparse exponents?
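(For a uniformly random k-bit exponent the expected popcount is about k/2, so on average

  2 * popcount(n) ~= 2 * (k/2) = k = size_in_bits(n);

the two estimates only diverge when the exponent is noticeably sparser, or denser, than random.)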
I tried it with the attached code (replace tune/speed-ext.c, then run "cd tune; make speed-ext"). It contains a copy of the current mpn/generic/powm.c code (and of mpz/powm.c) with an additional flag and a branch:
  if (flag != 0)
    windowsize = win_size (mpn_popcount (ep, en) * 2 - 1);
  else
    windowsize = win_size (ebi);
This way we can compare both ways to estimate the window size.
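For reference, here is a small self-contained sketch (not the powm.c internals) showing how the two estimates can differ; the thresholds in toy_win_size() are only illustrative, loosely modelled on the table used by win_size() in mpn/generic/powm.c:

/* Illustrative sketch, not GMP's internal code: compare the bitsize-based
   and the popcount-based window-size estimate for a dense and for a sparse
   exponent.  The threshold table below is only an example.  */
#include <stdio.h>
#include <gmp.h>

static int
toy_win_size (unsigned long eb)
{
  static const unsigned long x[] =
    { 7, 25, 81, 241, 673, 1793, 4609, 11521 };
  int k;
  for (k = 0; k < 8 && eb > x[k]; k++)
    ;
  return k + 1;
}

int
main (void)
{
  mpz_t e;
  gmp_randstate_t rs;

  mpz_init (e);
  gmp_randinit_default (rs);

  /* Dense (random) 4096-bit exponent: 2*popcount is close to the bitsize.  */
  mpz_urandomb (e, rs, 4096);
  printf ("dense : bs-estimate %d, pc-estimate %d\n",
          toy_win_size (mpz_sizeinbase (e, 2)),
          toy_win_size (2 * mpz_popcount (e) - 1));

  /* Sparse 4096-bit exponent with only a few bits set.  */
  mpz_set_ui (e, 0);
  mpz_setbit (e, 4095);
  mpz_setbit (e, 2048);
  mpz_setbit (e, 17);
  printf ("sparse: bs-estimate %d, pc-estimate %d\n",
          toy_win_size (mpz_sizeinbase (e, 2)),
          toy_win_size (2 * mpz_popcount (e) - 1));

  mpz_clear (e);
  gmp_randclear (rs);
  return 0;
}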
Unfortunately there are large fluctuations when measuring this function, so for each size I measure three _bs and three _pc runs (bitsize vs. popcount).
I used the command line:

for i in 0 1 2 4 16 256; do
  tune/speed-ext -rs1-260 -f2 \
    mpz_powm_bs.$i mpz_powm_pc.$i mpz_powm_bs.$i \
    mpz_powm_pc.$i mpz_powm_bs.$i mpz_powm_pc.$i
done
Because of the "-r", the first column is the time used, the other
columns contain the ratio.
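To read one line: in the first table below, for size 16 the _pc entry "#0.9161" means that run took 0.9161 times the 0.000481797 seconds of the first column (roughly 8% faster); the "#" marks the fastest of the six runs in the row.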
With r-flag=0 (random exponent) the speeds are comparable; half the time the winner is _bs, half the time it is _pc:
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
_bs.0 _pc.0 _bs.0 _pc.0 _bs.0 _pc.0
1 0.000000849 0.9973 0.9954 0.9968 #0.9929 0.9951
2 0.000002583 1.1107 0.9936 0.9959 0.9868 #0.9862
4 0.000013473 0.9951 0.9952 #0.9947 0.9961 0.9947
8 0.000070850 0.9983 0.9987 0.9978 0.9981 #0.9977
16 0.000481797 0.9624 0.9554 #0.9161 0.9602 0.9537
32 0.003307878 0.9979 0.9971 0.9946 0.9998 #0.9780
64 0.021502000 1.0013 0.9979 0.9972 #0.9960 0.9973
128 #0.126749000 1.0036 1.0104 1.0131 1.0059 1.0032
256 0.738998000 1.0001 0.9961 0.9995 #0.9953 1.0008
With r-flag=1 (the exponent is set to binary 111...111), popcount may suggest a larger window size, but again there is no clear winner:
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
_bs.1 _pc.1 _bs.1 _pc.1 _bs.1 _pc.1
1 0.000000878 0.9967 1.0014 #0.9921 1.0016 0.9978
2 0.000002970 1.0053 0.9846 0.9871 #0.9829 0.9949
4 0.000013400 0.9995 0.9977 0.9973 #0.9973 0.9990
8 0.000071776 0.9948 0.9989 0.9950 0.9998 #0.9947
16 0.000475079 0.9701 0.9993 #0.9605 0.9627 0.9691
32 0.003303542 #0.9968 1.0012 1.0083 1.0123 1.0168
64 0.021803000 1.0093 1.0117 0.9971 1.0008 #0.9928
128 #0.128225000 1.0047 1.0042 1.0129 1.0018 1.0083
256 0.746284000 1.0051 #0.9977 1.0006 0.9988 1.0058
With sparser exponents (with r-flag=2, half of the exponent is random data and half is 0; with r-flag=4, one fourth of the exponent is random data; and so on), popcount suggests a smaller window size, and this seems effective (a hypothetical sketch of such an exponent generator follows the tables):
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
_bs.4 _pc.4 _bs.4 _pc.4 _bs.4 _pc.4
1 0.000000765 0.9845 0.9753 0.9818 #0.9736 0.9842
2 0.000002285 0.9839 1.0016 0.9814 1.0022 #0.9810
4 0.000011773 0.9941 1.0026 #0.9820 1.0001 0.9988
8 0.000061283 #0.9884 0.9975 0.9889 1.9493 1.9487
16 0.000394527 1.0343 1.0565 #0.9918 0.9996 1.0378
32 0.002986944 0.9758 0.9840 #0.9663 0.9794 0.9669
64 #0.019095000 1.0025 1.0098 1.0096 1.0194 1.0104
128 0.116289000 0.9958 0.9983 0.9941 0.9994 #0.9933
256 0.681213000 0.9912 0.9948 #0.9902 1.0015 0.9905
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
_bs.16 _pc.16 _bs.16 _pc.16 _bs.16 _pc.16
1 0.000000729 0.9875 0.9652 #0.9515 0.9536 0.9849
2 0.000002228 #0.9450 0.9953 0.9475 0.9948 0.9477
4 0.000011343 #0.9477 0.9952 0.9480 0.9943 0.9518
8 0.000058805 0.9784 0.9984 0.9760 0.9976 #0.9759
16 0.000381547 #0.9784 1.0186 0.9912 1.0007 0.9927
32 0.002858413 #0.9695 0.9948 0.9793 1.0029 0.9706
64 0.019123000 0.9794 0.9918 #0.9716 0.9963 0.9876
128 0.113480000 0.9892 0.9974 #0.9878 1.0043 0.9907
256 0.668013000 0.9845 0.9979 0.9848 0.9980 #0.9814
overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
_bs.256 _pc.256 _bs.256 _pc.256 _bs.256 _pc.256
1 0.000000730 0.9886 0.9601 0.9521 0.9610 #0.9442
2 0.000002242 0.9355 0.9983 #0.9352 0.9946 0.9369
4 0.000011123 0.9416 1.0002 0.9401 0.9970 #0.9390
8 0.000057803 0.9670 1.0013 #0.9665 1.0005 0.9669
16 0.000378739 #0.9648 0.9993 0.9671 1.0213 0.9670
32 0.002830238 #0.9640 1.0034 0.9669 1.0006 0.9765
64 0.018543000 1.0168 1.0207 1.0020 1.0169 #0.9857
128 0.112874000 0.9867 1.0010 #0.9861 1.0119 0.9893
256 0.664346000 #0.9749 0.9956 0.9766 0.9942 0.9789
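As promised above, here is a hypothetical sketch of how exponents following the r-flag description could be generated. The attached speed-ext.c is the authoritative source; this mpz-level helper is only my reading of the description, not a copy of the attached code:

/* Hypothetical, not taken from speed-ext.c: build a 'bits'-bit exponent
   following the r-flag description above.
   r = 0: fully random; r = 1: binary 111...111;
   r >= 2: only 1/r of the exponent is random data, the rest is zero.  */
static void
make_exponent (mpz_t e, gmp_randstate_t rs, mp_bitcnt_t bits, unsigned long r)
{
  if (r == 0)
    mpz_urandomb (e, rs, bits);          /* dense, uniformly random */
  else if (r == 1)
    {
      mpz_set_ui (e, 1);                 /* 2^bits - 1 = 111...111  */
      mpz_mul_2exp (e, e, bits);
      mpz_sub_ui (e, e, 1);
    }
  else
    {
      mpz_urandomb (e, rs, bits / r);    /* sparse: 1/r random data */
      mpz_setbit (e, bits - 1);          /* keep the full bit length (a guess) */
    }
}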
In short, the gain is detectable only for very sparse exponents.
Regards,
m
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed-ext.c
Type: text/x-c
Size: 21609 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20191214/8da1d39b/attachment-0001.bin>