fixed-size mpn_mul_n for small n?

Zimmermann Paul Paul.Zimmermann at loria.fr
Sun Feb 12 18:44:25 CET 2012


       Torbjörn,

> Providing special code for many un,vn combinations (as separate
> functions are as part of mpn_mul_basecase) quickly become unmanageable.
> If we want to handle all sizes <= 16 (say) we'll need 136 variants.
> 
> I don't think it makes much sense providing code for just un=vn (except
> that it becomes more manageable...).

I was thinking of applications doing heavy computations in modular arithmetic,
like GMP-ECM, where we only need the case un=vn.

I am trying to optimize the modular multiplications and squarings in GMP-ECM
(where we use Montgomery's reduction). Below are some timings of low-level
functions for different limb sizes up to n=20 on a AMD Phenom(tm) II X2 B55
running at 3Ghz.

Paul

******************
Time in microseconds per call, size=1
mpn_mul_n  = 0.009601
mpn_sqr    = 0.007200
mpn_redc_1 = 0.010401
mpn_redc_2 = 0.013201
mulredc    = 0.004800
mul+redc_1 = 0.019601
mul+redc_2 = 0.022801
mul+redc3  = 0.017201
sqr+redc_1 = 0.018401
sqr+redc_2 = 0.020401
sqr+redc3  = 0.014801
mulredc1   = 0.004800
******************
Time in microseconds per call, size=2
mpn_mul_n  = 0.013601
mpn_sqr    = 0.009601
mpn_redc_1 = 0.017601
mpn_redc_2 = 0.020001
mulredc    = 0.018401
mul+redc_1 = 0.031202
mul+redc_2 = 0.032802
mul+redc3  = 0.032002
sqr+redc_1 = 0.031202
sqr+redc_2 = 0.032802
sqr+redc3  = 0.028002
mulredc1   = 0.007600
******************
Time in microseconds per call, size=3
mpn_mul_n  = 0.018001
mpn_sqr    = 0.013201
mpn_redc_1 = 0.026402
mpn_redc_2 = 0.031202
mulredc    = 0.028802
mul+redc_1 = 0.046803
mul+redc_2 = 0.050403
mul+redc3  = 0.048003
sqr+redc_1 = 0.039602
sqr+redc_2 = 0.045603
sqr+redc3  = 0.040802
mulredc1   = 0.010001
******************
Time in microseconds per call, size=4
mpn_mul_n  = 0.024002
mpn_sqr    = 0.017601
mpn_redc_1 = 0.038402
mpn_redc_2 = 0.036802
mulredc    = 0.046403
mul+redc_1 = 0.062404
mul+redc_2 = 0.062404
mul+redc3  = 0.064004
sqr+redc_1 = 0.056004
sqr+redc_2 = 0.056004
sqr+redc3  = 0.056004
mulredc1   = 0.011601
******************
Time in microseconds per call, size=5
mpn_mul_n  = 0.034002
mpn_sqr    = 0.028002
mpn_redc_1 = 0.050003
mpn_redc_2 = 0.056003
mulredc    = 0.066004
mul+redc_1 = 0.084005
mul+redc_2 = 0.088005
mul+redc3  = 0.092005
sqr+redc_1 = 0.078005
sqr+redc_2 = 0.082005
sqr+redc3  = 0.084005
mulredc1   = 0.013601
******************
Time in microseconds per call, size=6
mpn_mul_n  = 0.040802
mpn_sqr    = 0.036002
mpn_redc_1 = 0.064804
mpn_redc_2 = 0.064804
mulredc    = 0.091205
mul+redc_1 = 0.108007
mul+redc_2 = 0.108007
mul+redc3  = 0.117607
sqr+redc_1 = 0.100806
sqr+redc_2 = 0.098407
sqr+redc3  = 0.110407
mulredc1   = 0.014801
******************
Time in microseconds per call, size=7
mpn_mul_n  = 0.056004
mpn_sqr    = 0.042003
mpn_redc_1 = 0.089606
mpn_redc_2 = 0.086806
mulredc    = 0.120408
mul+redc_1 = 0.145609
mul+redc_2 = 0.142808
mul+redc3  = 0.148410
sqr+redc_1 = 0.128808
sqr+redc_2 = 0.128808
sqr+redc3  = 0.131608
mulredc1   = 0.016801
******************
Time in microseconds per call, size=8
mpn_mul_n  = 0.067204
mpn_sqr    = 0.051203
mpn_redc_1 = 0.102406
mpn_redc_2 = 0.102406
mulredc    = 0.153610
mul+redc_1 = 0.169610
mul+redc_2 = 0.163210
mul+redc3  = 0.179211
sqr+redc_1 = 0.153610
sqr+redc_2 = 0.147210
sqr+redc3  = 0.163210
mulredc1   = 0.018401
******************
Time in microseconds per call, size=9
mpn_mul_n  = 0.090005
mpn_sqr    = 0.061204
mpn_redc_1 = 0.126008
mpn_redc_2 = 0.122408
mulredc    = 0.190812
mul+redc_1 = 0.212414
mul+redc_2 = 0.219614
mul+redc3  = 0.219614
sqr+redc_1 = 0.183612
sqr+redc_2 = 0.187212
sqr+redc3  = 0.194412
mulredc1   = 0.020001
******************
Time in microseconds per call, size=10
mpn_mul_n  = 0.100006
mpn_sqr    = 0.072005
mpn_redc_1 = 0.140008
mpn_redc_2 = 0.144009
mulredc    = 0.232015
mul+redc_1 = 0.240015
mul+redc_2 = 0.248015
mul+redc3  = 0.264017
sqr+redc_1 = 0.216013
sqr+redc_2 = 0.216014
sqr+redc3  = 0.232014
mulredc1   = 0.022001
******************
Time in microseconds per call, size=11
mpn_mul_n  = 0.123208
mpn_sqr    = 0.079206
mpn_redc_1 = 0.162810
mpn_redc_2 = 0.167210
mulredc    = 0.277218
mul+redc_1 = 0.281618
mul+redc_2 = 0.303619
mul+redc3  = 0.303620
sqr+redc_1 = 0.246416
sqr+redc_2 = 0.246416
sqr+redc3  = 0.264017
mulredc1   = 0.023601
******************
Time in microseconds per call, size=12
mpn_mul_n  = 0.139210
mpn_sqr    = 0.096006
mpn_redc_1 = 0.192012
mpn_redc_2 = 0.182411
mulredc    = 0.326421
mul+redc_1 = 0.331221
mul+redc_2 = 0.321621
mul+redc3  = 0.355223
sqr+redc_1 = 0.283217
sqr+redc_2 = 0.283218
sqr+redc3  = 0.316821
mulredc1   = 0.024802
******************
Time in microseconds per call, size=13
mpn_mul_n  = 0.171611
mpn_sqr    = 0.109208
mpn_redc_1 = 0.213213
mpn_redc_2 = 0.218413
mulredc    = 0.379625
mul+redc_1 = 0.384824
mul+redc_2 = 0.390025
mul+redc3  = 0.421226
sqr+redc_1 = 0.327621
sqr+redc_2 = 0.327621
sqr+redc3  = 0.353622
mulredc1   = 0.026802
******************
Time in microseconds per call, size=14
mpn_mul_n  = 0.184813
mpn_sqr    = 0.123207
mpn_redc_1 = 0.229614
mpn_redc_2 = 0.229616
mulredc    = 0.431227
mul+redc_1 = 0.414426
mul+redc_2 = 0.420027
mul+redc3  = 0.464830
sqr+redc_1 = 0.352823
sqr+redc_2 = 0.352821
sqr+redc3  = 0.392026
mulredc1   = 0.029202
******************
Time in microseconds per call, size=15
mpn_mul_n  = 0.204014
mpn_sqr    = 0.138008
mpn_redc_1 = 0.270018
mpn_redc_2 = 0.264017
mulredc    = 0.498030
mul+redc_1 = 0.474030
mul+redc_2 = 0.492032
mul+redc3  = 0.522032
sqr+redc_1 = 0.402026
sqr+redc_2 = 0.408026
sqr+redc3  = 0.456029
mulredc1   = 0.030002
******************
Time in microseconds per call, size=16
mpn_mul_n  = 0.236814
mpn_sqr    = 0.153610
mpn_redc_1 = 0.307219
mpn_redc_2 = 0.294419
mulredc    = 0.556834
mul+redc_1 = 0.537634
mul+redc_2 = 0.524834
mul+redc3  = 0.582437
sqr+redc_1 = 0.454427
sqr+redc_2 = 0.448029
sqr+redc3  = 0.505632
mulredc1   = 0.031602
******************
Time in microseconds per call, size=17
mpn_mul_n  = 0.278819
mpn_sqr    = 0.163210
mpn_redc_1 = 0.333221
mpn_redc_2 = 0.326421
mulredc    = 0.632439
mul+redc_1 = 0.605238
mul+redc_2 = 0.612039
mul+redc3  = 0.659641
sqr+redc_1 = 0.503233
sqr+redc_2 = 0.489631
sqr+redc3  = 0.564434
mulredc1   = 0.034002
******************
Time in microseconds per call, size=18
mpn_mul_n  = 0.295218
mpn_sqr    = 0.187211
mpn_redc_1 = 0.352824
mpn_redc_2 = 0.352822
mulredc    = 0.705644
mul+redc_1 = 0.648042
mul+redc_2 = 0.640840
mul+redc3  = 0.734448
sqr+redc_1 = 0.540033
sqr+redc_2 = 0.540035
sqr+redc3  = 0.626440
mulredc1   = 0.035602
******************
Time in microseconds per call, size=19
mpn_mul_n  = 0.319221
mpn_sqr    = 0.197612
mpn_redc_1 = 0.395225
mpn_redc_2 = 0.418027
mulredc    = 0.782851
mul+redc_1 = 0.714445
mul+redc_2 = 0.729647
mul+redc3  = 0.805653
sqr+redc_1 = 0.608039
sqr+redc_2 = 0.623239
sqr+redc3  = 0.691645
mulredc1   = 0.036802
******************
Time in microseconds per call, size=20
mpn_mul_n  = 0.360022
mpn_sqr    = 0.224014
mpn_redc_1 = 0.440028
mpn_redc_2 = 0.440028
mulredc    = 0.864054
mul+redc_1 = 0.792048
mul+redc_2 = 0.792050
mul+redc3  = 0.880056
sqr+redc_1 = 0.664040
sqr+redc_2 = 0.656042
sqr+redc3  = 0.760048
mulredc1   = 0.038402


More information about the gmp-devel mailing list