New mulmod_bknp1

Marco Bodrato bodrato at mail.dm.unipi.it
Tue Feb 15 11:48:25 CET 2022


Ciao,

I pushed some new functions to compute the product (and square) modulo 
B^nn+1, for small values of the exponent nn. For large values the 
already available fft_mul should be used.

The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 
13, 17}.
Also k=11 can be supported, but it's currently disabled.

I added also tune/speed support, but the problem with the analysis is 
that the function is defined for some values only...

Here are some results:

@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -Cr -s 
75-85\,240-250\,725-735 mpn_mul_n mpn_mulmod_bknp1
overhead 7.15 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 
3500.09 MHz
             mpn_mul_n mpn_mulmod_bknp1
75           159.4800       #0.9715
76          #167.2895           n/a
77          #167.1558        1.0793
78           193.8462       #0.7311
79          #169.8734           n/a
80           182.1500       #0.8597
81           196.0617       #0.7721
82          #170.8780           n/a
83          #170.8916           n/a
84           169.2024       #0.9344
85           188.8235       #0.7898
240          258.7792       #0.7474
241         #254.4274           n/a
242          266.2355       #0.9051
243          260.2798       #0.7414
244         #256.3893           n/a
245          256.8449       #0.8595
246          255.7073       #0.7915
247          257.0324       #0.9904
248         #257.6532           n/a
249          257.2410       #0.7925
250          258.1480       #0.8535
725          378.2538       #0.8652
726          377.4793       #0.8362
727         #376.6836           n/a
728          376.2060       #0.9164
729          379.7462       #0.7682
730          379.1068       #0.8727
731          379.0109       #0.9805
732          377.9754       #0.8285
733         #379.6958           n/a
734         #377.6403           n/a
735          376.0204       #0.8094

As you can see, the new functions are particularly effective, when there 
are many factors "3" in the size...

Integrating this in the current mod_bnm1 is easy, but the effect would 
be to accelerate the operations only for some sizes. This may add 
"noise" to the thresholds system. Some more experiments are needed.

In the meanwhile, the base code will be tested by the great testing 
system of GMP :-)

Ĝis,
m


More information about the gmp-devel mailing list