fft interface

Niels Möller nisse at lysator.liu.se
Wed Oct 17 11:57:05 CEST 2012


Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:

> In fact it seems I should compare with mpn_mulmod_bnm1_rounded, which seems to
> use the best fft size greater or equal to the wanted size (speed mpn_mul_fft
> does this automatically):

That seems right, I wasn't ware of how this is done in speed. For good
performance, one should usually call mulmod_bnm1 only with result sizes
chosen by mpn_mulmod_bnm1_next_size.

> frite% ./speed -s 1000000,2000000,5000000,10000000 mpn_mul_fft mpn_mulmod_bnm1_rounded
> overhead 0.000000002 secs, precision 10000 units of 3.33e-10 secs, CPU freq 3000.00 MHz
>           mpn_mul_fft mpn_mulmod_bnm1_rounded
> mpn_mul_fft pl=1007616 nl=1000000 ml=1000000
> mpn_mulmod_bnm1: rn=1015808 an=1000000 bn=1000000
> 1000000    0.368023000  #0.304019000
> 2000000    0.780048000  #0.752047000
> 5000000    2.392149000  #2.212138000
> 10000000    4.944309000  #4.828301000

So in this range, mulmod_bnm1 is slightly faster. Here are results for
smaller sizes, compared to plain mpn_mul:

$ ./speed -ocycles-broken -s 10-50000 -f2 -r mpn_mul mpn_mul_fft mpn_mulmod_bnm1_rounded
clock_gettime is 1.000ns accurate
overhead 0.000000006 secs, precision 10000 units of 1.00e-09 secs, CPU freq 1200.00 MHz
              mpn_mul   mpn_mul_fft mpn_mulmod_bnm1_rounded
10       #0.000000457      149.6664        1.0799
20        0.000001641       43.4732       #0.9290
40        0.000005367       16.2329       #0.7590
80        0.000016162        5.5915       #0.6786
160       0.000050389        2.1811       #0.5693
320       0.000129925        1.3104       #0.6261
640       0.000327598        1.1281       #0.6788
1280      0.000917371        0.8377       #0.6074
2560      0.002145590        0.6681       #0.6338
5120      0.006303576        0.5174       #0.4522
10240     0.014602908        0.4694       #0.4342
20480     0.035096926        0.4425       #0.3856
40960     0.079812770        0.4521       #0.3761

It definitely makes the wrap-around trick useful for fairly small sizes.

I didn't expect to see any ratios < 0.5, though. Why is that?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list