mul_fft

Wed Jun 16 07:14:25 UTC 2021

       Hi,

with Samuel Vivien (in cc) we noticed two things related to the FFT code:

1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c.
   It might give a small speedup in some cases.

2) the previous code to multiply integers (mpn_mul_fft_full) which did
   two multiplications mod 2^N1+1 and 2^N2+1 has been replaced by a new
   code which computes 2^(2N)-1 for 2N ~ N1+N2, using the fact that
   2^(2N)-1=(2^N+1)(2^(N/2)+1)(2^(N/4)+1)...(2^(N/2^k)+1)(2^(N/2^k)-1).
   However that new code is not always faster, for example on an Intel
   Core i5-4590 with gcc 10.2.1:

Vanilla GMP 6.2.1:
$ ./speed -s 1000000,2000000,5000000,10000000 mpn_mul_n mpn_mul_n mpn_mul_n
            mpn_mul_n     mpn_mul_n     mpn_mul_n
1000000    0.447985000   0.452133000  #0.447968000
2000000   #0.949739000   0.949996000   0.956713000
5000000   #2.669156000   2.678638000   2.677642000
10000000   #5.540204000   5.547306000   5.547152000

With --enable-old-fft-full (without WANT_ADDSUB):
1000000    0.448750000   0.451685000  #0.447571000
2000000   #0.941488000   0.942339000   0.947603000
5000000   #2.668528000   2.675208000   2.669147000
10000000    5.543246000   5.580915000  #5.538155000

Best regards,
Paul