Code_Submition_(_Improvement_for_mpn_mul_fft_)

Tommy Färnqvist tof at kth.se
Wed Oct 27 18:00:17 CEST 2004


Hello,
Below follows timing measurements for the fft code currently in head and
Jeunder Yus fft code. Data is from an Opteron 1400 Mhz (n is operand size
in 64 bit limbs):

   n      head     new
    32 0.000097 0.000040
    48 0.000104 0.000046
    64 0.000093 0.000066
    96 0.000127 0.000090
   128 0.000160 0.000109
   192 0.000174 0.000169
   256 0.000250 0.000239
   384 0.000356 0.000382
   512 0.000485 0.000519
   768 0.000774 0.000822
  1024 0.001053 0.001128
  1536 0.001694 0.001848
  2048 0.002258 0.002658
  3072 0.003870 0.004411
  4096 0.005130 0.006004
  6144 0.008726 0.009541
  8192 0.011380 0.013644
 12288 0.019620 0.021960
 16384 0.026235 0.031298
 24576 0.044010 0.052577
 32768 0.059376 0.073084
 49152 0.101576 0.117073
 65536 0.131716 0.166197
 98304 0.218579 0.272730
131072 0.319987 0.382866
196608 0.487714 0.632859
262144 0.682313 0.863678
393216 1.128396 1.355879
524288 1.601678 1.854415
786432 2.511493 2.931337
1048576 3.517764 4.049518

As you can see Paul Zimmerman's code is faster already for quite small
operands. This is probably due to the new trick included in head. (It
involves doing two shorter transforms and combining them via CRT.) I
believe this trick could be applied to Jeunder Yus code as well. This
should certainly be investigated. For this purpose I attach mul_fft.c from
head.

Best regards,
Tommy Färnqvist

P.S. I would recommend changing line 712 in Jeunder Yus code:
if (Ap[i][nprime - 1] & (1 << (BITS_PER_MP_LIMB - 1)) || Ap[i][nprime])
to:
if (Ap[i][nprime - 1] & (1UL << (BITS_PER_MP_LIMB - 1)) || Ap[i][nprime])

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_fft.c
Type: application/octet-stream
Size: 30391 bytes
Desc: mul_fft.c
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20041027/d69da8a1/mul_fft-0001.obj


More information about the gmp-devel mailing list