Code_Submition_(_Improvement_for_mpn_mul_fft_)
Tommy Färnqvist
tof at kth.se
Wed Oct 27 18:00:17 CEST 2004
Hello,
Below follows timing measurements for the fft code currently in head and
Jeunder Yus fft code. Data is from an Opteron 1400 Mhz (n is operand size
in 64 bit limbs):
n head new
32 0.000097 0.000040
48 0.000104 0.000046
64 0.000093 0.000066
96 0.000127 0.000090
128 0.000160 0.000109
192 0.000174 0.000169
256 0.000250 0.000239
384 0.000356 0.000382
512 0.000485 0.000519
768 0.000774 0.000822
1024 0.001053 0.001128
1536 0.001694 0.001848
2048 0.002258 0.002658
3072 0.003870 0.004411
4096 0.005130 0.006004
6144 0.008726 0.009541
8192 0.011380 0.013644
12288 0.019620 0.021960
16384 0.026235 0.031298
24576 0.044010 0.052577
32768 0.059376 0.073084
49152 0.101576 0.117073
65536 0.131716 0.166197
98304 0.218579 0.272730
131072 0.319987 0.382866
196608 0.487714 0.632859
262144 0.682313 0.863678
393216 1.128396 1.355879
524288 1.601678 1.854415
786432 2.511493 2.931337
1048576 3.517764 4.049518
As you can see Paul Zimmerman's code is faster already for quite small
operands. This is probably due to the new trick included in head. (It
involves doing two shorter transforms and combining them via CRT.) I
believe this trick could be applied to Jeunder Yus code as well. This
should certainly be investigated. For this purpose I attach mul_fft.c from
head.
Best regards,
Tommy Färnqvist
P.S. I would recommend changing line 712 in Jeunder Yus code:
if (Ap[i][nprime - 1] & (1 << (BITS_PER_MP_LIMB - 1)) || Ap[i][nprime])
to:
if (Ap[i][nprime - 1] & (1UL << (BITS_PER_MP_LIMB - 1)) || Ap[i][nprime])
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_fft.c
Type: application/octet-stream
Size: 30391 bytes
Desc: mul_fft.c
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20041027/d69da8a1/mul_fft-0001.obj
More information about the gmp-devel
mailing list