nisse at lysator.liu.se
Tue May 19 16:35:50 CEST 2009
Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
> what does happen currently (say in GMP 4.3.0) when n=2m? Does GMP perform
> two FFTs of total size 2m, or one single FFT of total size 3m? Note that
> this case happens in several applications (in particular in the half-gcd).
The way I read the current code, it's schoolbok if numbers are small
enough (m < KARATSUBA_THRESHOLD) and one single fft if the numbers are
large enough (m > 2 FFT_THRESSOLD/3).
In the middle range, it's mpn_toom42_mul, which will then never
recurse to FFT.
I wouldn't be surprised if it would be an improvement to use toom42
also in the FFT range, but it's not at all clear to me what's the
best strategy, in particular if invariance is taken into account
For toom42, we have a degree 3 polynomial times a degree 1 polynomial,
both evaluated in 5 points. In the FFT range, if we evaluate before
transform, we get 10 forward transforms (and 5 inverse transforms). It
should be a win to transform all 6 coefficients, and do the evaluation
on the transformed coefficients, saving 4 of the forward transforms or
26% of the transform work. All these transforms are of size m.
I tried a similar trick in my code for Strassen-multiplicationn of 2x2
matrices, where 14 forward and 7 inverse transforms are replaced by 8
forward and 4 inverse (43 %). In this case, IIRC the saving in real
execution time was modest, at least in the gcd benchmark. I don't
think I have any benchmark for only the matrix multiplication.
More information about the gmp-devel