Improvements for mpn_mul_fft

Sun Oct 31 21:24:38 CET 2004

>      ... therefore, non-recursive and cache locality are very important 
> for
>      large n.
>
> I totally agree about the importance of cache locality, but why
> is it important to avoid recursiion?  The actual function call
> overhead should really be minor, considering the operand size.

In Scho"nhage's method, the overhead of recursive call in fft is very small,
because k is very small. The upper bound of k is 11 (refer to the default
FFT_TABLE and mpn_fft_best_k function), therefore the transform length
is small. If k=11, the transform length is 2^k=2048, and the number of
recursive calls is about 2048. Another reason is... we must perform many
function calls for multi-limbs add/sub and shift...etc., therefore the 
overhead
of recursive call is relatively negligible.

In another method, if two operands are size in 2^n limbs, the product is
2^(n+1) limbs, we must perform a length 2^(n+1) transform on each
operand (in 3-prime moduli method, we must perform three such transform
on each operand). If n=20, the number of recursive call for each transform
is about 1048576, and cost several mega limbs of stack (depends on the
number of local variables in fft routine), this is very larger than 
Scho"nhage's
method. In such single limb transform, whole operation can be done without
extra function calls, therefore the overhead of recursive call is relatively 
large.
So I said in such case, non-recursive fft is very important.

Most of iterative fft algorithms are lost in cache locality, this is the 
motive
of six-step fft, and my technique is another simpler way.

> Before we can include your contribution in GMP, we need you to
> sign paperwork to sign over your copyright ownership to the FSF.
> Are you willing to do that?

Okay, I will. :-)

--
Jeunder Yu