Reflections on mulmod_bnm1
tg at gmplib.org
Wed Dec 9 08:58:05 CET 2009
bodrato at mail.dm.unipi.it writes:
> It makes sense to me to at add scratch parameter tp. One might want to
> specify that rp and tp may point to the same area; that will save
> storage for all callers that compute rp into a scratch area anyway.
> Good idea.
I do not agree with Torbjorn. It's not a good idea, it's a Great Idea :-D
This is not an argument...
Maybe we can apply the same strategy to mullow as well?
> Is it easy to check if a given n works with FFT? Then one can just check
> for this, like
> I doubt this is a problem in practice, but we should keep it in mind.
> We'll hit an ASSERT_ALWAYS if there is indeed a problem.
This can be a problem in practice! I inserted a very naive code... and
forgot to ask if it is correct.
Currently mpn_mulmod_bnm1 asks for k = mpn_fft_best_k (n,0),
then adjusts the value k = min(k, count_trailing_zeros(n)),
if (k >= FFT_FIRST_K) mpn_mul_fft (r, n, a, an, b, bn, k);
Probably this is not optimal... but is it correct?
Why don't you call mpn_fft_best_k + mpn_fft_next_size?
The ASSERT_ALWAYS (mpn_fft_next_size (n, k) == n) at the beginning of
mul_fft suggest it is, but please confirm.
If it is correct, then we have an easy way "to check if a given n works
with FFT": (n == mpn_fft_next_size (n, FFT_FIRST_K))
"Works", yes, but with a suboptimal k. (I.e., too short transform, too
large coefficients. It might be more like Toom...)
One more warning: in the mul_fft call above it may happen that an < bn.
A fast inspection of the code suggests that this is not a problem with
current implementation... I hope it will not be for any in the future!
I know an advanced algorithm for resolving such a problem if it occurs.
It needs a few cycles, but since we're about to spend tens of thousands
of cycles, that's not too much overhead.
More information about the gmp-devel