# Reflections on mulmod_bnm1

Torbjorn Granlund tg at gmplib.org
Wed Dec 9 08:58:05 CET 2009

```bodrato at mail.dm.unipi.it writes:

>   It makes sense to me to at add scratch parameter tp. One might want to
>   specify that rp and tp may point to the same area; that will save
>   storage for all callers that compute rp into a scratch area anyway.
>
> Good idea.

I do not agree with Torbjorn. It's not a good idea, it's a Great Idea :-D

This is not an argument...

Maybe we can apply the same strategy to mullow as well?

Yes!

>   Is it easy to check if a given n works with FFT? Then one can just check
>   for this, like

> I doubt this is a problem in practice, but we should keep it in mind.
> We'll hit an ASSERT_ALWAYS if there is indeed a problem.

This can be a problem in practice! I inserted a very naive code... and
forgot to ask if it is correct.

Currently mpn_mulmod_bnm1 asks for k = mpn_fft_best_k (n,0),
then adjusts the value k = min(k, count_trailing_zeros(n)),
and finally:
if (k >= FFT_FIRST_K) mpn_mul_fft (r, n, a, an, b, bn, k);

Probably this is not optimal... but is it correct?

Why don't you call mpn_fft_best_k + mpn_fft_next_size?

The ASSERT_ALWAYS (mpn_fft_next_size (n, k) == n) at the beginning of
mul_fft suggest it is, but please confirm.

If it is correct, then we have an easy way "to check if a given n works
with FFT": (n == mpn_fft_next_size (n, FFT_FIRST_K))

"Works", yes, but with a suboptimal k.  (I.e., too short transform, too
large coefficients.  It might be more like Toom...)

One more warning: in the mul_fft call above it may happen that an < bn.
A fast inspection of the code suggests that this is not a problem with
current implementation... I hope it will not be for any in the future!

I know an advanced algorithm for resolving such a problem if it occurs.
It needs a few cycles, but since we're about to spend tens of thousands
of cycles, that's not too much overhead.

--
Torbjörn
```