Reflections on mulmod_bnm1

Wed Dec 9 08:58:05 CET 2009

bodrato at mail.dm.unipi.it writes:

  >   It makes sense to me to at add scratch parameter tp. One might want to
  >   specify that rp and tp may point to the same area; that will save
  >   storage for all callers that compute rp into a scratch area anyway.
  >
  > Good idea.

  I do not agree with Torbjorn. It's not a good idea, it's a Great Idea :-D

This is not an argument...

  Maybe we can apply the same strategy to mullow as well?

Yes!

  >   Is it easy to check if a given n works with FFT? Then one can just check
  >   for this, like

  > I doubt this is a problem in practice, but we should keep it in mind.
  > We'll hit an ASSERT_ALWAYS if there is indeed a problem.

  This can be a problem in practice! I inserted a very naive code... and
  forgot to ask if it is correct.

  Currently mpn_mulmod_bnm1 asks for k = mpn_fft_best_k (n,0),
  then adjusts the value k = min(k, count_trailing_zeros(n)),
  and finally:
   if (k >= FFT_FIRST_K) mpn_mul_fft (r, n, a, an, b, bn, k);

  Probably this is not optimal... but is it correct?

Why don't you call mpn_fft_best_k + mpn_fft_next_size?

  The ASSERT_ALWAYS (mpn_fft_next_size (n, k) == n) at the beginning of
  mul_fft suggest it is, but please confirm.

  If it is correct, then we have an easy way "to check if a given n works
  with FFT": (n == mpn_fft_next_size (n, FFT_FIRST_K))

"Works", yes, but with a suboptimal k.  (I.e., too short transform, too
large coefficients.  It might be more like Toom...)

  One more warning: in the mul_fft call above it may happen that an < bn.
  A fast inspection of the code suggests that this is not a problem with
  current implementation... I hope it will not be for any in the future!

I know an advanced algorithm for resolving such a problem if it occurs.
It needs a few cycles, but since we're about to spend tens of thousands
of cycles, that's not too much overhead.

-- 
Torbjörn