Reflections on mulmod_bnm1

Tue Dec 8 21:48:35 CET 2009

Torbjorn Granlund <tg at gmplib.org> writes:

> When the basecase function mpn_bc_mulmod_bnm1 is invoked, it writes 2rn
> limbs to rp.  This is an unexpected interface, it might be far outside
> the result area.  Is this intended?

It looked like a good idea at the time. But I agree it's a poor
interface. mullo (at least some of the implementations) had the same
issue, and in that case we chose an interface that doesn't require extra
space at rp, and instead the function has to allocate whatever it needs.

> Should we perhaps pass and use tp for the mpn_mul_n result,

It makes sense to me to at add scratch parameter tp. One might want to
specify that rp and tp may point to the same area; that will save
storage for all callers that compute rp into a scratch area anyway.

> then let mpn_add (which should BTW be mpn_add_n) write to rp?

I'm not 100% sure it's a good design idea to use a single size parameter
for the basecase functions. I guess the idea here is that mulmod_bnm1 is
useful only when there's at least one split and recursive calls to bnm1
and bnp1. And if one assumes that an >= rn / 2, then the recursive calls
all will have an = bn = rn. (This is of course related to the
ALLOW_MISUSE that Marco described earlier today).

> Perhaps we should do something like this:
[...]
> Else, rounding within 2 * MULMOD_BNM1_THRESHOLD should be to the next
> multiple of 2.
>
> Else, rounding within 4 * MULMOD_BNM1_THRESHOLD should be to the next
> multiple of 4.

Makes sense to me. In principle, one also can do mulmod_bnm1 recursively,
also when rn is odd, assuming that GMP_NUMB_BITS is even. But I'd expect
that the shifting overhead one gets there is worse then rounding rn up.

> (There is a potential error in this algorithm, which is also present in
> the current next function: The non-FFT rounding might bring an n smaller
> than MUL_FFT_MODF_THRESHOLD into the FFT range.  This could then be
> insufficiently rounded for FFT...)

Is it easy to check if a given n works with FFT? Then one can just check
for this, like

      if (BELOW_THRESHOLD (n, MUL_FFT_MODF_THRESHOLD)
          || !valid_fft_size_p(n))
      	mpn_bc_mulmod_bnp1 (xp, ap1, bp1, n);
      else { ... do the fft thing ... }

(then I guess this may also interact with the choice of the fft "k" in
some interesting way).

Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.