Reflections on mulmod_bnm1

Wed Dec 9 10:37:53 CET 2009

bodrato at mail.dm.unipi.it writes:

> At first I wrote basecases with all the three parameters: an, bn, rn. This
> avoids any MPN_COPY, but forces the use of mpn_mul and some logic in the
> _bc, to be repeated all the times, even in the very likely case
> an=bn=rn.

Maybe you're right that this is important for performance. Still, most
of the work is done with fairly large calls to bnp1.

> Moreover, there was a subtile bug: it could happen that bc_mulmod_bnp1 was
> called with (anp < bnp), even if originally we had (an >= bn)... because
> wrapping can give a carry or not... and mpn_mul does not support an<bn.

In general, for functions that don't depend on normalization (i.e.,
almost everything except division), it's usually best not to be so
zealous with normaliation, but allow zero high limb from time to time.

But in this case, the top limb is zero almost always, and one only when
a = B^n mod B^n+1. So maybe it's not good to always include that
usually-zero high limb to the multiplication calls.

I'm tempted to add special handling (after all, multiplication by B^n =
- 1 (mod B^n + 1) is unusuallly cheap). It's hard to motivate from a
performance point of view, but if it makes the handling of the common
case simpler, it might still be a good idea.

(And talking about unusual cases, do the current tests exercise
borderline cases where inputs or products equal -1,0, or +1 modulo the
various B^n ± 1 involved?)

> Finally, logic is moved where we have more control on it (maybe reducing
> it, maybe not), we added some COPY but only in unlikely conditions, we can
> use mul_n instead of mul. In my opinion, the balance is positive.

I'm not so bothered by MNP_COPY in unlikely cases, but I don't like the
zero-padding of inputs (MPN_ZERO). And I think it is unfortunate if for
some small inputs mulmod_bmn1 is significantly slower than unbalanced
mpn_mul + mpn_add. Say, an = rn = 20, bn = 12, or rn = 20, an = bn = 15
(and threshold > 20).

A compromize could be to have the basecase use an = rn >= bn. This could
be justified by the assumption that the padding needed for the larger
input should be pretty small, while the smaller input is allowed to be
of arbitrarily size.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.