Reflections on mulmod_bnm1

Wed Dec 9 11:11:54 CET 2009

Niels,

> But in this case, the top limb is zero almost always, and one only when
> a = B^n mod B^n+1. So maybe it's not good to always include that
> usually-zero high limb to the multiplication calls.
>
> I'm tempted to add special handling (after all, multiplication by B^n =
> - 1 (mod B^n + 1) is unusuallly cheap). It's hard to motivate from a
> performance point of view, but if it makes the handling of the common
> case simpler, it might still be a good idea.

Input of basecases are semi-normalised. The top limb equal to one does not
imply that all other limbs are zero. This also means that the top limb is
not "almost always" zero. Current handling of basecases avoids strict
normalization.

> (And talking about unusual cases, do the current tests exercise
> borderline cases where inputs or products equal -1,0, or +1 modulo the
> various B^n ± 1 involved?)

Yes, it does.

> I'm not so bothered by MNP_COPY in unlikely cases, but I don't like the
> zero-padding of inputs (MPN_ZERO). And I think it is unfortunate if for
> some small inputs mulmod_bmn1 is significantly slower than unbalanced
> mpn_mul + mpn_add. Say, an = rn = 20, bn = 12, or rn = 20, an = bn = 15
> (and threshold > 20).

In both cases you mention bn > rn/2, this means: no copy, no padding.

Say an = rn = 40, bn = 12... a 40/2+1-12= 9 limbs padding for bc_bnp1
Say rn = 40, an = bn = 15... 6 limbs padding for both... but I call this a
"misuse" :-P

> A compromize could be to have the basecase use an = rn >= bn. This could
> be justified by the assumption that the padding needed for the larger
> input should be pretty small, while the smaller input is allowed to be
> of arbitrarily size.

If we do not ALLOW_MISUSE, padding should never be needed for an.

Regards,
Marco

-- 
http://bodrato.it/