Generalizing Montgomery mod algorithm

Thu Jun 14 21:12:34 CEST 2012

Torbjorn Granlund <tg at gmplib.org> writes:

> I agree this method could make a faster SB mpn_tdiv_r, but I am
> unconvinced that you can get the quotient with fewer operatons than the
> current code.

The method specialized for div_qr_1 did beat the udiv_qrnnd_preinv loop
(at least when comparing the x86_64 code). And the needed quotient
book-keeping should be very similar for larger divisors, at least for
the variant which use a limb + bit multiplier v.

I suspect the main drawbacks compared to current SB div_qr is higher
probability of the bignum adjustment, an O(n^2) cost, and more expensive
precomputation, O(n). None of these were relevant for div_qr_1. And the
main advantage is that the depenency from one addmul_1 to the next
involves no multiplication, an O(n) gain. Marco suggested (in private
mail) that one can predict most of the carries early, and get the
adjustment probability small, but I haven't looked into that.

I should perhaps also mention the special case which got me onto this
track: Moduli where the top limb already is B-1. Then there's no
precomputation, and the adjustment probability is very low (since B^{n}
- M is at most n - 1 limbs). Such moduli are used in several standard
groups specified for diffie-hellman and ecc use. My conclusion for now
is that for such groups, there's no point in using montgomery
multiplication, plain mod using this method should be fast enough.

> You should implement a SB mpn_tdiv_r based on your ideas.  It ought to
> outperform the current code most when the divisor is small and the
> dividend is fairly large.

I'd like to, but it's unlikely to happen very soon.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.