Mod operation

Wed Nov 25 13:36:14 CET 2009

       Niels,

> The new dvision functions come in the flavors div_qr, bdiv_qr, div_q,
> bdiv_q. At least for bdiv, bdiv_q is more efficient than bdiv_qr (for
> truncating division, divappr_q is faster than div_qr, but correct
> division as required for div_q is more subtle).
> 
> What about div_r and bdiv_r, returning remainder only? One reason that
> is interesting is that bdiv_r is essentially redc. When discussed
> earlier, I think the conclusion has been that you can only save a
> little storage if you don't care about the remainder, but no
> computation.

I guess you mean "you don't care about the quotient" here, as Torbjörn
pointed out.

> One interesting exception to that conclusion is the mod_1 algorithm,
> based on a suggestion from Montgomery, where one precomputes B^k mod
> m and use that to iteratively reduce the inputs mod m B^j.
> 
> I wonder if that idea can be applied to general division. Say we want
> to compute x mod m, where x = x0 + x1 B + x2 B^2 + x3 B^3 and m = m0 +
> m1 B (here, the base B represents n limbs, so its a 4n by 2n limb
> operation).
> 
> Assume we can precompute for free any m-dependent values we feel like.
> Let m3 = B^3 mod m. Then clearly
> 
>   x = x0 + x1 B + x2 B^2 + x3 * m3 = x (mod m)
> 
> So by one 2n x n multiplication, we reduce input size from 4n to 3n.
> There remains one 3n / 2n division, which can be done using one n x n
> multiply to get the quotient, and an 2n x n to get the remainder.
> 
> In total, we have two 2n x n and one n x n. This can be compared to
> Barrett that uses two 2n x 2n, or block wise Barrett, which uses two
> 2n x n and two n x n, or BZ, which uses two 2n / n and one n x n.

yes, the algorithm you describe was published by Hasenplaugh, Gaubatz and
Gopal in the proceedings of Arith-18. See also the end of page 39 in [1].
But in my opinion this is just another view of Svoboda's algorithm: in
Svoboda's algorithm you precompute a multiple of m which is just over
B^3, i.e., k*m = B^3 + r, whereas in the algorithm of Hasenplaugh, Gaubatz,
and Gopal (or should we attribute it to Montgomery?) you compute
B^3 mod m, i.e., you get m3 = -r (mod m).

> Questions:
> 
> 1. Does this make sense? It's clearly a win over Barrett, at least if
>    multiplies are done by schoolbook, and it's a win over BZ, assuming
>    that 2n/n is slower than n x n. It's less clear what the numbers
>    should be above schoolbook range, in particular since unbalanced
>    products 2n x n are of central importance.

yes it makes sense, in the Karatsuba range you get 5/3M(n) instead of 2M(n).
But it requires a precomputation. You can even do more precomputations and
iterate this "folding" idea, see the article of Hasenplaugh, Gaubatz and Gopal.

> 2. Can it be adapted to 2-adic division (bdiv)? What's unclear to me
>    is what the precomputed value m3 corresponds to in a 2-adic
>    setting.
> 
>    For a start, is the p-adic remainder operation even linear (for a
>    fixed modulo m)?

yes, see Algorithm MontgomerySvoboda2 in [1] (page 79).

> Regardss,
> /Niels

Paul

[1] Modern Computer Arithmetic, Richard Brent and Paul Zimmermann,
version 0.4, November 2009, http:°www.loria.fr/~zimmerma/mca/mca-0.4.pdf.