# Mod operation

Niels Möller nisse at lysator.liu.se
Wed Nov 25 12:33:03 CET 2009

```The new dvision functions come in the flavors div_qr, bdiv_qr, div_q,
bdiv_q. At least for bdiv, bdiv_q is more efficient than bdiv_qr (for
truncating division, divappr_q is faster than div_qr, but correct
division as required for div_q is more subtle).

What about div_r and bdiv_r, returning remainder only? One reason that
is interesting is that bdiv_r is essentially redc. When discussed
earlier, I think the conclusion has been that you can only save a
little storage if you don't care about the remainder, but no
computation.

One interesting exception to that conclusion is the mod_1 algorithm,
based on a suggestion from Montgomery, where one precomputes B^k mod
m and use that to iteratively reduce the inputs mod m B^j.

I wonder if that idea can be applied to general division. Say we want
to compute x mod m, where x = x0 + x1 B + x2 B^2 + x3 B^3 and m = m0 +
m1 B (here, the base B represents n limbs, so its a 4n by 2n limb
operation).

Assume we can precompute for free any m-dependent values we feel like.
Let m3 = B^3 mod m. Then clearly

x = x0 + x1 B + x2 B^2 + x3 * m3 = x (mod m)

So by one 2n x n multiplication, we reduce input size from 4n to 3n.
There remains one 3n / 2n division, which can be done using one n x n
multiply to get the quotient, and an 2n x n to get the remainder.

In total, we have two 2n x n and one n x n. This can be compared to
Barrett that uses two 2n x 2n, or block wise Barrett, which uses two
2n x n and two n x n, or BZ, which uses two 2n / n and one n x n.

Questions:

1. Does this make sense? It's clearly a win over Barrett, at least if
multiplies are done by schoolbook, and it's a win over BZ, assuming
that 2n/n is slower than n x n. It's less clear what the numbers
should be above schoolbook range, in particular since unbalanced
products 2n x n are of central importance.

2. Can it be adapted to 2-adic division (bdiv)? What's unclear to me
is what the precomputed value m3 corresponds to in a 2-adic
setting.

For a start, is the p-adic remainder operation even linear (for a
fixed modulo m)?

Regardss,
/Niels
```