# Mod operation

Niels Möller nisse at lysator.liu.se
Mon Nov 30 15:17:36 CET 2009

```nisse at lysator.liu.se (Niels Möller) writes:

> Precompute m3 = m mod B^3, and m' = -m^{-1} mod (B-1). Step one is the
> same, compute
>
>   y = y0 + y1B + y2 B = x0 + x1 B + x2 B + x3 * m3
>
> Step two, compute
>
>   q = y * m' mod (B-1)
>
> Step three, compute
>
>   (y + q*m) / (B-1)
>
> where the division is exact and O(n). The total cost (ignoring O(n)
> operations) is then two 2nxn and one multiply mod B-1. This looks
> pretty good to me.

I've now tried implementing this. (Code attached. Probably doesn't
give correct result, but timing should be ok).

It seems to run slower (around 15%) than Torbjörn's current redc_n,
which uses one mullo and one mulmod_bnm1, both of size 2n in the above
notation.

There remains some possible improvements and variations of dc-style or
bipartite redc. One possible improvement of the above is to note that
the value of q*m is clearly known mod B-1, so then one could compute it
mod B (i.e., mullo) and use CRT. Then the work goes down to one 2n x
n, one multiplication mod B-1 and one mod B.

To guess whether or not it can be a saving to do half the work using
m3 and a 2n x n multiplication, it would be useful to know which is
fastest, a 2nxn multiplication or current redc_n (on size n).

I also wonder if a "pure" mod B-1 reduction could beat the current
redc_n. That algorthm would precompute m' = -m^{-1} mod (B-1), where B
is a power of two and B > m, assuming that m is invertible. Then to
reduce x, it would compute

q = x m' mod (B-1)

r = (x + q m) / (B-1)

Computing q is a mulmod_bnm1. To compute r, the division is exact and
cheap, and one doesn't have to compute the full product q m, it's
sufficient with the lower half. So this would be precesely the same
multiplication operations as current redc_n, but in opposite order.

The only obvious place for improvement is the mullo operation, which
is inefficient for large n. It would work just as well with mulhi. So
here's a question:

Would it be possible to compute

r = (x + q m) / (B-1)

using mulmid(q, m), rather than mullo or mulhi???

Regards,
/Niels

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: redc.c
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20091130/d9f4836c/attachment.c>
```