Mod operation

Wed Nov 25 15:03:12 CET 2009

       Niels,

> One curious detail: Assume we only precompute the inverse of m. Then
> to compute B^3 mod m, the quotient is free (it's the first half of the
> inverse), so this is a single 2n x n multiply. If we include the cost
> for this, we get a total of three 2n x n and one n x n, which in
> schoolbook range is 7 n^2, which still is slightly better than Barrett
> division which takes 8 n^2. But it's worse than blockwise Barret which takes
> 6n^2, and then it's somewhat silly to use Barrett division in the
> schoolbook range).

yes, but in the Karatsuba range it will cost 2M(n) [to compute B^3 mod m]
plus 2M(n) [to reduce the coeff of B^3 mod m], plus 3M(n) for the reduction
of the remainder mod m, thus 7M(n) = 2.333M(n), i.e., this will be more than
the recursive division in 2M(n) (which in addition requires no precomputation).
The more I think about it, the more I believe the "recursive division" is kind
of "magic".

> I also wonder if there's some clever way to compute the inverse and m3
> simultaneously (they're quotient and remainder of B^3 / m). The
> "obvious" way is newton iteration to get the quotient aka inverse, and
> anothere multiplication to get the remainder, but maybe they can be
> combined in the final newton step or something like that.

you can do this by calling the recursive division with dividend B^3 and
divisor m. This is exactly half the work done by the recursive division
for (4n)/(2n), i.e., (3n)/(2n), and will cost 3M(n) in the Karatsuba range.

> We can also conclude that there seems do be a place for div_r and
> bdiv_r, at least in the case that the the modulo is invariant and
> *maybe* in general for large enough numbers.

yes, I strongly support the [b]div_r functions.

Paul