# precomputed modular reductions in GMP?

David Harvey dmharvey at math.harvard.edu
Sun Aug 27 00:25:26 CEST 2006

```Dear GMP developers,

I'm working on a fairly new computer algebra package called SAGE that
uses GMP for the underlying arithmetic.

http://modular.math.washington.edu/sage/

We have a class that represents the integers mod N. Whenever two
elements of this ring are multiplied, the result must be reduced mod
N. Currently this is implemented using mpz_fdiv_q().

However it occurred to me that if we are reducing modulo N many
times, perhaps we should do some precomputation to speed up all the
subsequent modular reductions. I wrote some code to try this out (see
below): it basically precomputes an approximate inverse for N, and
then does two multiplications and a few bitshifts/subtractions to
extract the remainder.

For very large input (say over 150,000 bits), this speeds up the
reductions by a factor of about two. The cross-over point between
mpz_fdiv_q() and this algorithm is somewhere around the 10,000 -
40,000 bit range on my machine. For smaller inputs my algorithm is
maybe twice as slow as just using mpz_fdiv_q() every time.

This cross-over point is quite disappointing. I had been hoping to
find something more around the 1,000 bit mark.

I haven't been able to push the cross-over point down at all from
this, even by diving into the mpn* functions. The difficulty seems to
be the following. The algorithm would be most efficient if it could
extract the high N bits of an NxN multiplication, and the low N bits
of an NxN multiplication. When N is very large, it's just as fast to
compute all the digits as it is to compute half the digits. But for N
small, when GMP is using basecase multiplication, it should be about
twice as fast to compute half the digits.

I'm wondering if there is any way to do these "extract-half-the-
answer" multiplications in GMP? Or are there any plans to implement it?

Or even better:

Are there any plans to add to GMP an interface for doing modular
arithmetic with respect to a given base, where some kind of
precomputation is performed to speed up the reductions, like I
suggest above? If there aren't any such plans, pretty pretty please
add this to the to-do list .... :-)

Thanks

/* Given A with N bits (i.e. 2^(N-1) <= A < 2^N), computes Z such
that 2^(2N+1) = AZ + B, where 0 <= B < A. So Z is an approximate
inverse for A. */
void prepare_preconditioned_reduction(mpz_t Z, mpz_t A, unsigned long N)
{
mpz_set_ui(Z, 1);
mpz_mul_2exp(Z, Z, 2*N + 1);   /* Z = 2^(2N+1) */
mpz_fdiv_q(Z, Z, A);
}

/* Computes X mod A, using some precomputed information.

Assumes:
* 2^(N-1) <= A < 2^N
* X is in the range 0 <= X < 2^(2N). For example X could be the product
of two integers each bounded by A.
* Z is the value computed by prepare_preconditioned_reduction.
*/
void preconditioned_reduction(mpz_t output, mpz_t X, mpz_t A, mpz_t
Z, unsigned long N)
{
/*  output = X >> (N-2)  */
mpz_fdiv_q_2exp(output, X, N-2);

/*  output = output * Z  */
mpz_mul(output, output, Z);

/*  output = output >> (N+3)  */
mpz_fdiv_q_2exp(output, output, N+3);

/* output now holds the estimated quotient X / A. It could
be too small by at most 1. */

/*  output = output * A  */
mpz_mul(output, output, A);

/*  output = X - output  */
mpz_sub(output, X, output);

/* subtract one last multiple of A if necessary */
if (mpz_cmp(output, A) >= 0)
mpz_sub(output, output, A);
}

David Harvey

```