precomputed modular reductions in GMP?

Sun Aug 27 00:25:26 CEST 2006

Dear GMP developers,

I'm working on a fairly new computer algebra package called SAGE that  
uses GMP for the underlying arithmetic.

http://modular.math.washington.edu/sage/

We have a class that represents the integers mod N. Whenever two  
elements of this ring are multiplied, the result must be reduced mod  
N. Currently this is implemented using mpz_fdiv_q().

However it occurred to me that if we are reducing modulo N many  
times, perhaps we should do some precomputation to speed up all the  
subsequent modular reductions. I wrote some code to try this out (see  
below): it basically precomputes an approximate inverse for N, and  
then does two multiplications and a few bitshifts/subtractions to  
extract the remainder.

For very large input (say over 150,000 bits), this speeds up the  
reductions by a factor of about two. The cross-over point between  
mpz_fdiv_q() and this algorithm is somewhere around the 10,000 -  
40,000 bit range on my machine. For smaller inputs my algorithm is  
maybe twice as slow as just using mpz_fdiv_q() every time.

This cross-over point is quite disappointing. I had been hoping to  
find something more around the 1,000 bit mark.

I haven't been able to push the cross-over point down at all from  
this, even by diving into the mpn* functions. The difficulty seems to  
be the following. The algorithm would be most efficient if it could  
extract the high N bits of an NxN multiplication, and the low N bits  
of an NxN multiplication. When N is very large, it's just as fast to  
compute all the digits as it is to compute half the digits. But for N  
small, when GMP is using basecase multiplication, it should be about  
twice as fast to compute half the digits.

I'm wondering if there is any way to do these "extract-half-the- 
answer" multiplications in GMP? Or are there any plans to implement it?

Or even better:

Are there any plans to add to GMP an interface for doing modular  
arithmetic with respect to a given base, where some kind of  
precomputation is performed to speed up the reductions, like I  
suggest above? If there aren't any such plans, pretty pretty please  
add this to the to-do list .... :-)

Thanks

/* Given A with N bits (i.e. 2^(N-1) <= A < 2^N), computes Z such
that 2^(2N+1) = AZ + B, where 0 <= B < A. So Z is an approximate
inverse for A. */
void prepare_preconditioned_reduction(mpz_t Z, mpz_t A, unsigned long N)
{
     mpz_set_ui(Z, 1);
     mpz_mul_2exp(Z, Z, 2*N + 1);   /* Z = 2^(2N+1) */
     mpz_fdiv_q(Z, Z, A);
}

/* Computes X mod A, using some precomputed information.

Assumes:
* 2^(N-1) <= A < 2^N
* X is in the range 0 <= X < 2^(2N). For example X could be the product
   of two integers each bounded by A.
* Z is the value computed by prepare_preconditioned_reduction.
*/
void preconditioned_reduction(mpz_t output, mpz_t X, mpz_t A, mpz_t  
Z, unsigned long N)
{
     /*  output = X >> (N-2)  */
     mpz_fdiv_q_2exp(output, X, N-2);

     /*  output = output * Z  */
     mpz_mul(output, output, Z);

     /*  output = output >> (N+3)  */
     mpz_fdiv_q_2exp(output, output, N+3);

     /* output now holds the estimated quotient X / A. It could
     be too small by at most 1. */

     /*  output = output * A  */
     mpz_mul(output, output, A);

     /*  output = X - output  */
     mpz_sub(output, X, output);

     /* subtract one last multiple of A if necessary */
     if (mpz_cmp(output, A) >= 0)
         mpz_sub(output, output, A);
}

David Harvey