precomputed modular reductions in GMP?
David Harvey
dmharvey at math.harvard.edu
Sun Aug 27 00:25:26 CEST 2006
Dear GMP developers,
I'm working on a fairly new computer algebra package called SAGE that
uses GMP for the underlying arithmetic.
http://modular.math.washington.edu/sage/
We have a class that represents the integers mod N. Whenever two
elements of this ring are multiplied, the result must be reduced mod
N. Currently this is implemented using mpz_fdiv_q().
However it occurred to me that if we are reducing modulo N many
times, perhaps we should do some precomputation to speed up all the
subsequent modular reductions. I wrote some code to try this out (see
below): it basically precomputes an approximate inverse for N, and
then does two multiplications and a few bitshifts/subtractions to
extract the remainder.
For very large input (say over 150,000 bits), this speeds up the
reductions by a factor of about two. The cross-over point between
mpz_fdiv_q() and this algorithm is somewhere around the 10,000 -
40,000 bit range on my machine. For smaller inputs my algorithm is
maybe twice as slow as just using mpz_fdiv_q() every time.
This cross-over point is quite disappointing. I had been hoping to
find something more around the 1,000 bit mark.
I haven't been able to push the cross-over point down at all from
this, even by diving into the mpn* functions. The difficulty seems to
be the following. The algorithm would be most efficient if it could
extract the high N bits of an NxN multiplication, and the low N bits
of an NxN multiplication. When N is very large, it's just as fast to
compute all the digits as it is to compute half the digits. But for N
small, when GMP is using basecase multiplication, it should be about
twice as fast to compute half the digits.
I'm wondering if there is any way to do these "extract-half-the-
answer" multiplications in GMP? Or are there any plans to implement it?
Or even better:
Are there any plans to add to GMP an interface for doing modular
arithmetic with respect to a given base, where some kind of
precomputation is performed to speed up the reductions, like I
suggest above? If there aren't any such plans, pretty pretty please
add this to the to-do list .... :-)
Thanks
/* Given A with N bits (i.e. 2^(N-1) <= A < 2^N), computes Z such
that 2^(2N+1) = AZ + B, where 0 <= B < A. So Z is an approximate
inverse for A. */
void prepare_preconditioned_reduction(mpz_t Z, mpz_t A, unsigned long N)
{
mpz_set_ui(Z, 1);
mpz_mul_2exp(Z, Z, 2*N + 1); /* Z = 2^(2N+1) */
mpz_fdiv_q(Z, Z, A);
}
/* Computes X mod A, using some precomputed information.
Assumes:
* 2^(N-1) <= A < 2^N
* X is in the range 0 <= X < 2^(2N). For example X could be the product
of two integers each bounded by A.
* Z is the value computed by prepare_preconditioned_reduction.
*/
void preconditioned_reduction(mpz_t output, mpz_t X, mpz_t A, mpz_t
Z, unsigned long N)
{
/* output = X >> (N-2) */
mpz_fdiv_q_2exp(output, X, N-2);
/* output = output * Z */
mpz_mul(output, output, Z);
/* output = output >> (N+3) */
mpz_fdiv_q_2exp(output, output, N+3);
/* output now holds the estimated quotient X / A. It could
be too small by at most 1. */
/* output = output * A */
mpz_mul(output, output, A);
/* output = X - output */
mpz_sub(output, X, output);
/* subtract one last multiple of A if necessary */
if (mpz_cmp(output, A) >= 0)
mpz_sub(output, output, A);
}
David Harvey
More information about the gmp-discuss
mailing list