# precomputed modular reductions in GMP?

Torbjorn Granlund tege at swox.com
Mon Aug 28 17:35:05 CEST 2006

```David Harvey <dmharvey at math.harvard.edu> writes:

However it occurred to me that if we are reducing modulo N many
times, perhaps we should do some precomputation to speed up all the
subsequent modular reductions. I wrote some code to try this out (see
below): it basically precomputes an approximate inverse for N, and
then does two multiplications and a few bitshifts/subtractions to
extract the remainder.

That's one way of doing it.  But using Montgomery representation
(REDC) is usually better.

You should take a look at the the GMP development pages,
http://swox.com/gmp/development.html, where we have some code for

For very large input (say over 150,000 bits), this speeds up the
reductions by a factor of about two. The cross-over point between
mpz_fdiv_q() and this algorithm is somewhere around the 10,000 -
40,000 bit range on my machine. For smaller inputs my algorithm is
maybe twice as slow as just using mpz_fdiv_q() every time.

This cross-over point is quite disappointing. I had been hoping to
find something more around the 1,000 bit mark.

With plain "truncating" inverse and Barrett's algorithm, that
might be difficult to achieve.

Even using REDC + Barrett will probably not help at 1000 bits.

I haven't been able to push the cross-over point down at all from
this, even by diving into the mpn* functions. The difficulty seems to
be the following. The algorithm would be most efficient if it could
extract the high N bits of an NxN multiplication, and the low N bits
of an NxN multiplication. When N is very large, it's just as fast to
compute all the digits as it is to compute half the digits. But for N
small, when GMP is using basecase multiplication, it should be about
twice as fast to compute half the digits.

On the money.  You should look for references to "Mulder's
algorithm" for computing short products.

I'm wondering if there is any way to do these "extract-half-the-
answer" multiplications in GMP? Or are there any plans to implement it?

In current GMP. there is an *internal* mpn function called
mpn_mullow_n.  The present code doesn't use Mulder's algorithm, as the
cutoff point needs more thought than we had time for.

There isn't any corresponding mpn_mulhigh_n.  Here, one probably wants
several functions, one returning the exact answer, and one which is
allowed to return a too low, but bounded, value.  mpn_mulhigh_n might
actually want to be mpn_mulhigh_mn, ie alowing operands of different
sizes.

GMP 5 should have all these functions, but the exact interface is not
known.  And mpn_mullow_n will probably be renamed.

is already at the GMP development pages.

Are there any plans to add to GMP an interface for doing modular
arithmetic with respect to a given base, where some kind of
precomputation is performed to speed up the reductions, like I
suggest above? If there aren't any such plans, pretty pretty please
add this to the to-do list .... :-)

Plans, but vague.

There will be functions at the mpn level with all needed support, but
not sure about any higher level stuff.

--
Torbjörn
```