precomputed modular reductions in GMP?

Mon Aug 28 17:35:05 CEST 2006

David Harvey <dmharvey at math.harvard.edu> writes:

  However it occurred to me that if we are reducing modulo N many  
  times, perhaps we should do some precomputation to speed up all the  
  subsequent modular reductions. I wrote some code to try this out (see  
  below): it basically precomputes an approximate inverse for N, and  
  then does two multiplications and a few bitshifts/subtractions to  
  extract the remainder.

That's one way of doing it.  But using Montgomery representation
(REDC) is usually better.

You should take a look at the the GMP development pages,
http://swox.com/gmp/development.html, where we have some code for
inversion (plain truncating and 2-adic).

  For very large input (say over 150,000 bits), this speeds up the  
  reductions by a factor of about two. The cross-over point between  
  mpz_fdiv_q() and this algorithm is somewhere around the 10,000 -  
  40,000 bit range on my machine. For smaller inputs my algorithm is  
  maybe twice as slow as just using mpz_fdiv_q() every time.

  This cross-over point is quite disappointing. I had been hoping to  
  find something more around the 1,000 bit mark.

With plain "truncating" inverse and Barrett's algorithm, that
might be difficult to achieve.

Even using REDC + Barrett will probably not help at 1000 bits.

  I haven't been able to push the cross-over point down at all from  
  this, even by diving into the mpn* functions. The difficulty seems to  
  be the following. The algorithm would be most efficient if it could  
  extract the high N bits of an NxN multiplication, and the low N bits  
  of an NxN multiplication. When N is very large, it's just as fast to  
  compute all the digits as it is to compute half the digits. But for N  
  small, when GMP is using basecase multiplication, it should be about  
  twice as fast to compute half the digits.

On the money.  You should look for references to "Mulder's
algorithm" for computing short products.

  I'm wondering if there is any way to do these "extract-half-the- 
  answer" multiplications in GMP? Or are there any plans to implement it?

In current GMP. there is an *internal* mpn function called
mpn_mullow_n.  The present code doesn't use Mulder's algorithm, as the
cutoff point needs more thought than we had time for.

There isn't any corresponding mpn_mulhigh_n.  Here, one probably wants
several functions, one returning the exact answer, and one which is
allowed to return a too low, but bounded, value.  mpn_mulhigh_n might
actually want to be mpn_mulhigh_mn, ie alowing operands of different
sizes.

GMP 5 should have all these functions, but the exact interface is not
known.  And mpn_mullow_n will probably be renamed.

You're welcome to contribute in this area, but please start with what
is already at the GMP development pages.

  Are there any plans to add to GMP an interface for doing modular  
  arithmetic with respect to a given base, where some kind of  
  precomputation is performed to speed up the reductions, like I  
  suggest above? If there aren't any such plans, pretty pretty please  
  add this to the to-do list .... :-)

Plans, but vague.

There will be functions at the mpn level with all needed support, but
not sure about any higher level stuff.

-- 
Torbjörn