Small-base powm

Torbjorn Granlund tg at gmplib.org
Tue Mar 25 12:00:49 UTC 2014


I think what you suggest is very close to the pseudo code at
https://gmplib.org/devel/, under the header "mpz_powm and mpz_powm_ui".
But you suggest several additional refinements.

I wasn't considering of the case when the base is just a single limb,
but any time 2 * log(b) <= log(m).
  
These tricks are valuable only for limited exponents; as soon as the
exponent is large, the fraction of optimisable initil operatons will
become negligible.
  
  * One could find the largest k such that u^{2^k-1} fits in a single
    limb, and process k exponent bits at a time.
  
True.

  * One could handle powers of 2 specially, with shifts instead of
    multiplies.
  
Perhaps then with a separate mpn function.

  * Maybe one could extend this to take advantage of addmul_2, if present.
  
Not sure to follow you here.  You mean to use mul_2 for the multply by b
(or b^x for some x) and addmul_2 for some clever (non-principal)
reduction?

  * In general, for "small" U of size somewhere between a single limb, and
    full n limbs, there's got to be some threshold between using euclidean
    reduction of less than n limbs, to montgomery reduction of n limbs.
  
Makes sense.

  * Maybe one shouldn't do the n+1 to n reduction as a separate step, but
    somehow combine it with a montgomery reduction. E.g., one could
    probably write a pretty efficient function which computes u R B^{-1}
    (mod M) as an n-limb number. We then get some additional power of
    B^{-1} (mod M) into the result, but maybe they can be handled
    efficiently.
  
I need to think more about this...

  And if we do additional powm functions, with single limb modulo (and
  possibly large exponent, since reducing it requires the factorization of
  m), or single-limb exponent, or single-limb base, we get another
  challenge in naming all the functions. ;-)
  
We need to consult Dijstra on this.  :-)

I think our overhead for modexp ops with few-limb modulo is currently
terrible.  But it is not going to be easy to fix this, without resorting
to full assembly functions.


Torbjörn


More information about the gmp-devel mailing list