Better b^e mod m

Tue Oct 16 15:30:42 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

  Torbjorn Granlund <tg at gmplib.org> writes:

  > (1) Tiny exponents such as 2, 3.

  If we're to do special cases for small e, maybe it might make sense to
  find optimal addition chains for small e. Not sure for which e this could
  make a significant difference, though.

I think that is a good idea, but we need to make sure "optimal"
considers a general multiply more expensive than a squaring, and perhaps
takes invariance into account.  To make this perfect, the size of other
operands will matter.  OK, our transform-multiply-operand functionality
is somewhat lacking at the moment, so perhaps I am suggesting more
generalisty than needed at the moment.

  I think we can accumulate and square in Montgomery representation,
  without necessarily converting b to Montgomery.

  Notation: Put x' = x B^n mod m, i.e., x' is the Montgomery
  representation of x.

  Do squarings the usual Montgomery way:

    (x^2)' = (x')^2 B^{-n} (mod m).

  But when we multiply by b, we can do that with a plain multiplication
  by b followed by an *Euclidean* reduction of a single limb

    (x b)' = x' b (mod m)

  Since b is much smaller than b', I think that's going to be a win over
  Montgomery, even for fairly small n.

  And when we tabulate powers of b, we should do the same with powers
  which fit in a single limb or just a few limbs.

Agreed!  (We should not just consider the case where b is a single limb,
but the more general case b << m.)

  (I've also been considering if we could also use single-limb hensel
  reduction, doing b multiplies as

    (x' b) B^{-1}

  Then we get a factor of B error (which is going to be amplified while
  squaring).

I am afraid I don't follow you.  Specifically, what does B^{-1} denote?
Do you compute this inverse modulo something, do you intend to form
either (x' b)/B or floor((x' b)/B)?

  Hmm, maybe it would make some sense with a powm function which takes a prime
  factorization of m as an additional argument?

It might make more sense to provide some mechanims for preparing CRT for
a set of primes (or coprimes) and then perform it:

  mpz_crt_precomp (precomp_variable, t1, t2, ..., NULL)
  mpz_crt (r, precomp_variable, v1, t1, v2, t2, ..., NULL)

The precomp function would compute various inverses and stuff it into
the opaque precomp_variable.

But CRT discussions might belong to a different thread.

-- 
Torbjörn