mpz_powm_ui(...) is MUCH slower than mpz_powm(...)

Mon Nov 5 18:31:32 CET 2012

[Thread moved from gmp-bugs to gmp-devel.]

Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:

  I don't think the proposed algorithm is optimal. In the final loop, using
  the REDC representation for multiplying by b will need *full* modular
  multiplies (since b' will be in general of the same size as m).

You're right.  And I believe Niels pointed that out to me already.

  You should keep b small when it is, and thus replace r' <- REDC(r' * b', m)
  by r' <- (r' * b) mod m. When b has only one word, this will only perform
  one n * 1 -> n+1 multiplication, and one (n+1) / n division, which are cheap.

The difference between Hensel reduction and Euclidean reduction in GMP
is a linear term.  We should therefore use REDC residue representaton
for b when b > m/B^3 or something like that (B is the limb base).  That
is, we should compute and use b' iff b is wihin a few hundred bits of m.

(It is of course possible that b' << b, and REDC residue multiplication
is magically fast.  But that should not be a common scenario.)

-- 
Torbjörn