non-subquadratic functions in gmp

Paul Zimmermann Paul.Zimmermann@loria.fr
Mon, 25 Aug 2003 10:24:52 +0200


       Hello,

which functions are still O(n^2) in gmp-4.1.2?
I know gcd (and gcdext). I remember divexact was O(n^2)
in a previous release. Is it still the case? 

For the gcd, I know two sub-quadratic algorithms:
- the "half-gcd" which has complexity O(M(n) log n)
  but with a high constant (involves 2x2 matrices).
  I tried once to implement it, but got a threshold
  of about 3000 limbs for *gcdext*, and the threshold
  would be much higher for the gcd.
- Sorenson's k-ary reduction which has complexity
  O(n^(1+1/(3-alpha))) for M(n) = O(n^alpha). This
  gives O(n^1.71) for Karatsuba, with a small constant.
  I have a running implementation at the mpz level,
  with a threshold of 7000 limbs wrt mpz_gcd, which
  could be reduced using the mpn level, and with a
  few improvements.

For divexact, I propose the following algorithm:
- add a new function mpz_invert_2exp that computes
  1/b mod 2^k for b odd. This is quite easy to do
  using Hensel lifting, with a cost of 3M(n).
  Below is a sample implementation in mpz.
  One can even add Karp-Markstein's trick to 
  incorporate the dividend at the last step,
  and compute a/b mod 2^k.
- then mpz_divexact can be simply computed by
  mpz_divide_2exp, with k = sizeinbase(a,2)+1-sizeinbase(b,2)

Does anybody on the list know better algorithms?
Are they other functions which are still quadratic?

Regards,
Paul Zimmermann

/* rop <- 1/op1 mod 2^t. Assumes op1 is odd. */
void
mpz_invert_2exp (mpz_t rop, mpz_t op1, unsigned long t)
{
  unsigned long k, s;
  mpz_t v, tmp;

  mpz_init (v);
  mpz_init (tmp);

  /* computes the number k of steps needed to lift to 2^t */
  for (k = 0, s = t; s > 1; k++, s = (s + 1) / 2);

  /* invariant: v*s = 1 mod 2^s, and s = ceil(2t/2^k) */
  mpz_set_ui (v, 1);
  while (k-- > 0)
    {
      s = 1 + ((2 * t - 1) >> k); /* ceil(2t/2^k) = 1 + floor((2t-1)/2^k) */
      /* we need only low s bits from b */
      mpz_mod_2exp (tmp, op1, s); /* tmp=op1 mod 2^s */
      mpz_mul (tmp, v, tmp); /* b * v */
      mpz_mod_2exp (tmp, tmp, s);
      mpz_ui_sub (tmp, 1, tmp);
      mpz_mul (tmp, v, tmp);
      mpz_mod_2exp (tmp, tmp, s);
      mpz_add (v, v, tmp);
    }

  mpz_set (rop, v);

  mpz_clear (v);
  mpz_clear (tmp);
}