# non-subquadratic functions in gmp

**Paul Zimmermann
**
Paul.Zimmermann@loria.fr

*Mon, 25 Aug 2003 10:24:52 +0200*

Hello,
which functions are still O(n^2) in gmp-4.1.2?
I know gcd (and gcdext). I remember divexact was O(n^2)
in a previous release. Is it still the case?
For the gcd, I know two sub-quadratic algorithms:
- the "half-gcd" which has complexity O(M(n) log n)
but with a high constant (involves 2x2 matrices).
I tried once to implement it, but got a threshold
of about 3000 limbs for *gcdext*, and the threshold
would be much higher for the gcd.
- Sorenson's k-ary reduction which has complexity
O(n^(1+1/(3-alpha))) for M(n) = O(n^alpha). This
gives O(n^1.71) for Karatsuba, with a small constant.
I have a running implementation at the mpz level,
with a threshold of 7000 limbs wrt mpz_gcd, which
could be reduced using the mpn level, and with a
few improvements.
For divexact, I propose the following algorithm:
- add a new function mpz_invert_2exp that computes
1/b mod 2^k for b odd. This is quite easy to do
using Hensel lifting, with a cost of 3M(n).
Below is a sample implementation in mpz.
One can even add Karp-Markstein's trick to
incorporate the dividend at the last step,
and compute a/b mod 2^k.
- then mpz_divexact can be simply computed by
mpz_divide_2exp, with k = sizeinbase(a,2)+1-sizeinbase(b,2)
Does anybody on the list know better algorithms?
Are they other functions which are still quadratic?
Regards,
Paul Zimmermann
/* rop <- 1/op1 mod 2^t. Assumes op1 is odd. */
void
mpz_invert_2exp (mpz_t rop, mpz_t op1, unsigned long t)
{
unsigned long k, s;
mpz_t v, tmp;
mpz_init (v);
mpz_init (tmp);
/* computes the number k of steps needed to lift to 2^t */
for (k = 0, s = t; s > 1; k++, s = (s + 1) / 2);
/* invariant: v*s = 1 mod 2^s, and s = ceil(2t/2^k) */
mpz_set_ui (v, 1);
while (k-- > 0)
{
s = 1 + ((2 * t - 1) >> k); /* ceil(2t/2^k) = 1 + floor((2t-1)/2^k) */
/* we need only low s bits from b */
mpz_mod_2exp (tmp, op1, s); /* tmp=op1 mod 2^s */
mpz_mul (tmp, v, tmp); /* b * v */
mpz_mod_2exp (tmp, tmp, s);
mpz_ui_sub (tmp, 1, tmp);
mpz_mul (tmp, v, tmp);
mpz_mod_2exp (tmp, tmp, s);
mpz_add (v, v, tmp);
}
mpz_set (rop, v);
mpz_clear (v);
mpz_clear (tmp);
}