Divide and conquer binomial

Thu Dec 15 06:07:41 CET 2011

Ciao!

Il Mer, 14 Dicembre 2011 10:42 pm, Niels Möller ha scritto:
> 3. When k and n are of moderate size, so that it is practical to iterate
>    over all primes <= n, count multiplicity of each prime in dividend
>    and divisor. I think Marco has implemented this.

I got a hint by Peter Luschny to this list [
http://gmplib.org/list-archives/gmp-discuss/2010-February/004036.html ],
and I implemented Goetgheluck's algorithm, directly computing the
multiplicity of each prime for the result.

> 4. When k is of moderate size, and n is too large for iterating over the
>    primes, but n and k still small enough so that bin(n, k) fits in
>    available storage, it would make sense to use some divide and conquer
>    algorithm. For example, bin(2^80, 2^20), which is approximately

>    One possibly useful identity (assume for simplicity that k is
>    even) is:
>
>      bin(n, k) = bin(n, k/2) bin(n-k/2, k/2) / bin(k, k/2)

One possibly useful observation is that both bin(., k/2) may use the value
of (k/2)!
Another observation is that the current asymptotically fast factorial uses
the algorithm from "Divide, Swing, and Conquer the Factorial!" by Peter
Luschny: basically we compute k!=(k/2)!^2*bin(k,k/2) , with a fast
prime-sieve-based implementation for bin(k,k/2).

>    Here, we halve the size of k, getting two recursive calls with

It should probably not be implemented with recursion, because there are
many computations we should recycle. Let's look at the second level:

bin(n,k)=(bin(n    , k/4) bin(n-k/4    , k/4) / bin(k/2,k/4)) *
         (bin(n-k/2, k/4) bin(n-k/2-k/4, k/4) / bin(k/2,k/4)) / bin(k,k/2) .

We divide two times by bin(k/2,k/4) (can computing the binverse once save
time?), and four times by (k/4)!
Moreover bin(k,k/2) needs to perform prime sieving up to k, half of this
list of sieved primes is needed also by bin(k/2,k/4), a quarter by (k/4)!

>    I think Marco has tried this or something similar.

Yes, there is a _very_naive_ implementation of this in the code I tested
last year ( http://bodrato.it/software/combinatorics.html#bin_uiui ).
My implementation works only if n fits in one limb, but k can be odd:

void
mpz_dc1_bin_uiui (mpz_ptr r, unsigned long n, unsigned long k)
{
  mpz_t num1, num2, den;
  mp_limb_t count, pow2;

  popc_limb (count, n);

  mpz_prodd (r, n - (k>>1) + 1, n); /* n(n-1)...(n-k/2+1), the odd factor */
  n -= k >> 1;

  mpz_init(num2);
  mpz_prodd (num2, n - ((k+1)>>1) + 1, n);
  n -= (k+1) >> 1;  /* n-k */

  mpz_init(den);
  mpz_bc_oddfac_1 (den, k>>1); /* (k/2)!, the odd factor */

  mpz_init(num1);
  mpz_divexact (num1, r, den); /* bin(n, k/2) */
  mpz_divexact (r, num2, den); /* bin(n-k/2, k/2) */

  mpz_mul (num2, r, num1); /* bin(n, k/2) * bin(n-k/2, k/2) */
  mpz_old_oddswing_1 (den, k); /* bin (k,k/2), the odd factor */

  mpz_divexact (r, num2, den);

  mpz_clear(num1);
  mpz_clear(num2);
  mpz_clear(den);

  popc_limb (pow2, k);
  pow2 -= count;
  popc_limb (count, n);
  pow2 += count;
  mpz_mul_2exp (r, r, pow2); /* take care of the factor 2 */
}

As you can see, there is no clever optimisation in it, nevertheless it had
a region where it was the fastest implementation of mine. Torbjorn's fine
optimisation of the basecase gets much faster... but it may be interesting
to apply his ideas to the building blocks here (mpz_prodd, and
mpz_bc_oddfac_1) and compare again.

mpz_bc_oddfac_1 already is in the repository, in mpz/fac_ui.c currently.

Best regards,
m

-- 
http://bodrato.it/papers/