# Fast mpz_bin_ui

Niels Möller nisse at lysator.liu.se
Thu Oct 12 20:03:15 UTC 2017

Let me write some more, after pondering the problem for a few days.

Regular powers n^k (mod m) can be efficiently computed based on repeated
squaring and the basic rule that

n^{2k} = (n^2)^k = (n^2)^k

(and without mod m, one would use the same algorithm, it's just that the
size of the result grows linearly with k).

First, let's note that there's unlikely to be any comparatively
efficient way to compute falling factorial powers, for it there were,
there would be an efficient way to compute n! (mod m), and that would be
a fast factoring algorithm: if n is the product of two unknown primes,
then the smallest factor is

p = gcd (n, floor(sqrt n)! mod n )

The closest analog of a squaring rule for falling factorial powers I've
been able to find is

n^_{2k} = n^{_k} (n-k)^{_k}

where n^{_k} denotes the falling factorial n (n-1) ... (n-k+1). And the
right hand side doesn't look much like a square, I'd wish we could
rewrite it to a formula like

n^_{2k} = ((n-k/2)^{_k})^{_2} + some small number

but I'm afraid we can't.

So, back to binomial coefficients,

n \over k = n^{_k} / k!

and the case where n is larger than one limb, k a single limb, and k <=
2n. I guess k has to be reasonably small, for the result to fit in
available memory (but I don't have any numbers handy). Then k! is going
to be much smaller than n^{_k}, so that computing k! and doing the exact
division should be a very small fraction of the total work.

So for k!, I'd suggest just using mpz_oddfac_1 (and it would make sense
to me to change it to return the exponent for the omitted power of two,
which from the implementation of mpz_fac_ui seems to be k -
popcount(k)).

For n^_{k}, I would suggest a function to compute the odd part and
return the exponent of the power of two, which would repeatedly split in
even and odd numbers and compute products of the form

n (n-2) (n-4) ... for odd n.

Each of these products could be computed by grouping four terms at a
time and use

>   n (n-2)(n-4)(n-6) = [(n-3)^2 - 5]^2 - 25

I like this trick, replacing three multiplies by two squares (but it's
unclear how much it will save compared to the total running time). For
the reasons given above, I wouldn't expect the trick to generalize much
further.

Then multiply all the resulting odd factors together in some roughly
balanced tree. And then we could do something like

void
mpz_bin_ui (mpz_ptr r, mpz_srcptr n, unsigned long int k)
{
mpz_t k_fac;
mp_bitcnt_t k_exp, n_exp;

mpz_init (k_fac);
k_exp = mpz_oddfac_1 (k_fac, k);
n_exp = mpz_odd_falling_fac (r, n, k);
ASSERT (k_exp <= n_exp);
mpz_divexact (r, r, k_fac);
mpz_mul_2exp (r, r, n_exp - k_exp);

mpz_clear (k);
}

Would you like to give it a try?

One could try some sieving and instead produce n^_{k} as a product of
prime powers for all primes <= k and a residual, to be able to easily
subtract the corresonding prime-power representation of k!.

It would be interesting with some analysis of the size of n \over k
after dividing out all prime factors smaller than k. If the prime powers
of primes <= k makes up a large part, sieving could pay off, since that
part can be computed efficiently.

Will n \over k be more or less "smooth" than a random number of similar
size? I'd guess slightly more smooth, since it's going to be huge
compared to n, but still can't have any prime factors larger than n. But
we can still have lots of prime factors in the range k < p <= n.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.