mpn_invert implementing ApproximateReciprocal.

Fri Dec 11 19:21:05 CET 2009

Hi,

> The size where the code starts winning is very nice!

It also depend on the basecase, if it will be optimised with new division
functions, as Niels suggests, the Newton threshold will grow.
Unfortunately, I'm not sure I fully understand the interface of divappr_q,
moreover the mpn_sbpi1_ flavour begins with ASSERT (dn > 2)... which means
we need a basecase for dn=1 or 2, than we can divappr+sub_1; finally use
Newton, isn't it?

> But I expected a bigger effect of bnm1.  It is possible to make

With 2000 limbs it give a 20% speedup, it's not bad.
There are two reasons why it is not relevant for smaller operands.

The theoretical one: mulmod has unbalanced product as a competitor:
mul(2n,n) costs 2M(n)+O(n),
With one splitting level, mulmod(2n) costs 2M(n)+O(n).(with a bigger O :-))
mulmod needs at least a second recursion to have the hope it win, but...

  mpn_mulmod_bnm1_next_size (mp_size_t n) {
...
    if (BELOW_THRESHOLD (n, 4 * (MULMOD_BNM1_THRESHOLD - 1) + 1))
      return (n + (2-1)) & (-2);

... we must wait a size above 4*MULMOD_BNM1_THRESHOLD to have more than
one level. Then the competitor will be unbalanced Toom-3, and the
threshold will grow even more. (Almost the same should apply to
FFT_MOD_BNP1.)

The other one, depends on implementation. I should rework the use of
memory to fit the needs of mulmod_bnm1. One of the difficult things is
that I'd need to estimate mpn_mulmod_bnm1_itch (mpn_mulmod_bnm1_next_size
(n + 1)), starting form n...

Regards,
Marco

-- 
http://bodrato.it/papers/