broot vs brootinv performancs

Tue Nov 13 18:50:15 CET 2012

nisse at lysator.liu.se (Niels Möller) writes:

  Trying

  $ ./speed -o cycles-broken -s 1-30,100,200 -r mpn_mullo_n mpn_binvert

  it appears binvert is up to 5 times slower than mullo for just a few
  limbs, 2 times slower for n = 10, and roughly 50% slower in the range
  between 20 and 200 limbs. So replacing a binvert by a mulllo is not that
  big an optimization.

  Asymptotically, mullo is M(n) and (b)invert is 2 M(n) or something like
  that, right?

I thought it was around 1.5 M(n), but I am not sure.

  I'll see if/when I get to further optimizations of the other functions.

OK.

I realise that I have quite limited understanding about when the "lift"
happens in these iterations.  Perhaps we can compute modulo the previous
power-of-2 for more operations...

  > I cannot recall where the redc/bdiv stuff arrived last time we worked on
  > it.  We made an effort at unification, making them behave exactly the
  > same, perhaps eliminating redc.

  In the repo ~hg/gmp-proj/gmp-bdiv, the bdiv functions are changed to the
  new convention with different signs of q, and callers updated. The redc
  uses are not touched yet, and bdiv_2 (to replace redc_2) is missing.
  IIRC.

  > I also don't recall if a bdiv style quotient can be "negated" at no
  > extra cost, but I suppose it can.

  For bdiv_1, I'm pretty sure there's no extra cost. And since those loops
  are small, I think it makes some sense to provide both variants. Maybe
  also for bdiv_2, if we do those.

  For larger divisors, it's addmul_1 vs submul_1 (and addmul_2 vs missing
  submul_2), and since negation is going to be relatively cheap, I *don't*
  think we should have functions for both signs of q.

I cannot fully follow your reasoning, I'm afraid.  It would be good to
clarify these things:

* Which function uses submul_1?  I thought neither the old nor the new
  bdiv functions used that.

* I assume the gmp-bdiv repo changed all bdiv functions to use the new
  convention.  Correct?  Now you're suggesting that we reintroduce
  positive-quotient functions for 1 (and 2) limb divisors?  (Even if a
  loop is simple, multiple variants come at a price.  We might want to
  have both in asm, for example.)

* There is no mpn_sbpi2_bdiv_qr or mpn_sbpi2_bdiv_r, analogous to redc_2
  (except that we spec dividend and divisor size separately), but
  creating one from redc_2 should be straightforward.

* Has there been any effort for replacing redc?  I see
  mpn/generic/powm.c still calls redc_1, redc_2, and redc_n.

-- 
Torbjörn