broot vs brootinv performancs
tg at gmplib.org
Tue Nov 13 18:50:15 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
$ ./speed -o cycles-broken -s 1-30,100,200 -r mpn_mullo_n mpn_binvert
it appears binvert is up to 5 times slower than mullo for just a few
limbs, 2 times slower for n = 10, and roughly 50% slower in the range
between 20 and 200 limbs. So replacing a binvert by a mulllo is not that
big an optimization.
Asymptotically, mullo is M(n) and (b)invert is 2 M(n) or something like
I thought it was around 1.5 M(n), but I am not sure.
I'll see if/when I get to further optimizations of the other functions.
I realise that I have quite limited understanding about when the "lift"
happens in these iterations. Perhaps we can compute modulo the previous
power-of-2 for more operations...
> I cannot recall where the redc/bdiv stuff arrived last time we worked on
> it. We made an effort at unification, making them behave exactly the
> same, perhaps eliminating redc.
In the repo ~hg/gmp-proj/gmp-bdiv, the bdiv functions are changed to the
new convention with different signs of q, and callers updated. The redc
uses are not touched yet, and bdiv_2 (to replace redc_2) is missing.
> I also don't recall if a bdiv style quotient can be "negated" at no
> extra cost, but I suppose it can.
For bdiv_1, I'm pretty sure there's no extra cost. And since those loops
are small, I think it makes some sense to provide both variants. Maybe
also for bdiv_2, if we do those.
For larger divisors, it's addmul_1 vs submul_1 (and addmul_2 vs missing
submul_2), and since negation is going to be relatively cheap, I *don't*
think we should have functions for both signs of q.
I cannot fully follow your reasoning, I'm afraid. It would be good to
clarify these things:
* Which function uses submul_1? I thought neither the old nor the new
bdiv functions used that.
* I assume the gmp-bdiv repo changed all bdiv functions to use the new
convention. Correct? Now you're suggesting that we reintroduce
positive-quotient functions for 1 (and 2) limb divisors? (Even if a
loop is simple, multiple variants come at a price. We might want to
have both in asm, for example.)
* There is no mpn_sbpi2_bdiv_qr or mpn_sbpi2_bdiv_r, analogous to redc_2
(except that we spec dividend and divisor size separately), but
creating one from redc_2 should be straightforward.
* Has there been any effort for replacing redc? I see
mpn/generic/powm.c still calls redc_1, redc_2, and redc_n.
More information about the gmp-devel