broot vs brootinv performancs
nisse at lysator.liu.se
Tue Nov 13 15:50:29 CET 2012
Torbjorn Granlund <tg at gmplib.org> writes:
> nisse at lysator.liu.se (Niels Möller) writes:
> > For rootexact, then surely broot will be faster [than brootinv]. Or?
> I'm not sure.
> I haven't had time to spend enough attention on this, but I continue to
> be puzzled about the relative speed of these functions.
> * The cost of mpn_binvert should be forbidding in certain perfpow
> usages. (Perhaps that could be smoothed out by computing the needed
> mpn_binvert Hensel lifting on demand, in the loop.)
$ ./speed -o cycles-broken -s 1-30,100,200 -r mpn_mullo_n mpn_binvert
it appears binvert is up to 5 times slower than mullo for just a few
limbs, 2 times slower for n = 10, and roughly 50% slower in the range
between 20 and 200 limbs. So replacing a binvert by a mulllo is not that
big an optimization.
Asymptotically, mullo is M(n) and (b)invert is 2 M(n) or something like
> * The broot loop doesn't look more expensive than brootinv's.
No, the loops use about the same operations. But with broot_invm1, we
have an additional large powlo outside of the loop. My best guess is
that is the reason why it is slower.
> The point is that we're doubling the precision, input r is n/2 limbs and
> output is n limbs. So the powering in both cases is a powlo of n limbs.
> While r^2 is a squaring with n/2 input limbs and n output limbs. And a
> size n/2 sqr ought to be a bit faster than a size n sqrlo. And it's also
> somewhat cleaner to not read the uninitialized high part of r.
> Clever. Perhaps this could be one in more places?
I'll see if/when I get to further optimizations of the other functions.
> I cannot recall where the redc/bdiv stuff arrived last time we worked on
> it. We made an effort at unification, making them behave exactly the
> same, perhaps eliminating redc.
In the repo ~hg/gmp-proj/gmp-bdiv, the bdiv functions are changed to the
new convention with different signs of q, and callers updated. The redc
uses are not touched yet, and bdiv_2 (to replace redc_2) is missing.
> I also don't recall if a bdiv style quotient can be "negated" at no
> extra cost, but I suppose it can.
For bdiv_1, I'm pretty sure there's no extra cost. And since those loops
are small, I think it makes some sense to provide both variants. Maybe
also for bdiv_2, if we do those.
For larger divisors, it's addmul_1 vs submul_1 (and addmul_2 vs missing
submul_2), and since negation is going to be relatively cheap, I *don't*
think we should have functions for both signs of q.
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel