broot vs brootinv performancs
Torbjorn Granlund
tg at gmplib.org
Thu Nov 1 21:07:27 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
I've tried some more benchmarking (sorry, patches to speed not yet
checked in). To my disapppointment, broot appear to be 20% slower than
brootinv (except for small sizes, where it wins because it does the
initial few iterations without bignum overhead).
I suppose you then kindly disregard brootinv's need for a binvert call?
For perfpow's need (as I think you mentioned) the binvert time is
amortised over several brootinv calls. It would be neater to avoid use
broot instead of binvert+brootinv.
For rootexact, then surely broot will be faster. Or?
I don't understand why. When describing the iterations below, let n be
the target size of the iteraton.
brootinv uses the iteration
r' <-- ((k+1) r - r^{k+1} y) / k
For broot, I've rewritten the iteration as
r' <-- r - (a^{k-1} (r^2)^{(k+1)/2} - r) / k
which gave a modest improvement.
I suppose y and a are the input, i.e a = y and we are computing a^(1/k)
mod 2^t?
The mullo_n is the same. The sqr + powlo ought to be roughly the same as
the powlo i brootinv (consider small k so windowing is not useful). I
had expected a significant win because a mpn_neg of size n/2 ought to be
faster than the combination of mul_1, sub_n of size n and bdiv_q_1 of
size n/2.
I think (r^2)^{(k+1)/2} looks strange. Isn't r^{k+1} equivalent?
There's also a powlo (a^{k-1}) and mullo outside of the loop. I expected
them to contribute little to the total running time. But mullo seems to
take 13% for large sizes and mpn_powlo is not currently supported by
speed, so it's not so easy to measure).
I would suggest that you time things on several machine types. An AMD
Stars (eg shell) might be the best platform, since all asm code runs
best there.
Maybe I ought to power a and r at the same time as
a^{k-1} r^{k+1} = (a*r)^{k-1} * r^2
or so. Some wraparound or mulmid tricks may be applicable to
Why would that be faster?
Another note: I also wish for a "bdiv_nq_1" which returns - u / d (mod
B^n) rather than u / d (mod B^n), then one could eliminate the mpn_neg.
(And this is somewhat related to my bdiv work a few months ago, which
also returns a quotient with different sign).
Does mpn_neg take measurable time?
--
Torbjörn
More information about the gmp-devel
mailing list