# broot vs brootinv performancs

Torbjorn Granlund tg at gmplib.org
Thu Nov 1 21:07:27 CET 2012

```nisse at lysator.liu.se (Niels Möller) writes:

I've tried some more benchmarking (sorry, patches to speed not yet
checked in). To my disapppointment, broot appear to be 20% slower than
brootinv (except for small sizes, where it wins because it does the
initial few iterations without bignum overhead).

I suppose you then kindly disregard brootinv's need for a binvert call?

For perfpow's need (as I think you mentioned) the binvert time is
amortised over several brootinv calls.  It would be neater to avoid use

For rootexact, then surely broot will be faster.  Or?

I don't understand why. When describing the iterations below, let n be
the target size of the iteraton.

brootinv uses the iteration

r' <-- ((k+1) r - r^{k+1} y) / k

For broot, I've rewritten the iteration as

r' <-- r - (a^{k-1} (r^2)^{(k+1)/2} - r) / k

which gave a modest improvement.

I suppose y and a are the input, i.e a = y and we are computing a^(1/k)
mod 2^t?

The mullo_n is the same. The sqr + powlo ought to be roughly the same as
the powlo i brootinv (consider small k so windowing is not useful). I
had expected a significant win because a mpn_neg of size n/2 ought to be
faster than the combination of mul_1, sub_n of size n and bdiv_q_1 of
size n/2.

I think (r^2)^{(k+1)/2} looks strange.  Isn't r^{k+1} equivalent?

There's also a powlo (a^{k-1}) and mullo outside of the loop. I expected
them to contribute little to the total running time. But mullo seems to
take 13% for large sizes and mpn_powlo is not currently supported by
speed, so it's not so easy to measure).

I would suggest that you time things on several machine types.  An AMD
Stars (eg shell) might be the best platform, since all asm code runs
best there.

Maybe I ought to power a and r at the same time as

a^{k-1} r^{k+1} = (a*r)^{k-1} * r^2

or so. Some wraparound or mulmid tricks may be applicable to

Why would that be faster?

Another note: I also wish for a "bdiv_nq_1" which returns - u / d (mod
B^n) rather than u / d (mod B^n), then one could eliminate the mpn_neg.
(And this is somewhat related to my bdiv work a few months ago, which
also returns a quotient with different sign).

Does mpn_neg take measurable time?

--
Torbjörn
```