broot vs brootinv performancs

Niels Möller nisse at lysator.liu.se
Tue Oct 30 15:26:30 CET 2012


I've tried some more benchmarking (sorry, patches to speed not yet
checked in). To my disapppointment, broot appear to be 20% slower than
brootinv (except for small sizes, where it wins because it does the
initial few iterations without bignum overhead).

I don't understand why. When describing the iterations below, let n be
the target size of the iteraton.

brootinv uses the iteration

  r' <-- ((k+1) r - r^{k+1} y) / k

which is

  mul_1,
  powlo,
  mullo_n,
  sub_n,
  bdiv_q_1,

all of size n. For broot, I've rewritten the iteration as

  r' <-- r - (a^{k-1} (r^2)^{(k+1)/2} - r) / k

which gave a modest improvement. We still have cancellation in the
subtraction in parenthesis (and I don't yet use wraparound, neither does
brootinv). The operations are then

  sqr       (input size n/2, output size n)
  powlo     (size n, exponent size one bit smaller)
  mullo_n   (size n)
  bdiv_q_1, (size n/2)
  mpn_neg   (size n/2)

The mullo_n is the same. The sqr + powlo ought to be roughly the same as
the powlo i brootinv (consider small k so windowing is not useful). I
had expected a significant win because a mpn_neg of size n/2 ought to be
faster than the combination of mul_1, sub_n of size n and bdiv_q_1 of
size n/2.

There's also a powlo (a^{k-1}) and mullo outside of the loop. I expected
them to contribute little to the total running time. But mullo seems to
take 13% for large sizes and mpn_powlo is not currently supported by
speed, so it's not so easy to measure).

Maybe I ought to power a and r at the same time as

  a^{k-1} r^{k+1} = (a*r)^{k-1} * r^2

or so. Some wraparound or mulmid tricks may be applicable to

Or maybe I should rather try to optimize brootinv to take advantage of
cancellation and wraparound.

Another note: I also wish for a "bdiv_nq_1" which returns - u / d (mod
B^n) rather than u / d (mod B^n), then one could eliminate the mpn_neg.
(And this is somewhat related to my bdiv work a few months ago, which
also returns a quotient with different sign).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.



More information about the gmp-devel mailing list