nisse at lysator.liu.se
Tue Sep 17 17:27:53 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> nisse at lysator.liu.se (Niels Möller) writes:
> I've made a quick try deleting it from the single-limb loop. See patch
> below. Measurements are a bit noisy, but it looks like a slowdown when I
> time it. With hgcd2 time increasing from 1220 cycles to 1290 (this time
> measured on broadwell), which seems to be an increase of more than one
> cycle per iteration of this loop.
> With which HGCD_DIV1_METHOD did you make these experiments?
> I find it hard to accept that 25 cycles per iteration is as good as it
> gets. (25 cycles is Intel's best division instruction speed.) I still
> believe we could beat it soundly with a table-based approach if it only
> rarely incurs a branch miss.
And one iteration is roughly 1.7 bits of average progress, right (same
as plain Euclid)? hgcd2 eliminates about 126 bits before exiting, which
means 74 iterations (half with doulble pracision, half single
precision). Best performance I've seen so far is around 1220 cycles,
which is 16 cycles per iteration, or 9.7 cycles per bit.
Would be interesting with a left-to-right binary variant. One iteration
would be a count_leading_zeros, some shifting and decision-making logic,
additions, no multiples.
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel