mpn_perfpow: do we need a special GCD for mp_bitcnt_t?
Torbjörn Granlund
tg at gmplib.org
Thu Sep 4 15:31:05 UTC 2014
nisse at lysator.liu.se (Niels Möller) writes:
I've had a look at x86_64/gcd_1.asm, to try to extract an gcd_11.asm
from it.
Before entering the main binary gcd loop, the code needs to:
1. Remove and keep track of common trailing zeros,
2. Remove trailing zeros on one of the arguments (that might be cheaper
than checking which argument is odd already and swap them, not sure).
But the gcd_1.asm code also has a call to mpn_modexact_1_odd in the case
that u0 >> v0. I'm not sure this makes sense. If something like this is
kept at all, I'd suggest just getting an 8-bit v0-inverse from
binvert_limb_table, and rather than computing a larger inverse, just
loop reducing 8 bits at a time as long as u0 > v0. But one get some
overhead. Before the main loop one needs the following steps:
1. Remove and keep track of common trailing zeros,
2. Remove trailing zeros on *both* arguments (one will be odd, but we
don't yet know which).
3. Swap u0, v0 if u0 < v0
4. Check if u0 is much larger than v0, if not, enter main loop directly.
Do you think that's worth the effort?
This is trickier than I had realised, and needs some thinking to avoid
performance regressions.
It might make sense to have an entry point for log2(a) \approx log2(b)
allowing even operands, suitable for fresh-out-of multilimb mod
reduction. It might perhaps make sense to have an entry point for
operands which are also both odd. And we need an entry point allowing
any two limbs (perhaps requiring non-zero arguments).
Do you agree? Did I forget some scenario?
Many entry points are simple to handle in the asm code. In C, we just
need multiple separate functions.
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list