Small operands gcd improvements
Torbjörn Granlund
tg at gmplib.org
Mon Aug 5 12:17:14 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> Checking u1 = 0 and v1 = 0 separately as you suggest is a different
> thing, and it might not have zero cost in the gcd_22 loop.
I think only the shifted number should be checked, and as the main loop
exit condition.
OK, so we agree there!
If both u1 > 0 and v1 > 0 on entry (if gcd_22 require that is
unclear), then u1 == 0 impliex v1 > 0.
I don't think gcd_22 should misbehave for v1 = u1 = 0. It currently is
designed to exit for such operands in order to leave things for gcd_11.
But wait, I think my current code might fail for u1 = 0, u0 = small, v1
= v0 = 0xfff.fff. That would result in the upper half difference being
0 with carry out. The zero will trigger loop exit currently. Fixable,
but we should make sure to test for that (at least if u1 = 0 is
allowed).
> We could do a large rightshift outside the loop and then jump back into
> and (ab)use gcd_22 with u1 = 0 xor v1 = 0. I suspect random operands
> will not see any timing difference. Don't you agree that u1 / v1 will
> not be too far from 1.0 on average?
That would probably work fine, but maybe not simpler than an explicit
gcd_22. So then the normal case would call gcd_22 with u1 > 0, v1 > 0,
and exit the loop with u1 > 0, v1 == 0. I don't see why we'd need a
right shift here, I think one could just check if u1 > 0 and if so jump
back to gcd_22, and next time the loop exits we will have u1 = 0, v1 =
0.
What do you mean by an "explicit gcd_22"?
What I am saying is that we should not design and validate more loops
that justified. While gcd_21 would be sleeker than gcd_22, it might not
speed up non-contrived examples as after gcd_22 exits because it has too
(the case v0 = 0 or u0 = 0) we have an unlikely scenario. If we let it
exit if v1 = 0 *or* u1 = 0, as opposed to v1 = 0 and u1 = 0, then we
might need gcd_21 in some form, but it will rarely run for more than 1
or two iterations (not enough to train its loop branch to be rightly
predicted).
I think an explicit gcd_21 loop may be clearer, but unlikely to matter
for performance.
I'd like to avoid another asm loop if possible.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list