gcd_11 without abs

Sat Nov 21 14:59:35 UTC 2020

Torbjörn Granlund <tg at gmplib.org> writes:

> The way to improve the current code on almost any processor is by
> shortening the critical dependency path.  reducing the insn count helps
> very few CPUs these days.
>
> Do you think you can shorten the dependency path?

I'm afraid that's difficult.

It seems the critical path will unavoidable include the sequence 

  add/sub
  count trailing zeros
  right shift

And then there will be some conditional operations somewhere. For
shortest path, it's best if the code can be arranged to do those
operations in parallel with count trailing zeros.

Current ARM code (I'm looking at the v6t2 version) does that, with two
conditional instructions (rsbcc, movcs) independent of the ctz. But then
count trailing zeros is done with two depending instructions, rbit, clz,
so the critical path is then 4 instructions. Unclear to me why it runs
faster than 4 cycles on some processors, maybe the rbit+clz combination
is executed in a single step on some of the execution units?

Current x86_64 code (I'm looking at the core2 version) also looks pretty
tight, also with all conditional moves independent of the count trailing
zeros instruction. Dependency graph is more complex, but as I read it it
depth would be 4, with each of these groups of instructions executed in
parallel in one cycle:

        sub     u0, %rdx                C v - u

        bsf     %rdx, %rcx
        mov     u0, %rax
        sub     v0, u0                  C u - v    <----

        jnz     L(top)
L(top): cmovc   %rdx, u0                C u = |u - v|
        cmovc   %rax, v0                C v = min(u,v)

        shr     R8(%rcx), u0
L(odd): mov     v0, %rdx

I wonder if it's possible to get down to 3 cycles. Scheduling the second
sub instruction in parallell with the first would be a start.

Back to the no-abs version. That seems to require cmovs on the critical
path *after* the shift, and that would tend to make the critical path
longer.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.