gcdext_1 questions

Torbjorn Granlund tg at gmplib.org
Thu Dec 3 23:21:15 CET 2009


nisse at lysator.liu.se (Niels Möller) writes:

  #if GCDEXT_1_BINARY_METHOD == 1
    while (u != v)
      {
        unsigned count;
        
        if (u > v)
          {
            u -= v;
            count_trailing_zeros (count, u);
            u >>= count;
            t0 += t1; t1 <<= count;
            s0 += s1; s1 <<= count;
          }
        else
          {
            v -= u;
            count_trailing_zeros (count, v);
            v >>= count;
            t1 += t0; t0 <<= count;
            s1 += s0; s0 <<= count;
          }
        shift += count;
      }
  #else

This code will be speed limited by branch mispredicts (1/2 per
iteration, costing roughly 14 cycles each on current x86 machines) plus
the cost of count_trailing_zeros (7-9 cycles on AMD K7,K8,K9, 4 cycles
on AMD K10, 2-3 cycles on all Intel P6).

This really ought to be written in assembly, just like gcd_1 is.  Then
we can play with various things like cmov, or perhaps put t0*B+s0 in one
128-bit xmm register and t1*B+s1 in another.  The four last statements
in reach if-else arm would then become two.

-- 
Torbjörn


More information about the gmp-devel mailing list