gcdext_1 questions
Torbjorn Granlund
tg at gmplib.org
Thu Dec 3 23:21:15 CET 2009
nisse at lysator.liu.se (Niels Möller) writes:
#if GCDEXT_1_BINARY_METHOD == 1
while (u != v)
{
unsigned count;
if (u > v)
{
u -= v;
count_trailing_zeros (count, u);
u >>= count;
t0 += t1; t1 <<= count;
s0 += s1; s1 <<= count;
}
else
{
v -= u;
count_trailing_zeros (count, v);
v >>= count;
t1 += t0; t0 <<= count;
s1 += s0; s0 <<= count;
}
shift += count;
}
#else
This code will be speed limited by branch mispredicts (1/2 per
iteration, costing roughly 14 cycles each on current x86 machines) plus
the cost of count_trailing_zeros (7-9 cycles on AMD K7,K8,K9, 4 cycles
on AMD K10, 2-3 cycles on all Intel P6).
This really ought to be written in assembly, just like gcd_1 is. Then
we can play with various things like cmov, or perhaps put t0*B+s0 in one
128-bit xmm register and t1*B+s1 in another. The four last statements
in reach if-else arm would then become two.
--
Torbjörn
More information about the gmp-devel
mailing list