Alternative div_qr_1
Torbjorn Granlund
tg at gmplib.org
Tue May 11 11:39:20 CEST 2010
nisse at lysator.liu.se (Niels Möller) writes:
But it's a two-limb number that should be added, and the carries from
both limbs are needed. That rules out the use of LEA as far as I see.
Oops.
And the condition value can be generated as a mask with no extra effort
(sbb mask, mask; ...; sbc $0, mask, then the condition is true iff mask
is all ones and iff the Z flag is clear). I don't know if it makes much
difference to do
... get condition as a mask in t1...
mov c0, t0
and t1, t0
and c1, t1
or
xor t0, t0
xor t1, t1
... get condition, in Z flag ...
cmovne c0, t0
cmovne c1, t1
Which one gives the shortest recurrency path? They might be fairly
equivalent.
You actually wrote an x86_64 assembly mod_1_1 some time ago that you
claimed ran in 6 c/l. I don't know what happened to that code. So the
new code needs to run at 5 c/l...
But let's not get too x86 centric. The new mod_1 function should be a
major improvement for badly multiplication challenged machines such as
SPARCv9. But it will surely be an improvment for machines less
challenged, but where umul_ppmm type multiplication is more expesive
that a few plainer operations.
PS. I really like the new algorithms, except now we have to figure out
names for more functions. :-)
--
Torbjörn
More information about the gmp-devel
mailing list