Alternative div_qr_1

Torbjorn Granlund tg at
Tue May 11 11:39:20 CEST 2010

nisse at (Niels Möller) writes:

  But it's a two-limb number that should be added, and the carries from
  both limbs are needed. That rules out the use of LEA as far as I see.

  And the condition value can be generated as a mask with no extra effort
  (sbb mask, mask; ...; sbc $0, mask, then the condition is true iff mask
  is all ones and iff the Z flag is clear). I don't know if it makes much
  difference to do
    ... get condition as a mask in t1...
    mov c0, t0
    and t1, t0
    and c1, t1
    xor t0, t0
    xor t1, t1
    ... get condition, in Z flag ...
    cmovne c0, t0
    cmovne c1, t1
Which one gives the shortest recurrency path?  They might be fairly

You actually wrote an x86_64 assembly mod_1_1 some time ago that you
claimed ran in 6 c/l.  I don't know what happened to that code.  So the
new code needs to run at 5 c/l...

But let's not get too x86 centric.  The new mod_1 function should be a
major improvement for badly multiplication challenged machines such as
SPARCv9.  But it will surely be an improvment for machines less
challenged, but where umul_ppmm type multiplication is more expesive
that a few plainer operations.

PS. I really like the new algorithms, except now we have to figure out
names for more functions.  :-)


More information about the gmp-devel mailing list