Torbjörn Granlund tg at gmplib.org
Sun Sep 15 13:38:05 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

  tg at gmplib.org (Torbjörn Granlund) writes:

  > Looking at when method 1 or 3 is faster than 2 is more interesting.
  > Method 1 and to some extent also method 3 would benefit from asm code,

  How would method 1 benefit from asm code? To use instructions providing
  both quotient and remainder? For x86_64, it looks like gcc uses

    xor %edx, %edx
    div %rcx

  to get quotient and remainder (i.e., using the 128/64 division
  instruction; there seems there is no 64/64 divison instruction?).

I meant to write method 2 and 3.

  Method 2 and 3 begs for using subtract + cmov, not sure if the compiler
  is clever enough to do that for the code using masks.

The compiler is not that clever, it seems.

  > Would your present tuneup/speed setup allow measuring of asm code?

  I think so, ./speed mpn_hgcd2 will measure the mpn_hgcd2 function in the
  library, with no extra hacks. Will maybe need some tricks to be able to
  select other methods and ignore the asm version, but I'd expect that to be
  fairly easy.

Well, one will at least need to suppress the local versions by editing
hgcd2.c.  No?

  > The current div1 measurements include hgcd2's own time, right?  I.e., if
  > we found a div1 which runs in zero cycles, the timings would not be
  > zero.

  Yes. Do you think it makes sense to measure div1 in isolation? Since
  performance depends so much on the distribution of the inputs, measuring
  hgcd2 makes sense to me.

I think I agree.

The distribution of operands are the same as for a plain, old Euclid,
isn't it?  So one could perhaps provide a set of uniformly distribted
random numbers for div1 and get adequate results?

  I'm appending another iteration of the patch to add div2 function based
  on div1 on the high limbs. Selected via HGCD2_DIV2_METHOD. Benchmarks:

  HGCD2_DIV2_METHOD mpn_hgcd2_1   mpn_hgcd2_2   mpn_hgcd2_3
                  1    #1504.47       1740.53       1513.56  div1 + unlikely case
                  2    #1739.66       1858.01       1753.73  current code
                  3    #1615.32       1736.69       1621.95  the if:ed out version

Looks good.  Ready for repo?

What is the current reasoning on suppressing the q = 1 code in the hgcd2
code calling div1 and div2?  That's largely orthogomal to the current
HGCD2_DIV*_METHOD choices, I suppose, although surely q = 1 special
handling makes more sense for HGCD2_DIV1_METHOD = 1 that for the other

A combinatoric explosion would be nice to avoid.

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list