gcd_22

Torbjörn Granlund tg at gmplib.org
Fri Aug 16 19:50:49 UTC 2019


nisse at lysator.liu.se (Niels Möller) writes:

  $ ./tune/speed -p 100000 -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22
  overhead 4.01 cycles, precision 100000 units of 5.08e-10 secs, CPU freq 1966.75 MHz
             mpn_gcd_11    mpn_gcd_22
  1             #7.0459       11.0729
  4             #2.8721        6.6946
  7             #2.8415        7.3658
  10            #3.3039        7.4346
  13            #3.6084        7.8537
  16            #4.0952        7.8193
  19            #4.0620        8.0769
  22            #3.9624        7.9849
  25            #3.9169        8.0956
  28            #4.0359        8.2084
  31            #3.9816        8.0920
  34            #3.9748        8.0933
  37            #4.0041        8.0792
  40            #3.9327        8.0656
  43            #3.9239        8.1070
  46            #3.9064        8.0289
  49            #3.9143        8.0191
  52            #4.0642        9.5547
  55            #4.1251        9.8507
  58            #4.2523        7.9096
  61            #3.8440        7.8590
  64            #3.8325        7.8398

Is the right column for C code?  That's pretty good!

I now have two gcd_22 asm variants for x86-64, one based on negation with
an sbb-generated mask, and the other based on x86_64/core2/gcd_11.asm.

I reach running times of 1.5t and 2t, where t is the gcd_11 time.
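
To illustrate the mask idea, here is a rough C sketch of a two-limb binary
gcd step using conditional negation.  It is not the actual asm and not
GMP's mpn_gcd_22: the type limb2 and the name gcd_22_sketch are made up
for illustration, __builtin_ctzll assumes GCC or Clang, and both inputs
are assumed to be odd.

```c
#include <stdint.h>

/* Hypothetical two-limb value; hi is the most significant 64-bit limb. */
typedef struct { uint64_t hi, lo; } limb2;

/* gcd of two odd two-limb values.  Illustration only. */
limb2 gcd_22_sketch(limb2 u, limb2 v)
{
    while (u.hi != v.hi || u.lo != v.lo) {
        /* t = u - v over two limbs; borrow is 1 exactly when u < v. */
        uint64_t b0  = u.lo < v.lo;
        uint64_t tlo = u.lo - v.lo;
        uint64_t thi = u.hi - v.hi - b0;
        uint64_t borrow = (u.hi < v.hi) | ((u.hi == v.hi) & b0);
        uint64_t mask = -borrow;   /* 0 or all ones, like sbb reg,reg */

        /* v = min(u, v): add t to v only when u < v. */
        uint64_t mlo = tlo & mask;
        uint64_t vlo = v.lo + mlo;
        v.hi += (thi & mask) + (vlo < mlo);
        v.lo = vlo;

        /* u = |t| = (t ^ mask) - mask, i.e. a conditional negation. */
        tlo ^= mask;
        thi ^= mask;
        u.lo = tlo + borrow;
        u.hi = thi + (u.lo < borrow);

        /* Strip trailing zeros; u is nonzero and even at this point. */
        if (u.lo == 0) {
            u.lo = u.hi >> __builtin_ctzll(u.hi);
            u.hi = 0;
        } else {
            int c = __builtin_ctzll(u.lo);   /* 1 <= c <= 63 */
            u.lo = (u.lo >> c) | (u.hi << (64 - c));
            u.hi >>= c;
        }
    }
    return u;
}
```

The only data-dependent branches left are the loop exit and the case where
the low limb of the difference is zero; the comparison itself is folded
into the mask.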

(Cycle counting is tricky now with chips overclocking themselves while
still counting cycles as per the base clock.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622

