gcd_22
Torbjörn Granlund
tg at gmplib.org
Fri Aug 16 19:50:49 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> $ ./tune/speed -p 100000 -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22
> overhead 4.01 cycles, precision 100000 units of 5.08e-10 secs, CPU freq 1966.75 MHz
>       mpn_gcd_11    mpn_gcd_22
>  1       #7.0459       11.0729
>  4       #2.8721        6.6946
>  7       #2.8415        7.3658
> 10       #3.3039        7.4346
> 13       #3.6084        7.8537
> 16       #4.0952        7.8193
> 19       #4.0620        8.0769
> 22       #3.9624        7.9849
> 25       #3.9169        8.0956
> 28       #4.0359        8.2084
> 31       #3.9816        8.0920
> 34       #3.9748        8.0933
> 37       #4.0041        8.0792
> 40       #3.9327        8.0656
> 43       #3.9239        8.1070
> 46       #3.9064        8.0289
> 49       #3.9143        8.0191
> 52       #4.0642        9.5547
> 55       #4.1251        9.8507
> 58       #4.2523        7.9096
> 61       #3.8440        7.8590
> 64       #3.8325        7.8398
Is the right column for C code? That's pretty good!
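For comparison, the single-limb operation in the left column computes the gcd of two odd limbs. A minimal C sketch of the plain binary algorithm follows; the name gcd_11_sketch is hypothetical, it assumes 64-bit limbs and the GCC/Clang builtin __builtin_ctzll, and it is not GMP's own C code.

```c
#include <stdint.h>

/* Hypothetical sketch, not GMP's mpn_gcd_11: binary gcd of two odd
   64-bit limbs. Assumes GCC/Clang for __builtin_ctzll. */
uint64_t gcd_11_sketch(uint64_t u, uint64_t v)
{
    /* Invariant: u and v are both odd, so u - v is even. */
    while (u != v) {
        if (u > v) {
            uint64_t d = u - v;              /* even and nonzero */
            u = d >> __builtin_ctzll(d);     /* strip factors of two; u stays odd */
        } else {
            uint64_t d = v - u;
            v = d >> __builtin_ctzll(d);
        }
    }
    return u;
}
```

Each iteration replaces the larger operand by its difference with the smaller one, shifted right by at least one bit, so the larger operand is at least halved per step.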
I now have two gcd_22 asm variants for x86-64, one based on negation with
an sbb-generated mask, and the other based on x86_64/core2/gcd_11.asm.
With them I reach 1.5t and 2t, where t is the gcd_11 time.
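One reading of the negation-with-mask idea, sketched in C rather than asm: subtract the two-limb operands, turn the final borrow into a mask that is all ones when u < v (the value an x86 "sbb reg,reg" yields after the subtraction), and use that mask to get both min(u, v) and |u - v| without a branch. The names dlimb and gcd_22_sketch are hypothetical, the code assumes 64-bit limbs and GCC/Clang's __builtin_ctzll, and it is not the asm variant described above.

```c
#include <stdint.h>

/* Hypothetical two-limb type (hi:lo); not GMP's mp_double_limb_t. */
typedef struct { uint64_t hi, lo; } dlimb;

/* Hypothetical sketch, not GMP's mpn_gcd_22: binary gcd of two odd
   two-limb numbers. Assumes GCC/Clang for __builtin_ctzll. */
dlimb gcd_22_sketch(dlimb u, dlimb v)
{
    for (;;) {
        /* d = u - v (mod 2^128), propagating the borrow by hand. */
        uint64_t d0 = u.lo - v.lo;
        uint64_t b0 = u.lo < v.lo;
        uint64_t d1 = u.hi - v.hi - b0;
        uint64_t b1 = (u.hi < v.hi) | ((u.hi == v.hi) & b0);

        if ((d0 | d1) == 0)
            return v;                      /* u == v: this is the gcd */

        /* All ones if u < v, else zero: what "sbb reg,reg" leaves
           after the high-limb subtraction. */
        uint64_t mask = -b1;

        /* min(u, v) = v + (d & mask), a two-limb addition. */
        uint64_t m0 = v.lo + (d0 & mask);
        uint64_t m1 = v.hi + (d1 & mask) + (m0 < v.lo);

        /* |u - v| = (d ^ mask) - mask, i.e. negate d when the mask is set. */
        uint64_t t0 = d0 ^ mask, t1 = d1 ^ mask;
        uint64_t a0 = t0 - mask;
        uint64_t a1 = t1 - mask - (t0 < mask);

        /* |u - v| is even and nonzero; shift out its trailing zeros. */
        if (a0 != 0) {
            int k = __builtin_ctzll(a0);   /* 1 <= k <= 63 here */
            u.lo = (a0 >> k) | (a1 << (64 - k));
            u.hi = a1 >> k;
        } else {
            u.lo = a1 >> __builtin_ctzll(a1);
            u.hi = 0;
        }
        v.lo = m0;
        v.hi = m1;
    }
}
```

The mask replaces a data-dependent branch on u < v, which tends to mispredict on random inputs; the actual asm variants are what the 1.5t and 2t figures refer to.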
(Cycle counting is tricky now, with chips overclocking themselves while
still counting cycles at the base clock rate.)
--
Torbjörn
Please encrypt, key id 0xC8601622