gcd_22

Niels Möller nisse at lysator.liu.se
Fri Aug 16 06:56:06 UTC 2019


nisse at lysator.liu.se (Niels Möller) writes:

> So the gcd_22 code with all the branches needs about 11 cycles per input
> bit. gcd_11 is coreihwl/gcd_11.asm in this build.

I had to do a quick try with the masking version before leaving for work:

$ ./tune/speed -p 100000 -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22
overhead 4.01 cycles, precision 100000 units of 5.08e-10 secs, CPU freq 1966.75 MHz
           mpn_gcd_11    mpn_gcd_22
1             #7.0459       11.0729
4             #2.8721        6.6946
7             #2.8415        7.3658
10            #3.3039        7.4346
13            #3.6084        7.8537
16            #4.0952        7.8193
19            #4.0620        8.0769
22            #3.9624        7.9849
25            #3.9169        8.0956
28            #4.0359        8.2084
31            #3.9816        8.0920
34            #3.9748        8.0933
37            #4.0041        8.0792
40            #3.9327        8.0656
43            #3.9239        8.1070
46            #3.9064        8.0289
49            #3.9143        8.0191
52            #4.0642        9.5547
55            #4.1251        9.8507
58            #4.2523        7.9096
61            #3.8440        7.8590
64            #3.8325        7.8398

So with masking, gcd_22 is down to around 8 cycles per input bit.
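
To illustrate what "masking" means here, below is a minimal single-limb
sketch. It is not the GMP code and not the gcd_22 variant timed above.
The name sketch_gcd_11 and the use of uint64_t are assumptions made for
illustration, and __builtin_ctzll assumes GCC or Clang. The idea is to
replace the data-dependent "if (u < v) swap" branch with a mask derived
from the comparison, so only the loop-exit branch remains.

```c
#include <stdint.h>

/* Hypothetical helper, not GMP's mpn_gcd_11: gcd of two odd 64-bit
   values, with the min / absolute-difference selection done by masking
   instead of branching.  Both u and v must be odd. */
static uint64_t
sketch_gcd_11 (uint64_t u, uint64_t v)
{
  while (u != v)
    {
      /* Difference modulo 2^64.  It is even and nonzero, since u and v
         are odd and distinct. */
      uint64_t t = u - v;

      /* All ones if u < v, otherwise zero. */
      uint64_t mask = -(uint64_t) (u < v);

      /* v <-- min (u, v): when u < v, adding t = u - v turns v into u. */
      v += t & mask;

      /* u <-- |u - v|: conditional negation of t under the mask. */
      u = (t ^ mask) - mask;

      /* Strip the factors of two so u is odd again.
         __builtin_ctzll is a GCC/Clang builtin (assumption). */
      u >>= __builtin_ctzll (u);
    }
  return u;
}
```

For example, sketch_gcd_11 (15, 25) returns 5. A two-limb version would
apply the same mask to both limbs of the difference, at the cost of
extra carry handling.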

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

