gcd_22
Niels Möller
nisse at lysator.liu.se
Fri Aug 16 06:32:24 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> nisse at lysator.liu.se (Niels Möller) writes:
>
> Attached is a patch to move the current (slow) gcd_2 to a separate file, rename
> it mpn_gcd_22, change argument and return type, and add a basic test.
>
> Nice!
Pushed now.
> Speed support would be very welcome!
Done! To get to measure only the gcd_22 loop, I reused the speed macros
for gcd_11, but call gcd_22 (al, al, bl, bl), in effect premultiplying
both inputs with a common factor (B+1). On my machine,
$ ./tune/speed -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22
overhead 4.05 cycles, precision 10000 units of 5.87e-10 secs, CPU freq 1703.77 MHz
mpn_gcd_11 mpn_gcd_22
1 #9.9349 11.4688
4 #3.7217 6.5098
7 #3.6897 6.7366
10 #3.8477 7.9195
13 #4.0583 9.3407
16 #4.1460 9.9268
19 #4.4428 10.8306
22 #4.3512 10.9613
25 #4.3475 10.9706
28 #4.3407 11.3027
31 #4.3135 10.9393
34 #4.2116 11.6335
37 #4.3125 11.5507
40 #4.1918 11.9219
43 #4.2716 10.2703
46 #3.9071 10.2429
49 #3.9190 10.2698
52 #4.0637 10.3419
55 #4.0631 11.6783
58 #4.1228 11.8280
61 #4.0512 10.3701
64 #3.8345 10.3486
So the gcd_22 code with all the branches needs about 11 cycles per input
bit. gcd_11 is coreihwl/gcd_11.asm in this build.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list