GMP assembly chart
This chart gives performance numbers in cycles/limb for many of GMP's mpn (i.e., low-level) functions. A plain number without any annotation means that the mpn function in that row is implemented for the CPU in that column, either in the official repository or in a maintainer's local repository. For annotated numbers, please see the table above.
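Anyone who wants to process the chart programmatically first has to separate the plain number in a cell from its annotations. The following is a minimal sketch of that step and not part of GMP. The function name `parse_cell` and the returned dictionary keys are hypothetical. It only splits a cell syntactically into an optional leading `#`, a plain number or range, any bracketed groups in `[]`, `{}` or `()`, and a trailing footnote mark; it makes no attempt to interpret what each annotation means. Cells that do not fit this shape (for example `Y`, `*`, or `1.5\2`) are returned as `None`.

```python
import re

# Shape of a cell: optional '#', optional plain number or range,
# zero or more bracketed groups, optional footnote mark.
_CELL = re.compile(
    r"""
    ^(?P<hash>\#)?
    (?P<plain>[0-9.]+(?:-[0-9.]+)?)?
    (?P<rest>(?:[\[({][^\])}]*[\])}])*)
    (?P<note>[¹²])?$
    """,
    re.VERBOSE,
)

# One bracketed group: opening bracket, content, closing bracket.
_GROUP = re.compile(r"([\[({])([^\])}]*)[\])}]")


def parse_cell(text):
    """Split one chart cell into its syntactic parts.

    Returns None for empty cells and for cells that do not match
    the number-plus-annotations shape.
    """
    text = text.strip()
    if not text:
        return None
    match = _CELL.match(text)
    if match is None:
        return None
    return {
        "hash": match.group("hash") is not None,
        "plain": match.group("plain"),
        # (opening bracket, content) pairs, in the order they appear
        "annotated": _GROUP.findall(match.group("rest")),
        "footnote": match.group("note"),
    }


if __name__ == "__main__":
    for cell in ["3.87[3.3]{2.5}", "#5.1²", "(1+ε)", "0.75-1", "Y"]:
        print(cell, "->", parse_cell(cell))
```

The values are kept as strings on purpose, since some cells hold ranges such as `2.5-3.25` or symbolic entries such as `1+ε` that do not convert to a single float.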
To compare these numbers fairly, 32-bit machines should only be compared to
other 32-bit machines, and 64-bit machines only to other 64-bit machines.
Per limb, a 64-bit machine performs twice as much work as a 32-bit machine
for many functions, and four times as much for the multiply primitives.
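The factors of 2 and 4 stated above can be turned into a small piece of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the input figures are example values rather than claims about any particular column, and the result is a rough indication that does not replace the advice to compare like with like. It divides a cycles/limb figure by the amount of work one limb represents relative to a 32-bit limb.

```python
def relative_work_per_limb(limb_bits, multiply):
    """Work done per limb, relative to one 32-bit limb.

    Linear functions scale with the limb width (factor 2 for 64-bit),
    multiply primitives with its square (factor 4 for 64-bit).
    """
    if limb_bits not in (32, 64):
        raise ValueError("this sketch only covers 32-bit and 64-bit limbs")
    ratio = limb_bits // 32
    return ratio * ratio if multiply else ratio


def cycles_per_32bit_unit(cycles_per_limb, limb_bits, multiply=False):
    """Scale a cycles/limb figure to cycles per 32-bit unit of work."""
    return cycles_per_limb / relative_work_per_limb(limb_bits, multiply)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(cycles_per_32bit_unit(1.5, 64))                  # linear function
    print(cycles_per_32bit_unit(2.5, 64, multiply=True))   # multiply primitive
    print(cycles_per_32bit_unit(4.0, 32, multiply=True))   # already 32-bit
```

Which rows count as multiply primitives is left to the caller through the `multiply` flag, since the text does not enumerate them.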
| mpn function | AMD K7 32 | Intel Nor 32 | Intel Pres 32 | Intel Doth 32 | Intel Atom 32 | AMD K8 64 | AMD K10 64 | AMD Bulld 64 | AMD Bobc 64 | Intel Noc 64 | Intel Core2 64 | Intel NHM 64 | Intel SBR 64 | Intel Atom 64 | VIA Nano 64 | PPC 74x7 32 | PPC 970 64 | IBM PWR5 64 | IBM PWR6 64 | IBM PWR7 64 | Sun US3 64 | Sun T1 64 | Alpha 21264 64 | Itanium 2 64 | S/390 z990 64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| add_n | 1.64{1.5} | 4 | 4.25 | 2.14 | 3 | 1.5 | 1.5 | 2.5 | 4 | 2 | #2.25 | 1.61 | 3 | 3 | 4 | 2 | 2.25 | 2.63 | 2.4 | 4.5 | 17 | 2.125 | 1.25 | 3[2.5] | |
| sub_n | 1.64{1.5} | 4 | 4.25 | 2.14 | 3 | 1.5 | 1.5 | 2.5 | 4 | 2 | #2.25 | 1.61 | 3 | 3 | 4 | 2 | 2.25 | 2.63 | 2.4 | 4.5 | 17 | 2.125 | 1.25 | 3[2.5] | |
| addlsh1_n | 2.5 | 4.25 | 5 | 6 | 2 | 2{1.69} | 2.875 | 5.8 | 3.1 | 2.75 | 2 | 4.875 | 3 | 5 | 3 | 2.9 | 3.5 | 3 | 21 | [3.25] | 1.5 | 4.75{3.87} | |||
| sublsh1_n | 2.87 | 6.667 | 2.18 | 2.18{2} | 3.25 | 5.8 | 3 | 3.1{2.5} | 2.47{2.17} | 5 | 3 | 5 | 3 | 2.9 | 3.5 | 3 | 21 | (3.25) | 1.5 | 5 | |||||
| rsblsh1_n | 6 | 2 | 2{1.69} | 2.875 | 3.1 | 2.75 | 2 | 4.875 | 3 | [5] | 21 | 4.75{3.87} | |||||||||||||
| addlsh2_n | 6 | 2.1 | 2 | 3.3 | 5.8 | 3.1 | 2.75 | 2 | 5.75 | 3 | [5] | 3 | 2.9 | 3.5 | 3 | 21 | 1.5 | ||||||||
| sublsh2_n | 7 | 5.8 | 3 | 3.1 | 2.47 | {4} | [5] | 3 | 2.9 | 3.5 | 3 | 21 | 1.5 | ||||||||||||
| rsblsh2_n | 6 | 2.1 | 2 | 3.3 | 3.1 | 2.75 | 2 | 5.75 | 3 | [5] | 21 | ||||||||||||||
| addlsh_n | 2.87 | 2.75 | 5.46 | 3 | 2.8 | 2.75 | 7.75{6} | 4.7{4} | (1.75) | ||||||||||||||||
| sublsh_n | {2.5-3.25} | {2.5-3.25} | {2.75} | {2.75} | {3} | {4.125} | (1.75) | ||||||||||||||||||
| rsblsh_n | 2.87 | 2.75 | 5.46 | 3 | 2.8 | 2.75 | 7.75{6} | 4.7{4} | (1.75) | ||||||||||||||||
| add_n_sub_n | [2.5] | [2.5] | (3) | (3) | 2.25 | ||||||||||||||||||||
| rsh1add_n | 4.5 | 5.25 | 2 | 2{1.67} | 3.25 | 5.63 | 3.2{2.67} | 3.87[3.3]{2.5} | 2.05 | 5.25 | 3 | (5) | #4 | 3.5 | 4.5 | 3.45 | (3.5) | 1.5 | |||||||
| rsh1sub_n | 2 | 2{1.67} | 3.25 | 5.63 | 3.2{2.67} | 3.87[3.3]{2.5} | 2.05 | 5.25 | 3 | (5) | #4 | 3.5 | 4.5 | 3.45 | (3.5) | 1.5 | |||||||||
| mul_1 | 3.25 | 4 | 4.5 | 4.16{3.75} | 7.5 | 2.5 | 2.5 | 5.5 | 12.6 | 4 | 3.75 | 2.9{2.5} | 19.75{17} | 4.25 | 6 | 7.25 | 7.25 | 13.5(8) | 2.9 | 18.25 | 68 | 2.25 | 2{1.5} | 22[20] | |
| mul_1c | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | N | N | [Y] | |||||
| addmul_1 | 3.75 | 5{4} | 5 | 5.21{4.75} | 8 | 2.5 | 2.5 | 6.17 | 14.9 | 4.25 | 5{4} | 4{3.25} | 21.25{19} | 5 | 9.5 | 8 | 8 | 12.25 | 3.77 | 17.3 | 74 | 3.5 | 2(1.75) | 23[21] | |
| submul_1 | 3.75 | 6 | 6.5 | #5.5 | 8 | 2.5 | 2.5 | 6.17 | 14.9 | 4.25 | 5{4} | 4{3.25} | 21.25{19} | 5 | 10.5 | 8.3 | 8.25 | 12.8 | 4.9 | 22.75 | 74 | 3.5 | 2.25(2) | 24[21.75] | |
| mul_2 | (4) | (4) | 2.25 | 2.25 | 5.625 | [12.3] | 4 | 3.83{3.67} | 3.15 | 19.5 | 4.12 | (4.75) | (4.75) | (5.5) | (3) | 1.5 | |||||||||
| mul_3 | [1.333] | ||||||||||||||||||||||||
| mul_4 | [1.25] | ||||||||||||||||||||||||
| mul_5 | [1.2] | ||||||||||||||||||||||||
| mul_6 | [1.167] | ||||||||||||||||||||||||
| addmul_2 | (4) | (4) | 2.375 | 2.375 | 5.75 | 16[13.6] | 4.375 | 4.33{3.75} | 3.41 | 19.9 | 4.25 | (4.75) | (4.75) | (5.5) | (3) | 10.25 | (3) | 1.625 | [20.7] | ||||||
| addmul_3 | (4) | (4) | (4) | (3) | {1.42} | ||||||||||||||||||||
| addmul_4 | (3) | (3) | (2) | " | " | " | (2.31) | {1.3125} | |||||||||||||||||
| addmul_6 | (1.167) | ||||||||||||||||||||||||
| mul_basecase | 3.9[3.75] | 4.6¹ | 5¹ | 5.3¹ | 8.9¹ | 2.5¹ | 2.5¹ | 15¹ | 4.5¹ | 4.3¹ | 3.45¹ | 20.5¹ | 4.5¹ | (2) | 8.38¹ | 8.3¹ | 13.4¹ | 4.02¹ | (8) | (2.31) | (1+ε) | 24.2¹ | |||
| sqr_basecase | 3.9[3.75] | 5.3² | 5.6² | 6.0² | 9.7² | #3.0² | #3.0² | #15.8² | #5.1² | #4.75² | 3.73² | #21.8² | #4.75² | 8.96² | 8.67² | #18.5² | 4.35² | (8) | (1+ε) | 24.8¹ | |||||
| † sqr_diagonal | 4 | 2.3 | |||||||||||||||||||||||
| sqr_diag_addlsh1 | 2 | ||||||||||||||||||||||||
| redc_1 | 2.5 | 2.5 | * | * | * | * | |||||||||||||||||||
| redc_2 | {2.375} | {2.375} | * | ||||||||||||||||||||||
| lshift | 1.2 | 1.75 | 2 | 1.75{1.46} | 5 | #2.35 | 2.35{1.5} | 3.5 | 3.33 | #1.27 | 1.375[1.25] | 1.87 | 4.5 | 3.25{2} | 2.25(1) | 2.33 | 2.25 | 4 | 2.15 | 2.5 | 17 | 1.75 | 1 | 3 | |
| rshift | 1.2 | 1.75 | 2 | 1.75{1.46} | 5 | #2.35 | 2.35{1.5} | 3.5 | 3.33 | #1.27 | 1.375 | 1.77 | 4.5 | 3.25{2} | 2.25(1) | 2.33 | 2.25 | 3.5 | 2.15 | 2.5 | 17 | 1.75 | 1 | 3 | |
| lshiftc | * | * | * | * | 5.5 | 2.75 | 2.75(1.75) | 4 | 4.15 | 1.5 | 1.75 | 2.25 | 5 | 3.5(2) | 2.25 | 2.33 | 2.25 | 4 | 2.15 | 2.67 | 17 | * | 1.25 | 3.5 | |
| copyd | 0.75-1 | #2 | #2 | 0.73{0.5} | 1.75 | 1 | 1 | 1.75 | 2.8 | 1 | 1{0.5} | 1{0.5} | 2 | 2{1} | 0.75 | #1 | 1.13 | 1.9{1} | 1.4 | 2.5 | 17 | 1 | 0.5 | 1.5 | |
| copyi | 0.75-1 | #1 | #1.5 | 0.73{0.5} | 1.75 | 1.25 | 1 | 1.75 | 2.8 | 1 | 1{0.5} | 1{0.5} | 2 | 2{1} | 0.75 | #1 | 1 | 2{1} | 1.4 | 2.5 | 17 | 1 | 0.5 | 0.75 | |
| com | 1 | 1.25 | 1.18 | 1.75 | 2.75 | 1.05 | 1.5 | 1.25 | 2.75 | 2 | (0.75) | 1.62 | 1.425 | 3.5 | 1.45 | 1.5 | (0.5) | ||||||||
| and_n | {1.5} | 3 | 1.5 | 1.5\2 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.75 | ||||||
| ior_n | {1.5} | 3 | 1.5 | 1.5\2 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.75 | ||||||
| xor_n | {1.5} | 3 | 1.5 | 1.5\2 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.75 | ||||||
| andn_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 2.5-2.75 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 3.25 | ||||||
| iorn_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 2.5-2.75 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 3.25 | ||||||
| xnor_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 2.5 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 3.25 | ||||||
| nand_n | {1.75} | 3.5 | 1.5\1.75 | 1.5\2 | 2.5 | 3.6 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (2) | 1 | 3.25 | ||||||
| nior_n | {1.75} | 3.5 | 1.5\1.75 | 1.5\2 | 2.5 | 3.6 | 2 | 2 | 1.75 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (2) | 1 | 3.25 | ||||||
| † divrem_1 int | 17[14] | 32 | 34 | 24[19] | 38[25-28] | 13 | 13 | 17-18 | 44 | 24 | 19 | 15[14] | 46 | 24 | [21] | 29 | 29 | 58(52) | 25 | [22] | 30[22] | ||||
| † divrem_1 frc | 15[13] | 30 | 32 | 17[15] | 23[22] | 12 | 12 | 16 | 42 | 19 | 18 | 12.4 | 36 | 22.6 | [7] | 19 | 19 | 41 | 14 | [18] | 30[22] | ||||
| † pre_divrem_1 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | * | * | * | Y | ||||||||
| div_qr_1u_pi2 | {9} | {9} | {34} | {13} | {11.5} | {9.5} | {14.5} | ||||||||||||||||||
| div_qr_1n_pi2 | {7.5} | {7.5} | {31} | {13} | {10.5} | {7.5} | {13.5} | {22} | {23.5} | {38} | [16] | ||||||||||||||
| † divrem_2 | 22 | 63 | 70 | 29 | 44 | 18 | 18 | 27 | 68 | 34 | 30.25 | 21.3 | 73 | 33 | 29 | 40 | 37 | 62(55) | 30.5 | 29 | 29 | ||||
| div_qr_2n_pi2 | {13.5} | {13.5} | {47} | {23} | {18} | {13.5} | ? | {21} | |||||||||||||||||
| † dive_1 | 11 | 19 | 21 | 11 | 16-20 | 10 | 10 | 15 | 33 | #13.25 | 14 | 8.5 | 36 | 18 | [6-8] | 16 | 16 | 46(39) | 12 | 15 | 8 | ||||
| bdiv_qr_1_pi2 | [8] | [8] | [24.7] | [13.4] | [12.7] | [7] | [15] | ||||||||||||||||||
| † mode1o | 11 | 19 | 21 | 11 | 15 | 10 | 10 | 15 | 33 | 13 | 14.25 | 8.2 | 35 | 18 | #8-10 | 16 | 16 | 35 | 12 | 15 | 8 | ||||
| diveby3 | 6 | ||||||||||||||||||||||||
| bdiv_dbm1c | 3.5 | 13.5{7} | 11 | 5 | 8 | 2.25 | 2.25 | 6.22 | 12.5 | 4 | 3.75 | 3.6 | 20 | 4 | 6.25 | 8.25 | 8.63 | 15 | 4.7 | 3 | 2 | 22{20} | |||
| mod_1_1p | 7 | 16 | 18 | 10 | 17 | 6 | 6 | 9 | 26 | 12.5{10.5} | 11{10.5} | 8.4[8] | 26 | 13 | 17 | 16 | 30 | 10.2 | (9) | ||||||
| mod_1s_2p | 4 | 4 | 8.61 | 19 | 8 | 6.5 | 4.5 | 7.65 | (4.5) | ||||||||||||||||
| mod_1s_3p | {3} | {3} | 8 | {16} | {5.41} | {4.5} | {3} | {5} | |||||||||||||||||
| mod_1s_4p | 4.75{4.25} | 4 | 4.5 | 3.4 | 8.75 | 3{2.75} | 3{2.75} | 7.67 | 15.75 | 5 | 4[3.75] | 3.25{2.5} | 23 | 4.75{4.17} | [6.5] | 9 | 9 | 13 | 3.5 | 3 | (2.25) | ||||
| mod_34lsub1 | #1 | 1.25 | 1.25 | #1.9 | 2.33 | 0.67 | 0.67 | 1.125 | 3.2 | 1.25 | 1.15 | 0.93 | 2.45 | 1.25 | 0.87 | 1.5 | 1.32 | 2.35 | 1 | #1.67 | 1 | 2{1+ε} | |||
| gcd_1 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||||||||||
| invert_limb | 41 | 48 | 48 | 64 | 135 | 69 | 55 | 44 | 130 | 78 | 32 | 86 | 86 | 170 | 66 | 71 | 56 | 86 | |||||||
| popcount | 5(4) | 3.9 | 4.25 | #4.6 | 5.5 | 6 | 1.125 | 6.1 | 8 | 3.67{3} | 1.25 | 1.5 | 10.75 | 6.5{5} | 1.125 | 2.25 | #1.5 | 1 | |||||||
| hamdist | 6(5) | {5.4} | {5.4} | 6.08 | 8 | 7 | 2{1.5} | 7.5 | 14.3{10} | 8(4) | 2{1.5} | 2 | 17.5(12) | 10.4(6) | (1.5) | (3) | #2.4 | 1 | |||||||
¹ This value is for sizes around MUL_TOOM22_THRESHOLD, since mpn_mul_basecase is in most cases not used above that.
² This value is for sizes around SQR_TOOM2_THRESHOLD, since mpn_sqr_basecase is never used above that.
† Obsolete function that will be replaced in the next major GMP release.
Please send comments about this page to gmp-discuss at gmplib.org
Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.