GMP assembly chart
This chart gives performance numbers in cycles/limb for many mpn (i.e., low-level) functions of GMP. A plain number with no annotation means that the mpn function in that row is implemented for the CPU in that column, either in the official repository or in a maintainer's local repository. For annotated numbers, please see the table above.
For a fair comparison, compare 32-bit machines only with 32-bit machines,
and 64-bit machines only with 64-bit machines. Per limb, a 64-bit machine
does twice the work of a 32-bit machine for many functions, and 4 times the
work for multiply primitives.
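As a rough illustration of the scaling described above, the sketch below rescales a cycles/limb figure to 32-bit-limb units. The function name and its `kind` parameter are hypothetical, and the factors 2 and 4 are taken directly from the preceding paragraph. Since the advice is to compare only like with like, treat the result as a back-of-the-envelope estimate rather than a substitute for a real measurement.

```python
def cycles_per_32bit_limb(cycles_per_limb, limb_bits, kind="linear"):
    """Rescale a cycles/limb figure to 32-bit-limb units (hypothetical helper).

    kind="linear":   functions where a 64-bit limb is twice the work of a
                     32-bit limb (for example add_n, lshift, copyi).
    kind="multiply": multiply primitives, where a 64-bit limb is 4 times
                     the work (for example mul_1, addmul_1).
    """
    if limb_bits not in (32, 64):
        raise ValueError("limb_bits must be 32 or 64")
    if kind not in ("linear", "multiply"):
        raise ValueError("kind must be 'linear' or 'multiply'")
    ratio = limb_bits // 32                      # 1 for 32-bit, 2 for 64-bit
    work = ratio if kind == "linear" else ratio * ratio
    return cycles_per_limb / work


# A 64-bit add_n at 1.5 cycles/limb costs 0.75 cycles per 32 bits of input.
print(cycles_per_32bit_limb(1.5, 64, "linear"))
# A 64-bit mul_1 at 2.5 cycles/limb does the work of four 32-bit limb products.
print(cycles_per_32bit_limb(2.5, 64, "multiply"))
# Figures for 32-bit machines are returned unchanged.
print(cycles_per_32bit_limb(3.25, 32, "multiply"))
```

The input values here are only examples of the kind of figure found in the chart; they are not tied to a particular CPU column.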
| | AMD K7 32 | Intel Nor 32 | Intel Pres 32 | Intel Doth 32 | Intel Atom 32 | AMD K8 64 | AMD K10 64 | AMD Bulld 64 | AMD Bobc 64 | Intel Noc 64 | Intel Core2 64 | Intel NHM 64 | Intel SBR 64 | Intel Atom 64 | VIA Nano 64 | PPC 74x7 32 | PPC 970 64 | IBM PWR5 64 | IBM PWR6 64 | IBM PWR7 64 | Sun US3 64 | Sun T1 64 | Alpha 21264 64 | Itanium 2 64 | ARM cor-a9 32 | ARM cor-a15 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| add_n | 1.64{1.5} | 4 | 4.25 | 2.14 | 3 | 1.5 | 1.5 | 1.8[1.7] | 2.5 | 4 | 2 | #2.25 | 1.61 | 3 | 3 | 4 | 2 | 2.25 | 2.63 | 2.4 | 4.5 | 17 | 2.125 | 1.25 | 2.5 | |
| sub_n | 1.64{1.5} | 4 | 4.25 | 2.14 | 3 | 1.5 | 1.5 | 1.8[1.7] | 2.5 | 4 | 2 | #2.25 | 1.61 | 3 | 3 | 4 | 2 | 2.25 | 2.63 | 2.4 | 4.5 | 17 | 2.125 | 1.25 | 2.5 | |
| addlsh1_n | 2.5 | 4.25 | 5 | 6 | 2 | 2{1.69} | 2.5{2} | 2.875 | 5.8 | 3.1 | 2.75 | 2 | 4.875 | 3 | 5 | 3 | 2.9 | 3.5 | 3 | 21 | [3.25] | 1.5 | 3.17 | |||
| sublsh1_n | 2.87 | 6.667 | 2.18 | 2.18{2} | 2.625 | 3.25 | 5.8 | 3 | 3.1{2.5} | 2.47{2.17} | 5 | 3 | 5 | 3 | 2.9 | 3.5 | 3 | 21 | (3.25) | 1.5 | 3.7 | |||||
| rsblsh1_n | 6 | 2 | 2{1.69} | 2.5{2} | 2.875 | 3.1 | 2.75 | 2 | 4.875 | 3 | [5] | 21 | ||||||||||||||
| addlsh2_n | 6 | 2.1 | 2 | 2.7{2} | 3.3 | 5.8 | 3.1 | 2.75 | 2 | 5.75 | 3 | [5] | 3 | 2.9 | 3.5 | 3 | 21 | 1.5 | ||||||||
| sublsh2_n | 7 | 5.8 | 3 | 3.1 | 2.47 | {4} | [5] | 3 | 2.9 | 3.5 | 3 | 21 | 1.5 | |||||||||||||
| rsblsh2_n | 6 | 2.1 | 2 | 2.7{2} | 3.3 | 3.1 | 2.75 | 2 | 5.75 | 3 | [5] | 21 | ||||||||||||||
| addlsh_n | 2.87 | 2.75 | 4.2{3.5} | 5.46{4.3} | 3 | 2.8 | 2.75 | 7.75{6} | 4.7{4} | (1.75) | ||||||||||||||||
| sublsh_n | {2.5-3.25} | {2.5-3.25} | {2.75} | {2.75} | {3} | {4.125} | (1.75) | |||||||||||||||||||
| rsblsh_n | 2.87 | 2.75 | 4.2{3.5} | 5.46{4.3} | 3 | 2.8 | 2.75 | 7.75{6} | 4.7{4} | (1.75) | ||||||||||||||||
| add_n_sub_n | [2.5] | [2.5] | (3) | (3) | 2.25 | |||||||||||||||||||||
| rsh1add_n | 4.5 | 5.25 | 2 | 2{1.67} | 2.75{2.5} | 3.25{2.7} | 5.63 | 3.1{2.67} | 3.3{2.5} | 2.05 | 5.25 | 3 | (5) | #4 | 3.5 | 4.5 | 3.45 | (3.5) | 1.5 | 3.64-3.7 | ||||||
| rsh1sub_n | 2 | 2{1.67} | 2.75{2.5} | 3.25{2.7} | 5.63 | 3.1{2.67} | 3.3{2.5} | 2.05 | 5.25 | 3 | (5) | #4 | 3.5 | 4.5 | 3.45 | (3.5) | 1.5 | 3.64-3.7 | ||||||||
| addcnd_n | 2.25 | 2 | 2.5 | 3.55 | 13 | 2.9 | 2.9 | 2.4 | 6.5 | 3 | 2.25 | ? | 3 | ? | 3 | |||||||||||
| subcnd_n | 2.25 | 2 | 2.5 | 3.55 | 13 | 2.9 | 2.9 | 2.4 | 6.5 | 3 | 2.25 | ? | 3 | ? | 3 | |||||||||||
| mul_1 | 3.25 | 4 | 4.5 | 4.16{3.75} | 7.5 | 2.5 | 2.5 | 4 | 5 | 12.6 | 4 | 3.75 | 2.5 | 19.75{17} | 4.25 | 6 | 7.25 | 7.25 | 13.5(8) | 2.9 | 18.25 | 68 | 2.25 | 2{1.5} | 3.25 | |
| mul_1c | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | N | N | [Y] | |||||
| addmul_1 | 3.75 | 5{4} | 5 | 5.21{4.75} | 8 | 2.5 | 2.5 | 4.5 | 5 | 14.9 | 4.25 | 5{4} | 3.25 | 21.25{19} | 5 | 9.5 | 8 | 8 | 12.25 | 3.77 | 17.3 | 74 | 3.5 | 2(1.75) | 3.25 | |
| submul_1 | 3.75 | 6 | 6.5 | #5.5 | 8 | 2.5 | 2.5 | 4.5 | 5 | 14.9 | 4.25 | 5{4} | 3.25 | 21.25{19} | 5 | 10.5 | 8.3 | 8.25 | 12.8 | 4.9{4.3} | 22.75 | 74 | 3.5 | 2.25(2) | #5.25 | |
| mul_2 | (4) | (4) | 2.25 | 2.25 | 5{4} | 5.62{5} | 13.5[12.3] | 4 | 3.83{3.67} | 3.15 | 19.5 | 4.12 | (4.75) | (4.75) | (5.5) | (3) | 1.5 | 2.25 | ||||||||
| mul_3 | [1.333] | |||||||||||||||||||||||||
| mul_4 | [1.25] | |||||||||||||||||||||||||
| mul_5 | [1.2] | |||||||||||||||||||||||||
| mul_6 | [1.167] | |||||||||||||||||||||||||
| addmul_2 | (4) | (4) | 2.375 | 2.375 | 5.1{4} | 5.75{5} | 16[13.6] | 4.375{4} | 4.33{3.75} | 3.23 | 19.9 | 4.25 | (4.75) | (4.75) | (5.5) | (3) | 10.25 | (3) | 1.625 | 2.38 | ||||||
| addmul_3 | (4) | (4) | (4) | (3) | {1.42} | |||||||||||||||||||||
| addmul_4 | (3) | (3) | (2) | " | " | " | (2.31) | {1.3125} | ||||||||||||||||||
| addmul_6 | (1.167) | |||||||||||||||||||||||||
| mul_basecase | 3.9[3.75] | 4.6¹ | 5¹ | 5.3¹ | 8.9¹ | 2.5¹ | 2.5¹ | #5.1¹ | 5.2¹ | 15¹ | 4.5¹ | 4.3¹ | #3.45¹ | 20.5¹ | 4.5¹ | (2) | 8.38¹ | 8.3¹ | 13.4¹ | 4.02¹ | (8) | (2.31) | (1+ε) | * | ||
| mullo_basecase | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | * | |||||||||||||||
| mulmid_basecase | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||||||||||||
| mulhi_basecase | ||||||||||||||||||||||||||
| sqr_basecase | 3.9[3.75] | 5.3² | 5.6² | 6.0² | 9.7² | #3.0² | #3.0² | #5.3² | 5.5² | 15.8² | #5.1² | #4.75² | #3.73² | #21.8² | #4.75² | 8.96² | 8.67² | #18.5² | 4.35² | (8) | (1+ε) | 2.38 | ||||
| † sqr_diagonal | 4 | 2.3 | ||||||||||||||||||||||||
| sqr_diag_addlsh1 | 2 | |||||||||||||||||||||||||
| redc_1 | 2.5 | 2.5 | * | * | * | * | * | |||||||||||||||||||
| redc_2 | {2.375} | {2.375} | * | * | ||||||||||||||||||||||
| lshift | 1.2 | 1.75 | 2 | 1.75{1.46} | 5 | #2.35 | 1.8{1.3} | 1.9{1.3} | 3.5{3} | 3.33{2.7} | 1.27 | 1.375[1.25] | 1.3 | 4.5(2.5) | 3.25[2] | 2.25(1) | 2.33 | 2.25 | 4 | 2.15 | 2.5 | 17 | 1.75 | 1 | #3.5 | |
| rshift | 1.2 | 1.75 | 2 | 1.75{1.46} | 5 | #2.35 | 1.8{1.3} | 1.9{1.3} | 3.5{3} | 3.33{2.7} | 1.27 | 1.375[1.25] | 1.3 | 4.5(2.5) | 3.25{2} | 2.25(1) | 2.33 | 2.25 | 3.5 | 2.15 | 2.5 | 17 | 1.75 | 1 | #3.5 | |
| lshiftc | * | * | * | * | 5.5 | 2.75 | 2{1.5} | 1.9{1.5} | 4{3.7} | 4.15{3.5} | 1.5 | 1.75 | 1.45 | 5(3) | 3.5{2.5} | 2.25 | 2.33 | 2.25 | 4 | 2.15 | 2.67 | 17 | * | 1.25 | ||
| copyd | 0.75-1 | #2 | #2 | 0.73{0.5} | 1.75{0.5} | 1 | 1[0.85] | 1.36 | 1.5 | 2.8[2.3] | 0.52-0.8 | 0.52-0.64 | 0.52 | 1.16-1.66 | 1.1 | 0.75 | #1 | 1.13 | 1.9{1} | 1.4 | 2.5 | 17 | 1 | 0.5 | 1.5 | |
| copyi | 0.75-1 | #1 | #1.5 | 0.73{0.5} | 1.75{0.5} | 1 | 1[0.85] | 1.36 | 1.5 | 2.8[2.3] | 0.52-0.8 | 0.52-0.64 | 0.54 | 1.16-1.66 | 1.1 | 0.75 | #1 | 1 | 2{1} | 1.4 | 2.5 | 17 | 1 | 0.5 | 1.5 | |
| com | 1 | 1.25 | 1.18[0.85] | 1.6[0.9] | 1.75 | 2.8[2.3] | 1.05 | 1.5[0.5] | 1.25[0.5] | 2.75 | 2[1.1] | (0.75) | 1.62 | 1.425 | 3.5 | 1.45 | 1.5 | (0.5) | 2 | |||||||
| and_n | {1.5} | 3 | 1.5 | 1.5\2 | 1.65 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.5-2.75 | ||||||
| ior_n | {1.5} | 3 | 1.5 | 1.5\2 | 1.65 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.5-2.75 | ||||||
| xor_n | {1.5} | 3 | 1.5 | 1.5\2 | 1.65 | 2.67 | 2.75 | 2 | 2 | 1.5 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.5-2.75 | ||||||
| andn_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 1.9 | 2.5-2.75 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.5-2.75 | ||||||
| iorn_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 1.9 | 2.5-2.75 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.75-3 | ||||||
| xnor_n | {1.75} | 3.5 | 1.5\2.5 | 1.5\2 | 1.9 | 2.5 | 3.35 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (1.75) | 1 | 2.75 | ||||||
| nand_n | {1.75} | 3.5 | 1.5\1.75 | 1.5\2 | 2 | 2.5 | 3.6 | 2 | 2 | 1.75 | 3.75 | 3 | 1.39 | 2 | 2 | 2.5 | 1.75 | (2) | 1 | 2.75 | ||||||
| nior_n | {1.75} | 3.5 | 1.5\1.75 | 1.5\2 | 2 | 2.5 | 3.6 | 2 | 2 | 1.75 | 3.75 | 3 | 1.14 | 2 | 2 | 2.5 | 1.75 | (2) | 1 | 2.75 | ||||||
| † divrem_1 int | 17[14] | 32 | 34 | 24[19] | 38[25-28] | 13 | 13 | 20-20.7 | 17-18 | 44 | 24 | 19 | 15[14] | 46 | 24 | [21] | 29 | 29 | 58(52) | 25 | [22] | 30[22] | 13-14 | |||
| † divrem_1 frc | 15[13] | 30 | 32 | 17[15] | 23[22] | 12 | 12 | 18 | 16 | 42 | 19 | 18 | 12.4 | 36 | 22.6 | [7] | 19 | 19 | 41 | 14 | [18] | 30[22] | 13 | |||
| † pre_divrem_1 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | * | * | * | Y | Y | |||||||
| div_qr_1u_pi2 | {9} | {9} | {13} | {14} | {34} | {13.5} | {11.5} | {9.5} | {14.5} | |||||||||||||||||
| div_qr_1n_pi2 | {7.5} | {7.5} | {11} | {13} | {31} | {12.5} | {10.5} | {7.5} | {13.5} | {22} | {23.5} | {38} | [16] | |||||||||||||
| † divrem_2 | 22 | 63 | 70 | 29 | 44 | 18 | 18 | 26.8 | 27 | 68 | 34 | 30.25 | 21.3 | 73 | 33 | 29 | 40 | 37 | 62(55) | 30.5 | 29 | 29 | ||||
| div_qr_2n_pi2 | {13.5} | {13.5} | {20} | {22} | {47} | {23} | {18} | {13.5} | ? | {21} | ||||||||||||||||
| † dive_1 | 11 | 19 | 21 | 11 | 16-20 | 10 | 10 | 14 | 15 | 33 | #13.25 | 14 | 8.5 | 36 | 18 | [6-8] | 16 | 16 | 46(39) | 12 | 15 | 8 | ||||
| bdiv_qr_1_pi2 | [8] | [8] | {12} | {12.4} | [24.7] | [13.4] | [12.7] | [7] | [15] | |||||||||||||||||
| † mode1o | 11 | 19 | 21 | 11 | 15 | 10 | 10 | 14 | 15 | 33 | 13 | 14.25 | 8.2 | 35 | 18 | #8-10 | 16 | 16 | 35 | 12 | 15 | 8 | 9 | |||
| diveby3 | 6 | |||||||||||||||||||||||||
| bdiv_dbm1c | 3.5 | 13.5{7} | 11 | 5 | 8 | 2.25 | 2.25 | 4.6 | 6.22 | 12.5 | 4 | 3.75 | 3.6 | 20 | 4 | 6.25 | 8.25 | 8.63 | 15 | 4.7 | 3 | 2 | 4.25 | |||
| mod_1_1p | 7 | 16 | 18 | 10 | 17 | 6 | 6 | 10{8.25} | 9 | 26 | 12.5{10.5} | 11{10.5} | 8.4[8] | 26 | 13 | 17 | 16 | 30 | 10.2 | (9) | [7] | |||||
| mod_1s_2p | 4 | 4 | 7{6.3} | 8.61 | 19 | 8 | 6.5{6} | 4.5{4} | 7.65 | (4.5) | 4.25 | |||||||||||||||
| mod_1s_3p | {3} | {3} | {5.5} | {8} | {16} | {5.41} | {4.5} | {3} | {5} | |||||||||||||||||
| mod_1s_4p | 4.75{4.25} | 4 | 4.5 | 3.4 | 8.75 | 3{2.75} | 3{2.75} | 5.7{5} | 7.67 | 15.75 | 5 | 4[3.75] | 3.25{2.5} | 23 | 4.75{4.17} | [6.5] | 9 | 9 | 13 | 3.5 | 3 | (2.25) | ||||
| mod_34lsub1 | #1 | 1.25 | 1.25 | #1.9 | 2.33 | 0.67 | 0.67 | 1{0.5} | 1.125 | 3.2 | 1.25 | 1.15 | 0.93 | 2.45 | 1.25 | 0.87 | 1.5 | 1.32 | 2.35 | 1 | #1.67 | 1 | 1.33 | |||
| gcd_1 | 5.31/b | [10/b] | [10/b] | 5.09/b | [8.9/b] | 5.21/b | 4.30/b | 5.00/b | 6.71/b | 13.5/b | 3.83/b | 5.17/b | 4.69/b | 8.77/b | 5.44/b | 5.00/b | 12.8/b | 3.4/b | 6.35/b | 5.3/b | ||||||
| invert_limb | 41 | 48 | 48 | 63 | 64 | 135 | 69 | 55 | 44 | 130 | 78 | 32 | 86 | 86 | 170 | 66 | 71 | 56 | 43 | |||||||
| popcount | 5(4) | 3.9 | 4.25 | #4.6 | 5.5 | 6 | 1.125 | 4.4{2.5} | 6.1 | 8 | 3.67{3} | 1.25 | 1.5{1} | 10.75 | 6.5{5} | 1.125 | 2.25 | {2.16} | 2 | #1.5 | 1 | |||||
| hamdist | 6(5) | {5.4} | {5.4} | 6.08 | 8 | 7 | 2{1.5} | 4.5(3) | 7.5 | 14.3{10} | 8(4) | 2{1.5} | 2{1.5} | 17.5(12) | 10.4(6) | (1.5) | (3) | 2.87 | #2.4 | 1 | ||||||
¹ This value is for sizes around MUL_TOOM22_THRESHOLD, since mpn_mul_basecase is in most cases not used above that.
² This value is for sizes around SQR_TOOM2_THRESHOLD, since mpn_sqr_basecase is never used above that.
† Obsolete function that will be replaced in the next major GMP release.
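Cells in the chart combine a main figure with optional decorations such as `{1.5}`, `[1.7]`, `(8)`, a leading `#`, or a trailing footnote mark. When processing the chart by script, it can help to split a cell into these parts first. The sketch below does only that: it does not interpret the annotations, whose meanings are defined by the chart's legend. The function name `parse_cell` and the layout of the returned dictionary are hypothetical, and cells that do not fit this simple pattern (for example `Y`, `*`, or `5.31/b`) are reported as `None`.

```python
import re

# Number or range, e.g. "1.64" or "0.75-1".
_NUM = r"[0-9]+(?:\.[0-9]+)?(?:-[0-9]+(?:\.[0-9]+)?)?"

_CELL_RE = re.compile(
    r"^(?P<hash>#)?"
    rf"(?P<main>{_NUM})?"
    rf"(?:(?P<open>[{{\[(])(?P<alt>{_NUM})[}}\])])?"
    r"(?P<note>[¹²])?$"
)

_CLOSERS = {"{": "}", "[": "]", "(": ")"}


def parse_cell(cell):
    """Split one chart cell into its parts, or return None if unrecognised."""
    text = cell.strip()
    m = _CELL_RE.match(text)
    if m is None or (m["main"] is None and m["alt"] is None):
        return None
    # Reject mismatched delimiters such as "1.8[1.7)".
    if m["open"] and text[m.end("alt")] != _CLOSERS[m["open"]]:
        return None
    return {
        "hash": m["hash"] is not None,   # leading '#' present
        "main": m["main"],               # plain figure, kept as a string
        "alt_kind": m["open"],           # '{', '[' or '(' if an alternative is given
        "alt": m["alt"],                 # annotated figure, kept as a string
        "footnote": m["note"],           # '¹' or '²' if present
    }


for sample in ["1.64{1.5}", "#5.3²", "{2.5-3.25}", "13.5(8)", "Y"]:
    print(sample, "->", parse_cell(sample))
```

Values are kept as strings because some cells hold ranges such as `0.75-1`; converting them to numbers is left to the caller.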
Please send comments about this page to gmp-discuss at gmplib.org

Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.