GMP «Arithmetic without limitations»
Symbol   Meaning
*        no estimate, but the code should be written
( )      estimate, not tested
{ }      known speed, tested
[ ]      more or less ready to check in
15       slow code that is disabled by means of gmp-mparam.h
#        eligible for replacement
m - n    m c/l (cycles/limb) to n c/l depending on operand properties (e.g., overlap)
m \ n    m c/l sometimes degenerating to n c/l
GMP assembly chart


This is a chart with performance numbers in cycles/limb for many mpn (i.e., low-level) functions of GMP. A straight number without any special annotations means that the mpn function of that line is implemented for the CPU of that column either in the official repository or in a local repository of a maintainer. For annotated numbers, please see the table above.
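
The annotation scheme from the legend can be decoded mechanically. The following is a minimal Python sketch, not part of GMP: the function name `parse_cell` and the returned field names are hypothetical, and it assumes a cell has already been isolated as a whitespace-free string. It does not interpret ranges such as `m - n` or `m \ n`, nor footnote marks; those are passed through as text.

```python
import re

# Meaning of each bracket pair, taken from the legend of the chart.
BRACKETS = {
    "(": "estimate, not tested",
    "{": "known speed, tested",
    "[": "more or less ready to check in",
}

# One bracketed annotation: (...), {...} or [...].
ANNOTATION_RE = re.compile(r"\([^)]*\)|\{[^}]*\}|\[[^\]]*\]")


def parse_cell(cell):
    """Split one chart cell into its plain value and its annotations.

    Hypothetical helper for illustration only.
    """
    replaceable = cell.startswith("#")       # '#' = eligible for replacement
    body = cell.lstrip("#")
    plain = re.match(r"[^(\[{]*", body).group(0)  # text before any bracket
    notes = [(BRACKETS[a[0]], a[1:-1]) for a in ANNOTATION_RE.findall(body)]
    return {
        "current": plain or None,            # None when only annotations exist
        "replaceable": replaceable,
        "notes": notes,
    }


if __name__ == "__main__":
    for cell in ["1.64{1.5}", "#2.25", "3.87[3.3]{2.5}", "(3.25)"]:
        print(cell, "->", parse_cell(cell))
```

A cell such as `1.64{1.5}` is thus read as "1.64 c/l in the current code, 1.5 c/l measured for code that is not yet in place".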

To compare these numbers fairly, compare 32-bit machines only with other 32-bit machines, and 64-bit machines only with other 64-bit machines. Per limb, a 64-bit machine performs twice the work of a 32-bit machine for many functions, and 4 times the work for multiply primitives.
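
The factors of 2 and 4 can be turned into a rough normalization. The sketch below is an illustration only: the helper name `work_normalized` and the sample figures are hypothetical and not taken from the chart. It scales a cycles/limb figure to the cost of one 64-bit limb's worth of work. Such a scaled figure is at best a coarse indication; the same-width comparison recommended above remains the fair one.

```python
def work_normalized(cycles_per_limb, limb_bits, multiply=False):
    """Scale a c/l figure to the work of one 64-bit limb.

    Linear functions (add, shift, copy, ...) do 64/limb_bits times less
    work per limb on a narrower machine; multiply primitives do
    (64/limb_bits)**2 times less, since a 64x64-bit product corresponds
    to four 32x32-bit products.
    """
    ratio = 64 // limb_bits
    factor = ratio * ratio if multiply else ratio
    return cycles_per_limb * factor


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the arithmetic.
    print(work_normalized(2.0, 32))                 # linear function, 32-bit
    print(work_normalized(3.0, 32, multiply=True))  # multiply primitive, 32-bit
    print(work_normalized(2.5, 64, multiply=True))  # 64-bit figures are unchanged
```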

Columns, left to right (vendor, model, 32- or 64-bit):
AMD K7 32; Intel Nor 32; Intel Pres 32; Intel Doth 32; Intel Atom 32;
AMD K8 64; AMD K10 64; AMD Bulld 64; AMD Bobc 64;
Intel Noc 64; Intel Core2 64; Intel NHM 64; Intel SBR 64; Intel Atom 64; VIA Nano 64;
PPC 74x7 32; PPC 970 64; IBM PWR5 64; IBM PWR6 64; IBM PWR7 64;
Sun US3 64; Sun T1 64; Alpha 21264 64; Itanium 2 64; S/390 z990 64
add_n 1.64{1.5} 4 4.25 2.14 3 1.5 1.5 2.5 4 2 #2.25 1.61 3 3 4 2 2.25 2.63 2.4 4.5 17 2.125 1.25 3[2.5]
sub_n 1.64{1.5} 4 4.25 2.14 3 1.5 1.5 2.5 4 2 #2.25 1.61 3 3 4 2 2.25 2.63 2.4 4.5 17 2.125 1.25 3[2.5]
addlsh1_n 2.5 4.25 5 6 2 2{1.69} 2.875 5.8 3.1 2.75 2 4.875 3 5 3 2.9 3.5 3 21 [3.25] 1.5 4.75{3.87}
sublsh1_n 2.87 6.667 2.18 2.18{2} 3.25 5.8 3 3.1{2.5} 2.47{2.17} 5 3 5 3 2.9 3.5 3 21 (3.25) 1.5 5
rsblsh1_n 6 2 2{1.69} 2.875 13.75 3.1 2.75 2 4.875 3 [5] 21 4.75{3.87}
addlsh2_n 6 2.1 2 3.3 5.8 3.1 2.75 2 5.75 3 [5] 3 2.9 3.5 3 21 1.5
sublsh2_n 7 5.8 3 3.1 2.47 {4} [5] 3 2.9 3.5 3 21 1.5
rsblsh2_n 6 2.1 2 3.3 13 3.1 2.75 2 5.75 3 [5] 21
addlsh_n 2.87 2.75 5.46 1532.8 2.75 7.75{6} 4.7{4} (1.75)
sublsh_n {2.5-3.25} {2.5-3.25} {2.75} {2.75} {3} {4.125} (1.75)
rsblsh_n 2.87 2.75 5.46 1532.8 2.75 7.75{6} 4.7{4} (1.75)
add_n_sub_n [2.5] [2.5] (3) (3) 2.25
rsh1add_n 4.5 5.25 2 2{1.67} 3.25 5.63 3.2{2.67} 3.87[3.3]{2.5} 2.05 5.25 3 (5) #4 3.5 4.5 3.45 (3.5) 1.5
rsh1sub_n 2 2{1.67} 3.25 5.63 3.2{2.67} 3.87[3.3]{2.5} 2.05 5.25 3 (5) #4 3.5 4.5 3.45 (3.5) 1.5
mul_1 3.25 4 4.5 4.16{3.75} 7.5 2.5 2.5 5.5 12.6 4 3.75 2.9{2.5} 19.75{17} 4.25 6 7.25 7.25 13.5(8) 2.9 18.25 68 2.25 2{1.5} 22[20]
mul_1c Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y N N [Y]
addmul_1 3.75 5{4} 5 5.21{4.75} 8 2.5 2.5 6.17 14.9 4.25 5{4} 4{3.25} 21.25{19} 5 9.5 8 8 12.25 3.77 17.3 74 3.5 2(1.75) 23[21]
submul_1 3.75 6 6.5 #5.5 8 2.5 2.5 6.17 14.9 4.25 5{4} 4{3.25} 21.25{19} 5 10.5 8.3 8.25 12.8 4.9 22.75 74 3.5 2.25(2) 24[21.75]
mul_2 (4) (4) 2.25 2.25 5.625 [12.3] 4 3.83{3.67} 3.15 19.5 4.12 (4.75) (4.75) (5.5) (3) 1.5
mul_3 [1.333]
mul_4 [1.25]
mul_5 [1.2]
mul_6 [1.167]
addmul_2 (4) (4) 2.375 2.375 5.75 16[13.6] 4.375 4.33{3.75} 3.41 19.9 4.25 (4.75) (4.75) (5.5) (3) 10.25 (3) 1.625 [20.7]
addmul_3 (4) (4) (4) (3) {1.42}
addmul_4 (3) (3) (2) " " " (2.31) {1.3125}
addmul_6 (1.167)
mul_basecase 3.9[3.75] 4.6¹ 5.3¹ 8.9¹ 2.5¹ 2.5¹ 15¹ 4.5¹ 4.3¹ 3.45¹ 20.5¹ 4.5¹ (2) 8.38¹ 8.3¹ 13.4¹ 4.02¹ (8) (2.31) (1+ε) 24.2¹
sqr_basecase 3.9[3.75] 5.3² 5.6² 6.0² 9.7² #3.0² #3.0² #15.8² #5.1² #4.75² 3.73² #21.8² #4.75² 8.96² 8.67² #18.5² 4.35² (8) (1+ε) 24.8¹
† sqr_diagonal 4 2.3
sqr_diag_addlsh1 2
redc_1 2.5 2.5 * * * *
redc_2 {2.375} {2.375} *
lshift 1.2 1.75 2 1.75{1.46} 5 #2.35 2.35{1.5} 3.5 3.33 #1.27 1.375[1.25] 1.87 4.5 3.25{2} 2.25(1) 2.33 2.25 4 2.15 2.5 17 1.75 1 3
rshift 1.2 1.75 2 1.75{1.46} 5 #2.35 2.35{1.5} 3.5 3.33 #1.27 1.375 1.77 4.5 3.25{2} 2.25(1) 2.33 2.25 3.5 2.15 2.5 17 1.75 1 3
lshiftc * * * * 5.5 2.75 2.75(1.75) 4 4.15 1.5 1.75 2.25 5 3.5(2) 2.25 2.33 2.25 4 2.15 2.67 17 * 1.25 3.5
copyd 0.75-1 #2 #2 0.73{0.5} 1.75 1 1 1.75 2.8 1 1{0.5} 1{0.5} 2 2{1} 0.75 #1 1.13 1.9{1} 1.4 2.5 17 1 0.5 1.5
copyi 0.75-1 #1 #1.5 0.73{0.5} 1.75 1.25 1 1.75 2.8 1 1{0.5} 1{0.5} 2 2{1} 0.75 #1 1 2{1} 1.4 2.5 17 1 0.5 0.75
com 1 1.25 1.18 1.75 2.75 1.05 1.5 1.25 2.75 2 (0.75) 1.62 1.425 3.5 1.45 1.5 (0.5)
and_n {1.5} 3 1.5 1.5\2 2.67 2.75 2 2 1.5 3.75 3 1.14 2 2 2.5 1.75 (1.75) 1 2.75
ior_n {1.5} 3 1.5 1.5\2 2.67 2.75 2 2 1.5 3.75 3 1.14 2 2 2.5 1.75 (1.75) 1 2.75
xor_n {1.5} 3 1.5 1.5\2 2.67 2.75 2 2 1.5 3.75 3 1.14 2 2 2.5 1.75 (1.75) 1 2.75
andn_n {1.75} 3.5 1.5\2.5 1.5\2 2.5-2.75 3.35 2 2 1.75 3.75 3 1.14 2 2 2.5 1.75 (1.75) 1 3.25
iorn_n {1.75} 3.5 1.5\2.5 1.5\2 2.5-2.75 3.35 2 2 1.75 3.75 3 1.39 2 2 2.5 1.75 (1.75) 1 3.25
xnor_n {1.75} 3.5 1.5\2.5 1.5\2 2.5 3.35 2 2 1.75 3.75 3 1.39 2 2 2.5 1.75 (1.75) 1 3.25
nand_n {1.75} 3.5 1.5\1.75 1.5\2 2.5 3.6 2 2 1.75 3.75 3 1.39 2 2 2.5 1.75 (2) 1 3.25
nior_n {1.75} 3.5 1.5\1.75 1.5\2 2.5 3.6 2 2 1.75 3.75 3 1.14 2 2 2.5 1.75 (2) 1 3.25
† divrem_1 int 17[14] 32 34 24[19] 38[25-28] 13 13 17-18 44 24 19 15[14] 46 24 [21] 29 29 58(52) 25 [22] 30[22]
† divrem_1 frc 15[13] 30 32 17[15] 23[22] 12 12 16 42 19 18 12.4 36 22.6 [7] 19 19 41 14 [18] 30[22]
† pre_divrem_1 Y Y Y Y Y Y Y Y Y Y Y Y Y * * * Y
div_qr_1u_pi2 {9} {9} {34} {13} {11.5} {9.5} {14.5}
div_qr_1n_pi2 {7.5} {7.5} {31} {13} {10.5} {7.5} {13.5} {22} {23.5} {38} [16]
† divrem_2 22 63 70 29 44 18 18 27 68 34 30.25 21.3 73 33 29 40 37 62(55) 30.5 29 29
div_qr_2n_pi2 {13.5} {13.5} {47} {23} {18} {13.5} ? {21}
† dive_1 11 19 21 11 16-20 10 10 15 33 #13.25 14 8.5 36 18 [6-8] 16 16 46(39) 12 15 8
bdiv_qr_1_pi2 [8] [8] [24.7] [13.4] [12.7] [7] [15]
† mode1o 11 19 21 11 15 10 10 15 33 13 14.25 8.2 35 18 #8-10 16 16 35 12 15 8
diveby3 6
bdiv_dbm1c 3.5 13.5{7} 11 5 8 2.25 2.25 6.22 12.5 4 3.75 3.6 20 4 6.25 8.25 8.63 15 4.7 3 2 22{20}
mod_1_1p 7 16 18 10 17 6 6 9 26 12.5{10.5} 11{10.5} 8.4[8] 26 13 17 16 30 10.2 (9)
mod_1s_2p 4 4 8.61 19 8 6.5 4.5287.65 (4.5)
mod_1s_3p {3} {3} 8 {16} {5.41} {4.5} {3} {5}
mod_1s_4p 4.75{4.25} 4 4.5 3.4 8.75 3{2.75} 3{2.75} 7.67 15.75 5 4[3.75] 3.25{2.5} 23 4.75{4.17}[6.5] 9 9 13 3.5 3 (2.25)
mod_34lsub1 #1 1.25 1.25 #1.9 2.33 0.67 0.67 1.125 3.2 1.25 1.15 0.93 2.45 1.25 0.87 1.5 1.32 2.35 1 #1.67 1 2{1+ε}
gcd_1 Y Y Y Y Y Y Y Y Y Y Y
invert_limb 41 48 48 64 135 69 55 44 130 78 32 86 86 170 66 71 56 86
popcount 5(4) 3.9 4.25 #4.6 5.5 6 1.125 6.1 8 3.67{3} 1.25 1.5 10.75 6.5{5} 1.125 2.25 #1.5 1
hamdist 6(5) {5.4} {5.4} 6.08 8 7 2{1.5} 7.5 14.3{10} 8(4) 2{1.5} 2 17.5(12) 10.4(6) (1.5) (3) #2.4 1

¹ This value is for sizes around MUL_TOOM22_THRESHOLD, since mpn_mul_basecase is in most cases not used above that.
² This value is for sizes around SQR_TOOM2_THRESHOLD, since mpn_sqr_basecase is never used above that.
† Obsolete function that will be replaced in the next major GMP release.
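
Footnotes ¹ and ² refer to size thresholds at which GMP switches from the basecase routines to Toom-style algorithms. The Python sketch below illustrates such a threshold dispatch under stated assumptions. The threshold value 8 is hypothetical (real values are tuned per CPU in gmp-mparam.h). A plain Karatsuba step stands in for GMP's Toom-22 code, and Python integers stand in for limb arrays outside the basecase. The names `to_limbs`, `from_limbs` and `mul_n` are made up for this illustration; only the basecase loop mirrors the schoolbook method that mpn_mul_basecase implements.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1
MUL_TOOM22_THRESHOLD = 8  # hypothetical value, must be at least 4 here


def to_limbs(x, n):
    """Little-endian list of n limbs representing x (x < 2**(64*n))."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_basecase(a, b):
    """Schoolbook product of two limb lists: one limb product per (i, j)."""
    r = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):
        carry = 0
        for i, ai in enumerate(a):
            t = r[i + j] + ai * bj + carry
            r[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        r[j + len(a)] = carry  # this position has not been written yet
    return r


def mul_n(x, y, n):
    """Multiply two nonnegative integers of at most n limbs each."""
    if n < MUL_TOOM22_THRESHOLD:
        return from_limbs(mul_basecase(to_limbs(x, n), to_limbs(y, n)))
    h = n // 2
    shift = h * LIMB_BITS
    low = (1 << shift) - 1
    x0, x1 = x & low, x >> shift   # x = x1 * 2**shift + x0
    y0, y1 = y & low, y >> shift
    z0 = mul_n(x0, y0, h)
    z2 = mul_n(x1, y1, n - h)
    # The sums may carry into one extra limb, hence n - h + 1.
    z1 = mul_n(x0 + x1, y0 + y1, n - h + 1) - z0 - z2
    return z0 + (z1 << shift) + (z2 << (2 * shift))


if __name__ == "__main__":
    x = (1 << (LIMB_BITS * 20)) - 12345
    y = (1 << (LIMB_BITS * 20)) // 7
    print(mul_n(x, y, 20) == x * y)
```

Below the threshold every call ends in `mul_basecase`, which is why the basecase figures in the chart are quoted for sizes around the threshold rather than for large operands.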



Please send comments about this page to gmp-discuss at gmplib.org
Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.