mpn_addmul_1 and mpn_submul_1 are the most important routines for overall
GMP performance. All multiplications and divisions come down to repeated
calls to these. mpn_add_n, mpn_sub_n, mpn_lshift and mpn_rshift are next
most important.
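
To make the role of mpn_addmul_1 concrete, here is a minimal portable C
sketch of the operation it performs: multiply a limb vector by a single limb,
add the product into the destination, and return the carry-out limb. The names
limb_t and ref_addmul_1 are hypothetical and chosen for illustration, and the
32-bit limb is an assumption that keeps the double-limb product inside a
standard uint64_t. It is not GMP's implementation, which is normally tuned
assembly for each CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type, for illustration only. */
typedef uint32_t limb_t;

/* Sketch of the addmul_1 operation: {rp,n} += {up,n} * v.
   Returns the limb carried out of the most significant position. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* With B = 2^32: (B-1)^2 + (B-1) + (B-1) = B^2 - 1, so the sum
           always fits in 64 bits. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;           /* low limb goes to the destination */
        carry = (limb_t)(t >> 32);   /* high limb feeds the next step */
    }
    return carry;
}
```

Each iteration depends on the carry from the previous one, which is why the
speed of this inner loop tends to dominate multiplication and division.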
On some CPUs assembly versions of the internal functions mpn_mul_basecase
and mpn_sqr_basecase give significant speedups, mainly through avoiding
function call overheads. They can also potentially make better use of a wide
superscalar processor, as can bigger primitives like mpn_addmul_2 or
mpn_addmul_4.
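
The function call overhead mentioned above can be seen in a schoolbook
multiplication written on top of the single-limb primitives. The following is
a sketch only: limb_t, ref_mul_1, ref_addmul_1 and ref_mul_basecase are
hypothetical names, and the 32-bit limb is an assumption made so the code
stays in standard C. It is not GMP's mpn_mul_basecase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type, for illustration only. */
typedef uint32_t limb_t;

/* {rp,n} = {up,n} * v; returns the carry-out limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* Schoolbook product: {rp,un+vn} = {up,un} * {vp,vn}.
   rp must not overlap either source. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);
    /* The first row initialises the low un+1 limbs of the product. */
    rp[un] = ref_mul_1(rp, up, un, vp[0]);
    /* Every further row is one call, shifted up by one limb. */
    for (size_t j = 1; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

For small operands the per-row call and loop setup are a noticeable share of
the work. A basecase routine written as a single unit can keep state in
registers across rows, and a primitive that handles two or more multiplier
limbs per pass halves or quarters the number of passes over the destination.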
The restrictions on overlaps between sources and destinations
(see Low-level Functions) are designed to facilitate a variety of
implementations. For example, knowing mpn_add_n won’t have partly
overlapping sources and destination means reading can be done far ahead of
writing on superscalar processors, and loops can be vectorized on a vector
processor, depending on the carry handling.
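
As an illustration of the carry handling referred to above, here is a minimal
portable sketch of an add_n style loop. The names limb_t and ref_add_n are
hypothetical, and the 32-bit limb is an assumption made for the example. It
is not GMP's implementation. The sketch assumes the destination is either
identical to a source or entirely separate from both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type, for illustration only. */
typedef uint32_t limb_t;

/* {rp,n} = {up,n} + {vp,n}; returns the carry out (0 or 1).
   Assumes rp is exactly up, exactly vp, or disjoint from both. */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i], v = vp[i];   /* both loads precede the store */
        limb_t s = u + v;              /* wraps modulo 2^32 */
        limb_t c1 = (limb_t)(s < u);   /* carry from u + v */
        limb_t r = s + carry;
        limb_t c2 = (limb_t)(r < s);   /* carry from adding carry-in */
        rp[i] = r;
        carry = c1 | c2;               /* c1 and c2 are never both 1 */
    }
    return carry;
}
```

Because the store to rp[i] can never change a source limb that is still to be
read, an implementation is free to load many limbs ahead of its stores. The
only dependency left between iterations is the one-bit carry, which is what
decides how well the loop pipelines or vectorizes.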