mpn_addmul_1 and mpn_submul_1 are the most important routines
for overall GMP performance. All multiplications and divisions come down to
repeated calls to these.
mpn_add_n, mpn_sub_n, mpn_lshift and mpn_rshift are next most important.
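
As a rough illustration of why a multiply-and-accumulate row operation dominates the running time, the sketch below builds a schoolbook multiplication out of repeated calls to one such primitive (the additive counterpart of mpn_submul_1). This is not GMP's code: limb_t, ref_addmul_1 and ref_mul_basecase are hypothetical names, and a 32-bit limb is assumed so that a 64-bit integer can hold each intermediate product in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so t cannot overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   Assumes rp does not overlap either input. */
void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                      const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    /* One row per limb of vp: all the work happens in ref_addmul_1. */
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

Every limb of the second operand costs one full pass of the inner loop, so any cycles saved per limb in that loop are multiplied across the whole product.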
On some CPUs, assembly versions of the internal functions
mpn_mul_basecase and mpn_sqr_basecase give significant speedups,
mainly through avoiding function call overheads. They can also potentially
make better use of a wide superscalar processor, as can bigger primitives like
mpn_addmul_2 or mpn_addmul_4.
The restrictions on overlaps between sources and destinations
(see Low-level Functions) are designed to facilitate a variety of
implementations. For example, knowing
mpn_add_n won’t have partly
overlapping sources and destination means reading can be done far ahead of
writing on superscalar processors, and loops can be vectorized on a vector
processor, depending on the carry handling.
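
The effect of the overlap rule can be sketched in portable C. The code below is only an illustration under stated assumptions, not GMP's implementation: limb_t, same_or_separate and ref_add_n are hypothetical names, a 32-bit limb is assumed, and the rule is taken to be that the destination is either identical to a source or entirely separate from it. Under that rule the loop may load two limbs from each source before storing anything, because no store can alter a limb that has yet to be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* Nonzero if the two n-limb regions are identical or do not overlap. */
int same_or_separate(const limb_t *a, const limb_t *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(limb_t));
    return pa == pb || pa + bytes <= pb || pb + bytes <= pa;
}

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry-out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(same_or_separate(rp, up, n) && same_or_separate(rp, vp, n));
    limb_t carry = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* Read ahead: both limb pairs are loaded before any store. */
        limb_t u0 = up[i], u1 = up[i + 1];
        limb_t v0 = vp[i], v1 = vp[i + 1];
        uint64_t s0 = (uint64_t)u0 + v0 + carry;
        uint64_t s1 = (uint64_t)u1 + v1 + (limb_t)(s0 >> 32);
        rp[i] = (limb_t)s0;
        rp[i + 1] = (limb_t)s1;
        carry = (limb_t)(s1 >> 32);
    }
    if (i < n) {               /* odd limb left over */
        uint64_t s = (uint64_t)up[i] + vp[i] + carry;
        rp[i] = (limb_t)s;
        carry = (limb_t)(s >> 32);
    }
    return carry;
}
```

If rp were allowed to equal up + 1, a one-limb-at-a-time loop and this read-ahead loop would produce different results, since the store to rp[i] would overwrite up[i + 1] before the simple loop reads it. Ruling such cases out lets each implementation choose its own load and store schedule. The carry chain still serializes the additions, which is why vectorization depends on how carries are handled.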