15.8.2 Assembly Basics

mpn_addmul_1 and mpn_submul_1 are the most important routines for overall GMP performance. All multiplications and divisions come down to repeated calls to these. mpn_add_n, mpn_sub_n, mpn_lshift and mpn_rshift are next most important.

On some CPUs assembly versions of the internal functions mpn_mul_basecase and mpn_sqr_basecase give significant speedups, mainly through avoiding function call overheads. They can also potentially make better use of a wide superscalar processor, as can bigger primitives like mpn_addmul_2 or mpn_addmul_4.

The restrictions on overlaps between sources and destinations (see Low-level Functions) are designed to facilitate a variety of implementations. For example, knowing mpn_add_n won’t have partly overlapping sources and destination means reading can be done far ahead of writing on superscalar processors, and loops can be vectorized on a vector processor, depending on the carry handling.