Basecase NxM multiplication is a straightforward rectangular set of cross-products, the same as long multiplication done by hand and for that reason sometimes known as the schoolbook or grammar school method. This is an O(N*M) algorithm. See Knuth section 4.3.1 algorithm M (see References), and the mpn/generic/mul_basecase.c code.
Assembly implementations of mpn_mul_basecase
are essentially the same
as the generic C code, but have all the usual assembly tricks and
obscurities introduced for speed.
A square can be done in roughly half the time of a multiply, by using the fact that the cross products above and below the diagonal are the same. A triangle of products below the diagonal is formed, doubled (left shift by one bit), and then the products on the diagonal added. This can be seen in mpn/generic/sqr_basecase.c. Again the assembly implementations take essentially the same approach.
u0 u1 u2 u3 u4 +---+---+---+---+---+ u0 | d | | | | | +---+---+---+---+---+ u1 | | d | | | | +---+---+---+---+---+ u2 | | | d | | | +---+---+---+---+---+ u3 | | | | d | | +---+---+---+---+---+ u4 | | | | | d | +---+---+---+---+---+
In practice squaring isn’t a full 2x faster than multiplying, it’s
usually around 1.5x. Less than 1.5x probably indicates
mpn_sqr_basecase
wants improving on that CPU.
On some CPUs mpn_mul_basecase
can be faster than the generic C
mpn_sqr_basecase
on some small sizes. SQR_BASECASE_THRESHOLD
is
the size at which to use mpn_sqr_basecase
, this will be zero if that
routine should be used always.