#### 15.1.1 Basecase Multiplication

Basecase NxM multiplication is a straightforward rectangular set of
cross-products, the same as long multiplication done by hand and for that
reason sometimes known as the schoolbook or grammar school method. This is an
*O(N*M)* algorithm. See Knuth section 4.3.1 algorithm M
(see References), and the `mpn/generic/mul_basecase.c` code.

Assembly implementations of `mpn_mul_basecase`

are essentially the same
as the generic C code, but have all the usual assembly tricks and
obscurities introduced for speed.

A square can be done in roughly half the time of a multiply, by using the fact
that the cross products above and below the diagonal are the same. A triangle
of products below the diagonal is formed, doubled (left shift by one bit), and
then the products on the diagonal added. This can be seen in
`mpn/generic/sqr_basecase.c`. Again the assembly implementations take
essentially the same approach.

u0 u1 u2 u3 u4
+---+---+---+---+---+
u0 | d | | | | |
+---+---+---+---+---+
u1 | | d | | | |
+---+---+---+---+---+
u2 | | | d | | |
+---+---+---+---+---+
u3 | | | | d | |
+---+---+---+---+---+
u4 | | | | | d |
+---+---+---+---+---+

In practice squaring isn’t a full 2x faster than multiplying, it’s
usually around 1.5x. Less than 1.5x probably indicates
`mpn_sqr_basecase`

wants improving on that CPU.

On some CPUs `mpn_mul_basecase`

can be faster than the generic C
`mpn_sqr_basecase`

on some small sizes. `SQR_BASECASE_THRESHOLD`

is
the size at which to use `mpn_sqr_basecase`

, this will be zero if that
routine should be used always.