Niels Möller nisse at lysator.liu.se
Mon Feb 20 10:19:03 CET 2012

```I've been looking a bit on mpn_sqr_basecase.

1. It uses a temporary array, apparently for systems lacking an
mpn_sqr_diag_addlsh1 which can do in-place operation. I think one can
support arbitrary sizes with a small temporary area if one collects
the off-diagonal terms into the result area, and then computes the
diagonal products blockwise, a hundred limbs at a time or so.

Current code uses the opposite allocation, with diagonal terms in
the result area and the off-diagonal terms in the temporary array.

2. One could do the shifting differently, applying the shift to the limb
argument of addmul_1. Something like, when doing the off-diagonal
products for up[i],

mpn_addmul_1 (rp + 2*i+1, up + i + 1, n - i - 1,
(up[i] << 1) + (up[i-1] >> (GMP_LIMB_BITS - 1)));

Might be cheaper, if we can get this shifting done in parallel with
other operations, and get a simpler carry propagation recurrency when
adding diagonal and off-diagonal terms together.

3. The comments on using addmul_2 says that is is tricky. I think that's
because the diagonal terms are still 1x1 products. That would get
simpler of one treats double-limbs as the units everywhere, having
the diagonal computation form the 2x2 products

(u0, u1)^2, (u2, u3)^2, ...

The total number of limb products ought to be the same, if we do each
of these terms as

(u0 + u1 B)^2 = u0^2 + B^2 u1^2 + 2B (u0 * u1)
= u0^2 + B^2 u1^2 + B (u0 << 1) * u1 + B^2 u1 & HIGH_BIT_TO_MASK(u0)

The off-diagonal terms, to be computed with addmul_2, are then

(u0, u1) * (u2, u3, ...)
(u2, u3) * (u4, u5, ...)
...

I guess one can also collect the close-to-diagonal terms B u0 u1 +
B^5 u2 u3 + ..., together with the other off-diagonal terms,
One would then get the off-diagonal sum

B u0*u1 + B^2 (u0, u1) * (u2, u3, ...)
B^5 u2*u3 + B^6 (u2, u3) * (u4, u5, ...)

limb. Is there such a function?

4. There's code to use mpn_addmul_2s. What is that function supposed to
do, is it doing the above sum?

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

```