nisse at lysator.liu.se
Mon Feb 20 10:19:03 CET 2012
I've been looking a bit on mpn_sqr_basecase.
1. It uses a temporary array, apparently for systems lacking an
mpn_sqr_diag_addlsh1 which can do in-place operation. I think one can
support arbitrary sizes with a small temporary area if one collects
the off-diagonal terms into the result area, and then computes the
diagonal products blockwise, a hundred limbs at a time or so.
Current code uses the opposite allocation, with diagonal terms in
the result area and the off-diagonal terms in the temporary array.
2. One could do the shifting differently, applying the shift to the limb
argument of addmul_1. Something like, when doing the off-diagonal
products for up[i],
mpn_addmul_1 (rp + 2*i+1, up + i + 1, n - i - 1,
(up[i] << 1) + (up[i-1] >> (GMP_LIMB_BITS - 1)));
Might be cheaper, if we can get this shifting done in parallel with
other operations, and get a simpler carry propagation recurrency when
adding diagonal and off-diagonal terms together.
3. The comments on using addmul_2 says that is is tricky. I think that's
because the diagonal terms are still 1x1 products. That would get
simpler of one treats double-limbs as the units everywhere, having
the diagonal computation form the 2x2 products
(u0, u1)^2, (u2, u3)^2, ...
The total number of limb products ought to be the same, if we do each
of these terms as
(u0 + u1 B)^2 = u0^2 + B^2 u1^2 + 2B (u0 * u1)
= u0^2 + B^2 u1^2 + B (u0 << 1) * u1 + B^2 u1 & HIGH_BIT_TO_MASK(u0)
The off-diagonal terms, to be computed with addmul_2, are then
(u0, u1) * (u2, u3, ...)
(u2, u3) * (u4, u5, ...)
I guess one can also collect the close-to-diagonal terms B u0 u1 +
B^5 u2 u3 + ..., together with the other off-diagonal terms,
One would then get the off-diagonal sum
B u0*u1 + B^2 (u0, u1) * (u2, u3, ...)
B^5 u2*u3 + B^6 (u2, u3) * (u4, u5, ...)
which begs for an mpn_addmul_1c accepting an additional carry input
limb. Is there such a function?
4. There's code to use mpn_addmul_2s. What is that function supposed to
do, is it doing the above sum?
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel