About mpn_sqr_basecase

Mon Feb 20 10:19:03 CET 2012

I've been looking a bit on mpn_sqr_basecase.

1. It uses a temporary array, apparently for systems lacking an
   mpn_sqr_diag_addlsh1 which can do in-place operation. I think one can
   support arbitrary sizes with a small temporary area if one collects
   the off-diagonal terms into the result area, and then computes the
   diagonal products blockwise, a hundred limbs at a time or so.

   Current code uses the opposite allocation, with diagonal terms in
   the result area and the off-diagonal terms in the temporary array.

2. One could do the shifting differently, applying the shift to the limb
   argument of addmul_1. Something like, when doing the off-diagonal
   products for up[i],

     mpn_addmul_1 (rp + 2*i+1, up + i + 1, n - i - 1,
		   (up[i] << 1) + (up[i-1] >> (GMP_LIMB_BITS - 1)));

   Might be cheaper, if we can get this shifting done in parallel with
   other operations, and get a simpler carry propagation recurrency when
   adding diagonal and off-diagonal terms together.

3. The comments on using addmul_2 says that is is tricky. I think that's
   because the diagonal terms are still 1x1 products. That would get
   simpler of one treats double-limbs as the units everywhere, having
   the diagonal computation form the 2x2 products

     (u0, u1)^2, (u2, u3)^2, ...

   The total number of limb products ought to be the same, if we do each
   of these terms as

     (u0 + u1 B)^2 = u0^2 + B^2 u1^2 + 2B (u0 * u1)
     	           = u0^2 + B^2 u1^2 + B (u0 << 1) * u1 + B^2 u1 & HIGH_BIT_TO_MASK(u0)

   The off-diagonal terms, to be computed with addmul_2, are then

     (u0, u1) * (u2, u3, ...)
             (u2, u3) * (u4, u5, ...)
             ...

   I guess one can also collect the close-to-diagonal terms B u0 u1 +
   B^5 u2 u3 + ..., together with the other off-diagonal terms,
   One would then get the off-diagonal sum

     B u0*u1 + B^2 (u0, u1) * (u2, u3, ...)
       B^5 u2*u3 + B^6 (u2, u3) * (u4, u5, ...)

   which begs for an mpn_addmul_1c accepting an additional carry input
   limb. Is there such a function?

4. There's code to use mpn_addmul_2s. What is that function supposed to
   do, is it doing the above sum?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.