sqrt algorithm

Thu Aug 13 12:52:23 UTC 2015

"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

  Ciao,

  On Wed, August 12, 2015 2:03 pm, Torbjörn Granlund wrote:
  >   I tested this approach for sqrlo_basecase too, you can find the code
  >   enclosed by
  >   #ifdef SQRLO_SHORTCUT_MULTIPLICATIONS
  >
  >   But I'm not sure it is faster, so it is currently disabled.
  >
  > It will obviously be faster for machines where widening multiplication
  > is expensive, and in some other hardware cases.

  Ok, I can enable it for sqrlo. Are you working on an improved
  mullo_basecase? Otherwise I can change that code using the same criteria I
  used for sqrlo...

This is my alternative mullo_basecase:

void
mpn_mullo_basecase (mp_ptr rp, mp_srcptr up, mp_srcptr vp, mp_size_t n)
{
  mp_limb_t h;

  h = up[0] * vp[n - 1];

  if (n != 1)
    {
      mp_size_t i;
      mp_limb_t v0;

      v0 = *vp++;
      h += mpn_mul_1 (rp, up, n - 1, v0);
      h += up[n - 1] * v0;
      rp++;

      for (i = n - 2; i > 0; i--)
	{
	  v0 = *vp++;
	  h += mpn_addmul_1 (rp, up, i, v0);
	  h += up[i] * v0;
	  rp++;
	}
    }

  rp[0] = h;
}

  > I don't recall how we defined mullo/sqrlo wrt the size of the target
  > operand.  Can we write above the defined result, i.e., can we write the
  > full 2n product if it is convenient?

  Current implementation of both mullo and sqrlo do write n limbs only,
  possibly by full 2n product in a temporary area followed by MPN_COPY.

  IIRC someone proposed the interface mullo(res, x, y, n, tmp); with 2n
  limbs in tmp, supporting res == tmp, but we never switched to it.

  Regards,
  m

-- 
Torbjörn
Please encrypt, key id 0xC8601622