Side-channel silent division

Sun Nov 11 23:24:49 CET 2012

I intend to add a few more side-channel functions, from the powm_sec we
have already.  The goal is to allow side-channel silent arithmetic
crypto use of GMP, including CRT, plain inverses and modular inverses.

I have now hacked a reasonably fast schoolbook division function.  It is
roughly 2x slower than plain schoolbook division, since the innerloop
uses two submul_1 calls per quotient limb.

In the way this is written, it should be very resillient to how the
inverse in dinv is computed.  I have not hacked side-channel silent
invers code yet.

Any opinions on this approach?  I realise that it would be possible to
quitly develop more than GMP_LIMB_BITS/2 quotient bit per submul_1, such
as perhaps GMP_LIMB_BITS-1.  I haven't tried that.

mp_limb_t
mpn_sbpi1_div_qr_sec (mp_ptr qp,
		      mp_ptr np, mp_size_t nn,
		      mp_srcptr dp, mp_size_t dn,
		      mp_limb_t dinv)
{
  mp_limb_t qh, nh, cy, q1h, q0h, dummy, h;
  mp_size_t i;
  mp_ptr np_org, hp, qlp, qhp;
  TMP_DECL;

  ASSERT (dn > 2);
  ASSERT (nn > dn);
  ASSERT ((dp[dn-1] & GMP_NUMB_HIGHBIT) != 0);

  TMP_MARK;

  /* Create a divisor copy shifted half a limb.  */
  hp = TMP_ALLOC_LIMBS (dn + 1);
  cy = mpn_lshift (hp, dp, dn, GMP_LIMB_BITS / 2);
  hp[dn] = dp[dn - 1] >> GMP_LIMB_BITS / 2;

  qlp = TMP_ALLOC_LIMBS (nn - dn + 1);
  qhp = TMP_ALLOC_LIMBS (nn - dn + 1);

  np_org = np;

  np += nn;

  qh = mpn_sub_n (np - dn, np - dn, dp, dn) >= 0;
  mpn_addcnd_n (np - dn, np - dn, dp, dn, qh);

  np--;

  nh = 0;

  /* Main loop.  Develop one full limb per iteration, but do it in two steps in
     order to avoid conditionals.  Quotient bits will be either correct or
     underestimates.  When a quotient is underestimated, the next quotient will
     compensate, since quotients are to be added at consecutive weight distance
     GMP_LIMB_BITS/2.  We make two quotient arrays, about GMP_LIMB_BITS/2
     each.  The arrays are added late after the loop.  Separate arrays avoids
     data-dependent carry propagation.  */
  for (i = nn - dn - 1; i >= 0; i--)
    {
      nh = (nh << GMP_LIMB_BITS/2) + (np[0] >> GMP_LIMB_BITS/2);
      umul_ppmm (q1h, dummy, nh, dinv);
      q1h += nh;
      cy = mpn_submul_1 (np - dn, hp, dn + 1, q1h);

      nh = np[0];
      umul_ppmm (q0h, dummy, nh, dinv);
      q0h += nh;
      cy = mpn_submul_1 (np - dn, dp, dn, q0h);

      nh -= cy;
      np[0] = nh;

      np--;
      qlp[i] = q0h;
      qhp[i] = q1h;
    }

  /* 1st adjustment depends on extra high remainder limb.  */
  h = np_org[dn];		ASSERT_ALWAYS (h <= 1);
  qlp[0] += h;
  h -= mpn_subcnd_n (np_org, np_org, dp, dn, h);

  /* 2ns adjustment depends on remainder/divisor comparision as well as whether
     extra remainder was nullified by previous subtract.  */
  cy = mpn_sub_n (np_org, np_org, dp, dn);
  cy = cy == h;
  mpn_addcnd_n (np_org, np_org, dp, dn, 1 - cy);
  qlp[0] += cy;

  /* Combine quotient halves into final quotient.  */
  mpn_lshift (qhp, qhp, nn - dn, GMP_LIMB_BITS/2);
  mpn_add_n (qp, qhp, qlp, nn - dn);

  TMP_FREE;
  return qh;
}

-- 
Torbjörn