Side-channel silent division

Wed Nov 14 17:17:51 CET 2012

nisse at lysator.liu.se (Niels Möller) writes:

  I was been thinking of the following algorithm (I think I wrote about
  the mod variant a while ago).

  Say we want to divide by an n-limb number D, and for simplicity, assume
  that D is normalised.

  First precompute the inverse v = floor(B^{n+1} / D) (one limb and a
  bit), and the corresponding n-limb remainder, r, so that

I assume v is sometimes a limb and two bits, but only for one D.  :-)

Do you mean to compute precisely the above value, not some very close
limb+bit approximation (looking at, say the two most significant limbs
of D)?

    B^{n+1} = v * D + R

So r and R are the same?

  Now, if U is an n+2 limb number, we can reduce it as

    u_{n+1}^B{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' 

Something is strange there.  It seems that

    u_{n+1}*B^{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' 

makes more sense.

  Here, the product u_{n+1} * v is two limbs + a bit to be added to the
  quotient, and U' + u_{n+1} r is n+1 limbs and a carry out.

  We can iterate this to reduce u from n+m limbs to n + 1 (we then need
  something different for the final quotient limb).

U is n+2 limbs (per 6 lines above) and u is a different number of n+m
limbs?

OK, I think I see what you're doing.  You're allowing some greater
partial remainders, and will then (typically) generate a limb+bit
quotient.

This is highly interesting.

  For side-channel silent division, we'd need to store away the q limbs
  and add them up in a single loop after the main loop. The carry from
  addmul_1 would have to be handled inside the loop, not sure how to best do
  that.

That is similar to what I do in my proposed half-limb-at-a-time code.

  For general schoolbook division, it's nice that we use addmul_1 rather
  than submul_1, and when addmul_2 is available it might make sense to
  extend to two quotient limbs at a time.

Absolutely.  We should avoid submul_1 if we can.

  Ah, and one more thing: For the applications which are going to need
  side-channel silent division, I think the common case is a mod operation
  with invariant divisor. In this case, I think the above idea (but
  without all the quotient book-keeping) is a pretty good alternative to
  schoolbook.

I miss your point here.

Are you thinking of replacing REDC with the above trick in, say, powm?

I am mainly aiming at the a * B^n mod m operation for converting to
Montgomery residues, and of CRT.

-- 
Torbjörn