Side-channel silent division
Torbjorn Granlund
tg at gmplib.org
Wed Nov 14 17:17:51 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
I was been thinking of the following algorithm (I think I wrote about
the mod variant a while ago).
Say we want to divide by an n-limb number D, and for simplicity, assume
that D is normalised.
First precompute the inverse v = floor(B^{n+1} / D) (one limb and a
bit), and the corresponding n-limb remainder, r, so that
I assume v is sometimes a limb and two bits, but only for one D. :-)
Do you mean to compute precisely the above value, not some very close
limb+bit approximation (looking at, say the two most significant limbs
of D)?
B^{n+1} = v * D + R
So r and R are the same?
Now, if U is an n+2 limb number, we can reduce it as
u_{n+1}^B{n+1} + U' = u_{n+1} v D + u_{n+1} R + U'
Something is strange there. It seems that
u_{n+1}*B^{n+1} + U' = u_{n+1} v D + u_{n+1} R + U'
makes more sense.
Here, the product u_{n+1} * v is two limbs + a bit to be added to the
quotient, and U' + u_{n+1} r is n+1 limbs and a carry out.
We can iterate this to reduce u from n+m limbs to n + 1 (we then need
something different for the final quotient limb).
U is n+2 limbs (per 6 lines above) and u is a different number of n+m
limbs?
OK, I think I see what you're doing. You're allowing some greater
partial remainders, and will then (typically) generate a limb+bit
quotient.
This is highly interesting.
For side-channel silent division, we'd need to store away the q limbs
and add them up in a single loop after the main loop. The carry from
addmul_1 would have to be handled inside the loop, not sure how to best do
that.
That is similar to what I do in my proposed half-limb-at-a-time code.
For general schoolbook division, it's nice that we use addmul_1 rather
than submul_1, and when addmul_2 is available it might make sense to
extend to two quotient limbs at a time.
Absolutely. We should avoid submul_1 if we can.
Ah, and one more thing: For the applications which are going to need
side-channel silent division, I think the common case is a mod operation
with invariant divisor. In this case, I think the above idea (but
without all the quotient book-keeping) is a pretty good alternative to
schoolbook.
I miss your point here.
Are you thinking of replacing REDC with the above trick in, say, powm?
I am mainly aiming at the a * B^n mod m operation for converting to
Montgomery residues, and of CRT.
--
Torbjörn
More information about the gmp-devel
mailing list