Fast constant-time gcd computation and modular inversion

Tue Aug 23 20:44:05 CEST 2022

nisse at lysator.liu.se (Niels Möller) writes:

> Then 96 bits seems to be the maximum number of low-end bits that can be
> eliminated, under the constraint that elements of the corresponding
> transformation matrix should fit in 64-bit (signed) limbs.

I had another look into this (that is, the paper
https://eprint.iacr.org/2019/266.pdf) today, thinking that some careful
analysis could prove a bound like this. But I now think it's not
possible, and taking the worst case into account means we can fill a
64-bit matrix with significantly less than 64 bits eliminated.

To review, we examine the least significant k bits of a,b (a odd),
construct a matrix (with integer elements, determinant ±2^k) that
eliminates those least significant low bits. So we can set

  (a'; b') = 2^{-k} M (a; b)

We then hope that elements of M are smaller than k bits, so we actually
make progress. The paper proves progress, by splitting into groups
representing 2-dic divisions and deriving bounds for the norm in a
rather complicated way.

But we don't have any such bound locally. E.g, consider the simple case
that it turns out that the low k bits of b are all zeros (I'm not yet
sure this is the worst case, but I hope it can't get much worse).

Then the best we can do is a' = a, b' = b/2^k, i.e.,

  (a'; b') = (a, b/2^k) = 2^{-k} (2^k, 0 ; 0, 1) (a; b)

If we require the largest matrix element, 2^k, to fit in a 64-bit
signed, we can have at most k = 62.

The paper proves that we make sufficient progress on average, as the
reductions continue, but seems that doesn't translate to progress for
shorter segments of reductions.

And maybe not so surprising, if we compare to the current Lehmer
algorithm: Split of top two limbs, compute a matrix with single limb
elements. Apply matrix, making roughly one limb of progress. Except when
we encounter a really large quotient; then it can't be represented as a
matrix with single-limb elements, and we instead need a large division,
i.e., a larger step with larger progress.

The new algorithm can't magically make that large quotient case vanish,
the trick is that it can handle it incrementally, without special cases
to cause side-channel leakage. But then those incremental steps will not
make any clear progress on their own, only when combined to correspond
to a large quotient.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.