# Fast constant-time gcd computation and modular inversion

Torbjörn Granlund tg at gmplib.org
Wed Aug 31 10:26:35 CEST 2022

```  >     count = (49 * bits + 57) / 17;
>
> Odd.

For sure. This isn't based on local progress of the algorithm (there
ain't no guaranteed progress for a short sequence of reduction steps),
but based on rather complex analysis of the number of steps needed for

That count is a proven "upper bound" of the numbver of iterations.  Does
the paper discuss how tight it is, i.e., is it close to a "lower bound"
of the required number of iterations.

> I think your measurements are a bit optimistic, though.  My measruments
> from slightly modified timing code suggest 4.5x slowdown, and more for
> really small operands.

Maybe, I tried to keep the timing really simple and portable.

I simply up'ed the number of iterations (actually made them depend in
the operand size).

I've now implemented inverse too. See updated code below. When I run it,
I get

invert size 1, ref 0.293 (us), djb 1.145 (us)
invert size 2, ref 0.721 (us), djb 1.659 (us)
invert size 3, ref 0.994 (us), djb 2.375 (us)
[...]
invert size 17, ref 5.157 (us), djb 18.538 (us)
invert size 18, ref 5.207 (us), djb 19.064 (us)
invert size 19, ref 5.702 (us), djb 21.286 (us)

so a slowdown of 3--4 compared to mpz_invert. This timing excludes the
precomputation of a few needed per-modulo constants.

Very very nice!

People not famililiar with what we're trying to do here might be
unimpressed with these results.  "Look, I've written new code which is 4
times slower than what we have, HURRAY!"

Comparing to the existing side-channel-silent inversion code would
impress more, I think.

I think the inverse flavor is still rather neat, main loop is

for (delta = 1;count > STEPS;count -= STEPS)
{
delta = steps (&M, STEPS, delta, fp[0], gp[0]);
matrix_vector_mul (&M, STEPS, n+1, fp, gp, tp);
matrix_vector_mul_mod (&M, Bp, n+2, up, vp, tp);
}

I can make it even neater:

for (delta = 1;count > STEPS;count -= STEPS)
{
do_it_all (fp, gp, up, vp, n);
}

:-)

I'm thinking I'll try to implement and benchmark this for Nettle's ecc
functions first, before attempting to update GMP function. The reason is
that (i) I really want to do needed precomputations for all moduli of
interest for Nettle at compile time, which would complicate the GMP
interface if it is supported at all, and (ii) in some cases I want
inversion to operate on numbers in montgomery representation, and then
I'd like to fold any needed additional power of two into the precomputed
constant.

What percentage of Ec scalar mul is used for the final inversion?  Not
much, I suppose.  Does Nettle use exponentiation there today, or
mpn_sec_invert?

--
Torbjörn