Fast constant-time gcd computation and modular inversion

Niels Möller nisse at
Tue May 24 11:59:12 CEST 2022

Hi, I've had a first look at the paper by djb and Bo-Yin Yang, mainly focusing on the integer case.

The abstract says it's a variant of Euclid's algorithm, but as I read it,
it's really a right-to-left algorithm closer to the Stehlé-Zimmermann
binary recursive gcd. But in the polynomial case, one can use symmetry
under coefficient reversal to make it work from the other end.

Let's first review the pretty slow mpn_sec_invert currently in GMP, and
for simplicity, look only at the gcd part, ignoring cofactor updates.
(This algorithm was inspired by the mpn_gcd_11 function, which strives
to eliminate the hard-to-predict branches from the binary algorithm.)

This algorithm maintains b odd, and an iteration step with the
following cases:

  if a even:
    a <-- a/2

  else if (a >= b)
    a <-- (a - b) / 2

  else // a < b
    [a, b] <-- [(b - a)/2, a]

This guarantees one bit of progress as long as a > 0, because bitsize(a)
+ bitsize(b) is reduced by at least one. If a, b are at most \ell bits,
we are therefore guaranteed a = 0 after 2\ell iterations.
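In Python-style pseudocode, ignoring the constant-time aspects (real code
would use masked, branch-free limb operations rather than these branches),
the iteration above can be sketched like this; the fixed trip count of
2\ell is exactly the progress bound:

```python
def sec_gcd(a, b):
    # Gcd part of the mpn_sec_invert-style loop: requires b odd, a >= 0.
    # The trip count 2*ell depends only on the bit sizes, not the values.
    assert a >= 0 and b % 2 == 1
    ell = max(a.bit_length(), b.bit_length())
    for _ in range(2 * ell):
        if a % 2 == 0:            # a even
            a //= 2
        elif a >= b:              # a odd, a >= b
            a = (a - b) // 2
        else:                     # a odd, a < b: swap roles
            a, b = (b - a) // 2, a
    assert a == 0                 # guaranteed by the progress argument
    return b
```

Since b stays odd throughout, the powers of two removed from a never
affect the result, and b ends up as gcd(a, b); e.g. sec_gcd(12, 9) == 3.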

Like the plain binary algorithm, this has the problem that it depends on
bits at both ends, with no obvious analogue of Lehmer's algorithm or
other divide and conquer.

The new algorithm in the above paper uses a small auxiliary state
variable d (or \delta in the paper) to decide the update in the case
where both a and b are odd. If I write the iteration step using the same
notation as above, it's

  if a even:
    a <-- a / 2, d <-- 1 + d

  else if d <= 0   
    a <-- (a + b) / 2, d <-- 1 + d

  else // d > 0
    [a, b] <-- [(b - a)/2, a], d <-- 1 - d 

So the conditions depend only on the sign of the auxiliary d, and the
least significant bit of a. It follows that one can do a sequence of t
steps based on only the least significant t bits of the inputs, and
collect the result into a transformation matrix. This opens up for both
a Lehmer-style algorithm and a Schönhage-like algorithm, but a bit
simpler because there's no need for a separate division step to handle a
large quotient.
It is claimed that 2 - t < d <= t after t steps (d is initially 1), and
we can see that the update rules for odd a then strive to reduce |d|
(except when d = 0 already).

In which way the decision logic based on the sign of d guarantees
progress is not clear to me (I haven't tried to read all the proofs),
but I guess d should be understood as an approximation of bitsize(b) -
bitsize(a).

As I read the claimed bounds, if the initial a and b are \ell bits, then
there's a bound m, close to 2.88 \ell, guaranteeing that after m
iterations we have reached a = 0.


Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list