Fast constant-time gcd computation and modular inversion

>   Extract least significant 96 bits of each number.
> Is that 3 32-bit limbs or 1.5 64-bit limbs?

I was thinking about 64-bit archs.

Then 96 bits seems to be the maximum number of low-end bits that can be
eliminated, under the constraint that elements of the corresponding
transformation matrix should fit in 64-bit (signed) limbs.

And there's no harm in extracting two full limbs (128 bits); it's just
that the transformation matrix doesn't depend in any way on those extra
bits.

For a 32-bit transform matrix, one could do less (roughly half, not sure
if it's precisely 48 bits).
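
To illustrate the point about the extra bits, here is a rough Python
sketch. It assumes the plain divstep rule from the Bernstein-Yang paper
named in the subject line as the reduction step, which may not be the
exact variant intended above. The names transition_matrix and
low_bits_suffice are made up for the example, and it does not try to
reproduce the 96-bit figure. It only shows that a matrix built from k
steps is the same whether it is computed from the full numbers or from
their low k bits.

```python
def transition_matrix(f, g, steps, delta=1):
    """Run `steps` divsteps on (delta, f, g); f must be odd.

    Returns (delta, u, v, q, r) such that
        2**steps * f_new = u*f + v*g
        2**steps * g_new = q*f + r*g
    (Assumed rule: the Bernstein-Yang divstep.)
    """
    if (f & 1) == 0:
        raise ValueError("f must be odd")
    u, v, q, r = 1, 0, 0, 1
    for _ in range(steps):
        if delta > 0 and (g & 1):
            # Swap case: (f, g) -> (g, (g - f) / 2)
            delta, f, g = 1 - delta, g, (g - f) >> 1
            u, v, q, r = 2 * q, 2 * r, q - u, r - v
        elif g & 1:
            # g odd, no swap: g -> (g + f) / 2
            delta, g = 1 + delta, (g + f) >> 1
            u, v, q, r = 2 * u, 2 * v, q + u, r + v
        else:
            # g even: g -> g / 2
            delta, g = 1 + delta, g >> 1
            u, v = 2 * u, 2 * v
    return delta, u, v, q, r


def low_bits_suffice(f, g, steps):
    """True if the matrix from the low `steps` bits equals the full one."""
    mask = (1 << steps) - 1
    full = transition_matrix(f, g, steps)
    low = transition_matrix(f & mask, g & mask, steps)
    return full == low
```

For example, low_bits_suffice(f, g, 32) holds for any odd f, so feeding
two full limbs into the matrix computation changes nothing. Whichever
input was used, u*f + v*g and q*f + r*g on the full numbers are
divisible by 2**32.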


