# _basecase or _sec?

Torbjorn Granlund tg at gmplib.org
Fri May 3 14:40:49 CEST 2013

nisse at lysator.liu.se (Niels Möller) writes:

> I think Newton analogues exist only when b is a power, not in general.
> And the most important case is prime b.

I think it also exists when b can be factorised into prime powers...

>> I am not familiar with the Jebelean (or Möller!) criteria for gcd, but
>> my intuition suggests that there exists a O(k)-bit matrix which reduces
>> k bits from the operands.  And the same intuition suggests that it can
>> be computed from O(k) high bits.  Am I wrong?

> You need to have roughly *2k* bits of each operand to make k bits
> progress.  And you get into trouble at least in the case that those 2k
> bits are equal (you need to subtract one operand from the other, but
> which depends on the other bits).

We don't need to insist on keeping operands positive.

> Let me think aloud.
>
> As a start, one should try to construct an algorithm which makes (at
> least) two bits of progress per iteration, with all problematic special
> cases handled in a side-channel silent manner.  I looked into it briefly
> before writing my current code, but I couldn't find anything easy.

Two bits per bignum operation will give a 100% speedup, and still leave
a lot of headroom for future GMP releases.  :-)

> And I'm afraid the table lookup even for just two bits of progress needs
> 5 bits from each operand, i.e., 1024 entries, which might leak.

My thought was not to have a table, but a plain loop which executes a
limb-size-dependent number of iterations, and has no internal branches.

> Right-to-left algorithms will likely have easier bookkeeping than
> left-to-right algorithms.

I have not looked at right-to-left algorithms for gcdext, at least not
in a very long time.  Do they resolve gcd(a,b) = ax + by or something
like gcd(a,b)*2^t = ax + by?

> Instead of a Lehmer-like algorithm which eliminates k bits of each
> operand using a vector/matrix multiply, it might be better (smaller
> table) to eliminate k bits from the larger operand only.  But then we
> need to keep track of which operand is largest.  I think this simplest
> approach was the one I looked a bit into, and it didn't seem very
> promising.  I think one can always eliminate two bits, by setting
>
>   m <-- min(a,b)
>   M <-- max(a,b)
>   a <-- (M + k m) / 4
>   b <-- m
>
> where k = ±1, whichever is needed to make M + k m a multiple of 4.  And
> then some additional logic for the case of even numbers (if M = 0 (mod
> 4), just set k = 0, and if M = 2 (mod 4), set k = 2).  But to go beyond
> two bits, one would need something like k-ary gcd, which might introduce
> false factors.

Right, that answers my question above.  Cannot false factors of 2 be
handled?

The k \in {0, 1, 2, -1} could be a conditional subtraction and addition

1.  Subtract: d <-- a - [a odd] b   (if a odd)
2.  Swap:     b <-- a, d <-- -d     (if underflow above, d < 0)
3.  Add:      a <-- a + 2 b         (under certain conditions)
4.  Shift:    a <-- a / 4           (always)

Current code does (1) and (2) only (and a shift by one bit).

> If one wants a single addmul_1, one could choose k \in {0,1,2,3}
> instead, but one still *needs* to take the sign of (a-b) into account;
> it's no use to add the larger operand to the smaller.  Analysis of the
> guaranteed progress after n iterations is going to get more complicated
> with k > 0.

> For any variant, I think the iteration needs to take into account:
>
> Low bits of a and b (one can try to maintain b odd at all times, but
> one needs to handle the case of even a).  Or high bits for a
> left-to-right algorithm, but I think that's going to be less
> convenient.
>
> Sign of (a - b).
>
> The straightforward way to determine the sign of (a - b) is to actually
> compute the difference.  The result of this subtraction is directly
> useful for the "plain" binary algorithm, but a bit wasted if we try
> something more sophisticated.

I suspect non-leaky code has to compute the sign as a full subtraction.

--
Torbjörn