_basecase or _sec? [

Fri May 3 14:40:49 CEST 2013

nisse at lysator.liu.se (Niels Möller) writes:

  I think Newton analogues exist only when b is a power, not in general.
  And the most important case is prime b.

I think it exists also for b can be factorised into prime powers...

  > I am not familiar with the Jebelean (or Möller!) criteria for gcd, but
  > my intuition suggests that there exists a O(k)-bit matrix which reduces
  > k bits from the operands.  And the same intuition suggests that it can
  > be computed from O(k) high bits.  Am I wrong?

  You need to have roughly *2k* bits of each operand to make k bits
  progress. And you get into trouble at least in the case that those 2k
  bits are equal (you need to subtract one operand from the other, but
  which depends on the other bits).

We don't need to insist on keeping operands positive.

  Let me think aloud.

  As a start, one should try to construct an algorithm which makes (at
  least) two bits of progress per iteration, with all problematic special
  cases handled in a side-channel silent manner. I looked into it briefly
  before writing my current code, but I couldn't find anything easy.

Two bits per bignum operation will give a 100% speedup, and still leave
a lot of headroom for future GMP releases.  :-)

  And I'm afraid the table lookup even for just two bits of progress needs
  5 bits from each operand, i.e., 1024 entries, which might leak
  information through the cache instead.

My thought was not to have a table, but plain loop which executed a
limb-size dependent number of iterations, and has not internal branches.

  Right-to-left algorithms will likely have easier bookkeeping than
  left-to-right algorithms.

I have not looked at right-to-left algorithms for gcdext, at least not
in a very long time.  Do they resolve gcd(a,b) = ax + by or something
like gcd(a,b)*2^t = ax + by?

  Instead of a lehmer-like algorithms which eliminates k bits of each
  operand using a vector/matrix multiply, it might be better (smaller
  table) to eliminate k bits from the larger operand only. But then we
  need to keep track of which operand is largest. I think this simplest
  appproach was the one I looked a bit into, and it didn't seem very
  promising. I think one can always eliminate two bits, by setting

    m <-- min(a,b)
    M <-- max(a,b)
    a <-- (M + k m) / 4
    b <-- m

  where k = ± 1, which ever is needed to make M + k m a multiple of 4. And
  then some additional logic for the case of even numbers (if M = 0 (mod
  4), just set k = 0, and if M = 2 (mod 4), set k = 2). But to go beyond
  two bits, one would need something like k-ary gcd, which might introduce
  false factors.

Right, answering my question above.  Cannot false factors of 2 be
handled?

  The k \in {0, 1, 2, -1} could be a conditional subtraction and addition

  1.  Subtract: d <-- a - [a odd] b   (if a odd)
  2.  Swap:     b <-- a, d <-- -d     (if underflow above, d < 0) 
  3.  Add:      a <-- a + 2 b         (under certain conditions)
  4.  Shift:    a <-- a / 4	    Always

  Current code does (1) and (2) only (and a shift by one bit).

  If one wants a single addmul_1, one could choose k \in {0,1,2,3}
  instead, but one still *needs* to take sign (a-b) into account; it's no
  use to add the larger operand to the smaller. Analysis of the guaranteed
  progress after n iterations is going to get more complicated with k > 0.

  For any variant, I think the iteration needs to take into account:

    Low bits of a and b (one can try to maintain b odd at all times, but
    one needs to handle the case of even a). Or high bits for a
    left-to-right algorithm, but I think that's going to be less
    convenient.

    Sign of (a - b).

  The straight-forward way to determine the sign of (a - b) is to actually
  compute the difference. The result of this subtraction is directly
  useful for the "plain" binary algorithm, but a bit wasted if we try
  something more sophisticated.

I suspect non-leaky code has to compute the sign as a full subtraction.

-- 
Torbjörn