_basecase or _sec? [

Sat May 4 00:59:54 CEST 2013

[I fixed the grammar in my self-quotations, hopefully not against some
netiqette]

  > We don't need to insist on keeping operands positive.

  Hmm. In general, one needs to replace the largest number, to make
  progres. But I guess in the case of many high bits being equal, it might
  not matter.

To make more than one bit of progress on average, one surely cannot just
relace the largest operand by some every to clever multiple of the
smaller operand.  Right?

OK, so one could replace the larger operand with a linear combination of
the two operands, while leaving the smaller operand unchanged, I
suppose.

  > My thought was not to have a table, but a plain loop which is executed a
  > limb-size dependent number of iterations, and has no internal branches.

  I see. Makes some sense. To get a nice progress guarantee, one such
  "large iteration" would need to apply the matrix, and then do a small
  number of additional reductions.

Applying the matrix yes, but why additional reductions?  Would they
handle some degenerate cases where we would else not make k progress?

I'd say that if some design cannot guarrantee k bits of progress but
just k-1, we should not necessarily cmplicate things for achieving k
bits.

  > I have not looked at right-to-left algorithms for gcdext, at least not
  > in a very long time.  Do they resolve gcd(a,b) = ax + by or something
  > like gcd(a,b)*2^t = ax + by?

  I haven't looked at more sophisticated right-to-left gcd for a long time
  (Stehlé and Zimmermann). But the plain binary algorithm shifts out the
  zero bits as they appear. And the co-factors (to-be inverse) are divided
  by 2 (mod b) in each step, by a shift and a conditional add.

  It's possible, and it might be more efficient, to divide the cofactors
  by 2^k (mod b) at the end, instead of one bit at a time.

I think it is, iff we must maintain full-size operands for SC silence
(as in your present algorithm).

  > Right, answering my question above.  Cannot false factors of 2 be
  > handled?

  As long as the iteration only makes the update a <-- a + k b, no false
  factors can appear. The problem is the k-ary algorithm (Weber's
  accelarated gcd), which uses an update of the form a <-- u a + v b, with
  |u| != 1.

Sure, but a <-- a + k b gives us <= one bit per iteration.  (Less than
one of we insist on principal residues mod b, one bit if we choose a
representative with smallest absolute value, i.e., sometimes negtives.)

  And as for factors of two, I have only considered inversion modulo an
  odd b.

Which is fine, I think.  It will be hard to not leak t for odd * 2^t
where odd > 1.  It will easy not to leak inversion mod 2^t, though...

-- 
Torbjörn