_basecase or _sec? [

Fri May 3 15:07:33 CEST 2013

Torbjorn Granlund <tg at gmplib.org> writes:

> I think it exists also for b can be factorised into prime powers...

Sure, any factorization can be used with CRT, but it's only powers which
allow newton-like hensel-lifts, as far as I'm aware.

And the most important case is when b is prime.

>   You need to have roughly *2k* bits of each operand to make k bits
>   progress. And you get into trouble at least in the case that those 2k
>   bits are equal (you need to subtract one operand from the other, but
>   which depends on the other bits).
>   
> We don't need to insist on keeping operands positive.

Hmm. In general, one needs to replace the largest number, to make
progres. But I guess in the case of many high bits being equal, it might
not matter.

> My thought was not to have a table, but plain loop which executed a
> limb-size dependent number of iterations, and has not internal branches.

I see. Makes some sense. To get a nice progress guarantee, one such
"large iteration" would need to apply the matrix, and then do a small
number of additional reductions.

> I have not looked at right-to-left algorithms for gcdext, at least not
> in a very long time.  Do they resolve gcd(a,b) = ax + by or something
> like gcd(a,b)*2^t = ax + by?

I haven't looked at more sophisticated right-to-left gcd for a long time
(Stehlé and Zimmermann). But the plain binary algorithm shifts out the
zero bits as they appear. And the co-factors (to-be inverse) are divided
by 2 (mod b) in each step, by a shift and a conditional add.

It's possible, and it might be more efficient, to divide the cofactors
by 2^k (mod b) at the end, instead of one bit at a time.

> Right, answering my question above.  Cannot false factors of 2 be
> handled?

As long as the iteration only makes the update a <-- a + k b, no false
factors can appear. The problem is the k-ary algorithm (Weber's
accelarated gcd), which uses an update of the form a <-- u a + v b, with
|u| != 1.

And as for factors of two, I have only considered inversion modulo an
odd b.

> I suspect non-leaky code has to compute the sign as a full subtraction.

It should be possible to do a non-leaky loop examining all limbs,
without carry propagation. Just doing comparisions a[k] < b[k] and a[k]
== b[k], combining results with logic operations. No idea if that can be
faster than a plain subtraction.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.