_basecase or _sec? [
Niels Möller
nisse at lysator.liu.se
Fri May 3 15:07:33 CEST 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> I think it exists also for b can be factorised into prime powers...
Sure, any factorization can be used with CRT, but it's only powers which
allow newton-like hensel-lifts, as far as I'm aware.
And the most important case is when b is prime.
> You need to have roughly *2k* bits of each operand to make k bits
> progress. And you get into trouble at least in the case that those 2k
> bits are equal (you need to subtract one operand from the other, but
> which depends on the other bits).
>
> We don't need to insist on keeping operands positive.
Hmm. In general, one needs to replace the largest number, to make
progres. But I guess in the case of many high bits being equal, it might
not matter.
> My thought was not to have a table, but plain loop which executed a
> limb-size dependent number of iterations, and has not internal branches.
I see. Makes some sense. To get a nice progress guarantee, one such
"large iteration" would need to apply the matrix, and then do a small
number of additional reductions.
> I have not looked at right-to-left algorithms for gcdext, at least not
> in a very long time. Do they resolve gcd(a,b) = ax + by or something
> like gcd(a,b)*2^t = ax + by?
I haven't looked at more sophisticated right-to-left gcd for a long time
(Stehlé and Zimmermann). But the plain binary algorithm shifts out the
zero bits as they appear. And the co-factors (to-be inverse) are divided
by 2 (mod b) in each step, by a shift and a conditional add.
It's possible, and it might be more efficient, to divide the cofactors
by 2^k (mod b) at the end, instead of one bit at a time.
> Right, answering my question above. Cannot false factors of 2 be
> handled?
As long as the iteration only makes the update a <-- a + k b, no false
factors can appear. The problem is the k-ary algorithm (Weber's
accelarated gcd), which uses an update of the form a <-- u a + v b, with
|u| != 1.
And as for factors of two, I have only considered inversion modulo an
odd b.
> I suspect non-leaky code has to compute the sign as a full subtraction.
It should be possible to do a non-leaky loop examining all limbs,
without carry propagation. Just doing comparisions a[k] < b[k] and a[k]
== b[k], combining results with logic operations. No idea if that can be
faster than a plain subtraction.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list