General mpn_gcd_basecase
Niels Möller
nisse at lysator.liu.se
Fri Aug 30 10:33:07 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> Ah, so you're thinking of not just a+b*tab_b[...] but
> a*tab_a[...]_b*tab_b[...]? I understand the latter much less, and don't
> know how to deal with the introduced spurious factors which seem
> inevitable.
If you do
a <-- s a + t b, b unchanged
then you can get spurious factors from s. And in the case of the binary
algorithm, that's fine if s is a power of 2. (I think you mentioned that
earlier: Try to find k so that 2^k mod b is a few bits smaller than b,
and set a <-- 2^k a - q b, to increase euclid progress).
If we update both a and b with a matrix multiply, (a;b) <-- M (a;b), we
can get spurious factors from the determinant of M.
In an earlier mail I sketched a variant that update both a and b only in
a few of the cases, and then only using matrices where the determinant
happens to be a power of 2.
> I see two complications:
>
> 1. We need an efficient way to find out which of |u| and |v| is
> smallest, for CMP_SWAP. But we can likely get away with something a
> bit sloppy, and accept arbitrary results in the case that u and v are
> "close", e.g., same bit size.
>
> We do that now. We just compare the top limbs.
Only assuming u and v unsigned, right? For two's complement, we need to
take absolute values into account in the comparison in some way.
> 2. For the variant without branches, we'd need an mpn_addmul_1s working
> with two's complement. It could be implemented as mpn_addmul_1 +
> mpn_cnd_sub, but not sure if there's some easy way to implement with
> a single pass, somehow sign extending on the fly.
>
> This might be easy or it might be hard. We have signed 64b x 64b ->
> 128b on some machines (including x86) which might be useful.
I doubt sign multiply would be that useful. An instruction to do signed
64b x unsigned 64b --> signed 128b, would be nice, but I don't know any
arch having that.
> > cy = mpn_addmul_1 (up, vp, n, m);
> > a = up[0];
> > count_trailing_zeros (cnt, a);
>
> One could move the count_trailing_zeros before mpn_addmul_1 by
> duplicating the mul, a = up[0] + m * vp[0]. Would make more sense if we
> also have an addmul + shift loop.
>
> I don't follow your reasoning here.
Just suggesting that we'd may get slightly less latency with
a = up[0] - m*vp[0];
count_trailing_zeros (cnt, a);
cy = mpn_addmul_1 (up, vp, n, m);
Maybe not so useful on it's own, but with assembly, there are further
opportunities.
> This addmul_1-only, positive-only table is a strange gcd animal. It
> will never yield U = 0. Individual limbs are not unlikely to become
> zero.
Ooops. Then one definitely need a different stop condition than comparing
something to zero.
> My latest loop loks like this:
>
> while (n > 2)
> {
> mp_limb_t a, b, m, cy;
> int cnt, ucnt, vcnt;
>
> CMP_SWAP (up, vp, n);
>
> a = up[0];
> b = vp[0];
> m = tabp[((a & mask) << (4 - 1)) + ((b & mask) >> 1)];
>
> count_leading_zeros (ucnt, up[n - 1]);
> count_leading_zeros (vcnt, vp[n - 1]);
If your willing to have both count_leading_zeros and
count_trailing_zeros in the loop, maybe it's simpler and more efficient
with a table-based binary euclid?
CND_SWAP
count_leading_zeros(cnt, up[n-1] | vp[n-1])
q = tab[...high bits...]
if (q odd)
u <-- u - q v
else
u <-- (q+1) u - v
count_trailing_zeros(cnt, up[0]); /* At least one */
u >>= cnt;
(plus handling of less likely special cases).
It seems like an mpn_rsbmul_1 would be a useful primitive for several
variants.
> 2. The idea with m = (CNST_LIMB(1) << cnt - 1) - m is to chop of a bit
> also in the top end of U, not just get >= 5 bits of zeros in the low
> end. It is a generalisation of the tables I played with before for
> gcd_11; any entry can be offset by a multiple of 2^5 = 32 without
> making less than 5 low bit become 0. I actually believe this is a
> great idea, but I haven't checked if it is as effective as I think it
> is...
Sounds like the opposite of binary euclid.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list