gcd_11
Niels Möller
nisse at lysator.liu.se
Sat Aug 17 10:25:05 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> static mp_limb_t
> gcd_bin_tab16 (mp_limb_t a, mp_limb_t b, size_t *np)
> {
> static signed char tab[16] = {
> #if ! defined (TAB_VARIANT) || TAB_VARIANT == 4 || TAB_VARIANT > 4
> -1, -3, -5, -7,
> -3, -1, -7, -5,
> -5, -7, -1, -3,
> -7, -5, -3, -1,
> #endif
> };
> return gcd_bin_tabkm (a, b, np, tab, 2);
> }
And one has some freedom to choose representativs of these numbers mod
2^{k+1}. E.g., one might replace -7 by +1 and/or -5 by +3.
> So is this practical? We haven't made a good implementation yet. Such
> an implementation would need to eliminate all data-dependent branches.
It could be arrange as first doing
t = u - v;
v <-- min (u,v)
u <-- |t|
with the same masking tricks as in the current gcd_11. Then, to
eliminate some additional bits of u, do a table lookup (based on u, v
*after* the above operations) for an even value m, and set
u <-- u + m v
Multiplication with additiona and masks is possible, but I'm afraid the
needed shifts to get v added in at the right position may be slow.
if u == 0, terminate, otherwise set c = count_trailing_zeros(u).
If we use any negative candidates for m, do conditional negation to get
u <-- |u|
and finally shift u by c bits.
Compared to gcd_11, we get a few more bits of progress. The
additonal operations are the table lookup (could be shifts of a magic
constant), the multiplication, and the absolute value.
-----
On a somewhat different track, I've reread the paper on k-ary gcd, and
lets focus on 16-ary, eliminating a least 4 bits at a time.
Given u, v odd, u > v, one can compute the quotient q = u/v mod 16; this
is the same multiplier that you tabulate. Let's use representatitives
±1, ±3, ± 5, ±7.
One can then set u <-- (u - q v) / 16 (which may be odd or even). But
that won't always give 4 bits of progress; if q v > u we get some growth
at the high end, and this happens when q is largish and u and v are
close in size. When q is small, ±1 or ±3, progress should aways be good
is good.
The k-ary trick comes in when q is largish. When q = 5, instead of
u <-- (u - 5v) / 16
we could try
u <-- (3u + v) / 16
Unfortunately, that may introduce a spurious factor of 3, since gcd (3u
+ v, v) = gcd (3u, v), and probably not worth it, so lets stick to u -
5v in this case, and consider larger q.
If q = 7, then instead of u <-- u - 7v, either of the linear combinations
u <-- (3u - 5v) / 16
u <-- (5u - 3v) / 16
gives good progress for u, but with risk for spurious factors. However,
if we use *both* combinations,
u <-- (5u - 3v) / 16
v <-- (3u - 5v) / 16
then we can note that the determinant is -9 + 25 = -16, is a power of
two! So we get avoid the spurious factor problem, at the cost of some
growth of min (u,v) when q = ±7; the -7 case is a bit worse, with
u <-- (5u + 3v) / 16
v <-- (3u + 5v) / 16
Here v is still the smaller number, and worst case growth is from close
to zero to close to 3u/16. Maybe we get good enough progress on average
so that it's worth it?
We could also adapt, and use the u <-- u - 7 v when v is small compared
to u, and the k-ary linear combinations when u and v are close in size.
That should give pretty good progress in every iteration, but perhaps
it gets to complicated to have a chance of running fast.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list