hgcd1/2
Niels Möller
nisse at lysator.liu.se
Sat Sep 7 22:17:13 UTC 2019
tg at gmplib.org (Torbjörn Granlund) writes:
> Should we rename div1 to put it into the GMP name space? How about
> mpn_div_11? (We could encode in its name that it is for use for small
> q, but I'd suggest that we don't do that.)
>
> That would allow for some assembly experiments.
If you think assembly will make a big difference (which seems likely),
that makes sense. And define it only when it's faster than a a div
instruction generated by the compiler?
> Taking out powers of two makes things complicated. Taking out a factor
> of two from one of the numbers corresponds to a matrix (1,0;0,2) with
> determinant 2. So the inverse is not an integer matrix, but an integer
> matrix + a shift count.
>
> Would a shift count hurt?
On second thought, I think there's a bigger problem: The way hgcd is
used, it's called on the most significant part of some numbers, but
applied to larger numbers. And a matrix based on trailing zeros in hgcd
won't be applicable to the larger numbers, since they have different
lowend bits.
> It would be interesting to try some left-to-right binary hgcd. Logic
> should be something like
>
> count_leading_zeros(a_bits, a);
> count_leading_zeros(b_bits, b);
>
> if (a_bits == b_bits)
> (a, b) <-- (min(a,b), |a-b|)
> else if a_bits > b_bits
> a <-- |a - b * 2^{a_bits - b_bits}|
> else
> b <-- |b - a * 2^{b_bits - a_bits}|
>
> OK! This is super interesting!
I think it may work nicely on machines with slow multiplication as well
as small-quotient division. But will be some overhead to make
branch-free. I haven't tried it seriously yet.
We'll sometimes get determinant == -1, so users of hgcd2 would need to
be updated to accept that. Or maybe one can just do a swap to ensure
that we have det == +1 in all cases. E.g.,
a <-- 2b - a
corresponds to the matrix (-1, 2; 0,1) with determinant -1. But if we
swap a and b, and set
(a, b) <-- (b, 2b - a)
that's the matrix (0,1; -1, 2) with determinant +1.
In the mean time, I've updated the replacement of div2. See attached
patch, using div1 except when HGCD2_METHOD == 2. Timing on my old laptop:
$ ./tune/speed -c -p1000000 -s1 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
overhead 6.00 cycles, precision 1000000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3
1 1492.62 2064.18 #1375.18
So this is 50% speedup over the old method.
Regards,
/Niels
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hgcd2-div2.patch
Type: text/x-diff
Size: 3089 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190908/ee1ebce4/attachment.bin>
-------------- next part --------------
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list