hgcd1/2
Torbjörn Granlund
tg at gmplib.org
Tue Sep 10 12:01:11 UTC 2019
nisse at lysator.liu.se (Niels Möller) writes:
> I believe METHOD 1 could almost always be pointless if METHOD 3's
> straight line code is made really efficient.
I'd guess so too. But benchmarks are better than faith alone ;-)
I have a lot of faith in benchmarking! :-)
> That should in particular be true if the q = 1 code is removed from
> hgcd2, for machines with good multiplication throughput.
Can we remove that for everything but HGCD2_METHOD == 2? It would be
nice to not have an explosion of cases.
Do you really mean HGCD2_METHOD 2? I'd say that only _METHOD 1 is poor
at q = 1, while _METHOD 2 and 3 handle it well.
Or perhaps you reason along the lines of "if a CPU is good at small
quotient division, it ought to be good at multiplication, therefore
_METHOD 1 does not need q = 1 code which principally avoids some limb
multiplications"?
Perhaps there is some confusion about which method is chosen for each
HGCD2_METHOD value? You designated the original method number 2, your
plain / based one number 1. I added lead texts to
https://gmplib.org/devel/thres/HGCD2_METHOD.html for clarity.
We call the normalizing div2 only when q > bh, which implies bh^2 < ah.
Then bh can't be normalized. When debugging (on 64-bit), I only saw this
code entered with size of bh exactly 32 bits, ah exactly 64 bits. But I
think it might be called with bh even smaller.
A comment stating that cnt = 0 is not possible (or an ASSERT!).
> Should be revive the commented-out div2? It looks good to me, the
> comment about that it is slower than the branchy div2 is probably from
> the era where branchy div1/div2 was great.
I haven't had the time to benchmark it, but I wouldn't be surprised if
it runs better than the other div2.
I suggest that we rename HGCD2_METHOD to DIV_11_METHOD. Agree?
Than we might want DIV_22_METHOD too.
The thing is, we don't want q = 2 and one number reduced to a zero. We
want to terminate with a matrix corresponding to both a and b roughly
half a limb (full limb, if it weren't for the truncation when going from
double precision to single precision), and |a - b| smaller.
[snip]
Pretty subtle!
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list