hgcd1/2

Tue Sep 10 12:01:11 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

  > I believe METHOD 1 could almost always be pointless if METHOD 3's
  > straight line code is made really efficient.

  I'd guess so too. But benchmarks are better than faith alone ;-)

I have a lot of faith in benchmarking!  :-)

  > That should in particular be true if the q = 1 code is removed from
  > hgcd2, for machines with good multiplication throughput.

  Can we remove that for everything but HGCD2_METHOD == 2? It would be
  nice to not have an explosion of cases.

Do you really mean HGCD2_METHOD 2?  I'd say that only _METHOD 1 is poor
at q = 1, while _METHOD 2 and 3 handle it well.

Or perhaps you reason along the lines of "if a CPU is good at small
quotient division, it ought to be good at multiplication, therefore
_METHOD 1 does not need q = 1 code which principally avoids some limb
multiplications"?

Perhaps there is some confusion about which method is chosen for each
HGCD2_METHOD value?  You designated the original method number 2, your
plain / based one number 1.  I added lead texts to
https://gmplib.org/devel/thres/HGCD2_METHOD.html for clarity.

  We call the normalizing div2 only when q > bh, which implies bh^2 < ah.
  Then bh can't be normalized. When debugging (on 64-bit), I only saw this
  code entered with size of bh exactly 32 bits, ah exactly 64 bits. But I
  think it might be called with bh even smaller.

A comment stating that cnt = 0 is not possible (or an ASSERT!).

  > Should be revive the commented-out div2?  It looks good to me, the
  > comment about that it is slower than the branchy div2 is probably from
  > the era where branchy div1/div2 was great.

  I haven't had the time to benchmark it, but I wouldn't be surprised if
  it runs better than the other div2.

I suggest that we rename HGCD2_METHOD to DIV_11_METHOD.  Agree?
Than we might want DIV_22_METHOD too.  

  The thing is, we don't want q = 2 and one number reduced to a zero. We
  want to terminate with a matrix corresponding to both a and b roughly
  half a limb (full limb, if it weren't for the truncation when going from
  double precision to single precision), and |a - b| smaller.

[snip]

Pretty subtle!

-- 
Torbjörn
Please encrypt, key id 0xC8601622