hgcd1/2

Sat Sep 7 15:34:20 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

  Keep in mind that the optimizations to div1 so far only help the second,
  cheaper, half of hgcd2. The first half, working with double precision,
  still uses the div2 function, which is full of branches.

I know, but the number I mainly look at is the one produced by tuneup
which compared the HGCD2_METHOD.  HGCD2_METHOD 3 should kick the other
methods, but it doesn't.

Should we rename div1 to put it into the GMP name space?  How about
mpn_div_11?  (We could encode in its name that it is for use for small
q, but I'd suggest that we don't do that.)

That would allow for some assembly experiments.

  Taking out powers of two makes things complicated. Taking out a factor
  of two from one of the numbers corresponds to a matrix (1,0;0,2) with
  determinant 2. So the inverse is not an integer matrix, but an integer
  matrix + a shift count.

Would a shift count hurt?

  It would be interesting to try some left-to-right binary hgcd. Logic
  should be something like

    count_leading_zeros(a_bits, a);
    count_leading_zeros(b_bits, b);

    if (a_bits == b_bits)
      (a, b) <-- (min(a,b), |a-b|)
    else if a_bits > b_bits
      a <-- |a - b * 2^{a_bits - b_bits}|
    else 
      b <-- |b - a * 2^{b_bits - a_bits}|

OK!  This is super interesting!

If you lay a working foundation, I'd be happy to jump in an optimise
things!

-- 
Torbjörn
Please encrypt, key id 0xC8601622