Small gcdext_1

Thu Oct 31 16:00:46 UTC 2019

nisse at lysator.liu.se (Niels Möller) writes:

  Updated version below, without input_v and corresponding divisions. Also
  one of the binary variants use binv. While debugging, I found that the
  accumulated shift may well exceed GMP_LIMB_BITS (shouldn't be so
  surprising), but I think it must be < 2 * GMP_LIMB_BITS. And there are a
  few corner cases.

  I guess it's possible to do all cofactor shifts as part of the loop, by
  shifting only one bit per iteration, and making the subtraction
  conditional on both nubmers being odd. That would be like a single-limb
  variant of mpn_sec_invert (but with a stop condition to exit early).
  Unclear if that can be a winner, with less progress per iteration in the
  main loop. And it may produce a cofactor that's unnecessarily large in
  the g > 1 case.

  BTW, the binv/redc part uses an umulhi operation. Maybe we should add
  that to longlong.h?

I believe umul_ppmm with unused low result is adequate.  When there are
separate instructions for low and high parts, we have made sure to
implement umul_ppmm in such a way that the compiler will eliminate the
low part.

How far did you come with gcdext_1?  Do you have any timing improvements
to report?  :-)

-- 
Torbjörn
Please encrypt, key id 0xC8601622