gcdext_1 questions

Wed Dec 2 22:59:09 CET 2009

I'm not sure what interface one should have for mpn_gcdext_1. There are
two possible variants.

The first, lets call it mpn_gcdext_1_s, takes one bignum and one limb as
inputs, and computes the gcd and the first cofactor.

  /* Returns G = gcd(U, V), and a cofactor S such that G = U S (mod
     V). It is required that U >= V > 0, and that the most significant
     limb of U is non-zero. The returned S satisfies the following
     requirements:

       S = 0 iff V divides U

       If G = V/2, then S = 1,
       otherwise, 2 |S| < V / G

     These bounds (see Them. 4.3 in "A Computational Introduction to
     Number Theory and Algebra", Victor Shoup, book
     http://www.shoup.net/ntb/) imply that S always fits in a signed
     limb.
  */
  mp_limb_t
  mpn_gcdext_1_s (mp_limb_signed_t *sp,
                  mp_srcptr up, mp_size_t un, mp_limb_t v);

The second variant, lets call it mpn_gcdext_1_st, takes two limbs as
inputs and computes gcd and both cofactors,

  mp_limb_t
  mpn_gcdext_1_st (mp_limb_signed_t *sp, mp_limb_signed_t *tp,
                   mp_limb_t u, mp_limb_t u)

Both can be implemented using Euclid's algorithm; then the only
difference is that the _s variant has an initial mod_1 call, and it
omits all updates of the second cofactor (which for large u won't fit in
a single limb anyway).

But it's also possible to implement the _st variant on top of the _s
variant, by computing t as (g - s u) / v. Would that be faster or
slower? Can one exploit that this is an exact division? When v is odd,
the computation of v can be done as

  binvert_limb (vinv, v);
  *tp = (g - s * u) * vinv;

(not 100% sure this gets the signs right; if not one would have to test
for the sign of s and do two cases). For a 64-bit machine, that's 8
multiplications, while updating the cofactor as a part of Euclid's
algorithm needs one extra multiplication per iteration.

And in case v is even, one could either compute (g - s u) as a double
word number, and then divide by v using a shift followed by
multiplication by the inverse of the odd part of v. Or one could do some
extra initial processing to take out common powers of two and if needed
swap u,v and sp, tp.

Also for the Euclid code (no matter if it computes one or both
cofactors), one should most likely use the div1 macro os similar to
compute each quotient (which with high probability is small) using shift
and subtraction. And then *maybe* one shouldn't multiply anything by q,
but instead use shifts and adds as each bit of q is generated.

But now we have left interface issues far behind, geting into micro
optimizations...

And then we also have the question, if we can compute canonical
cofactors using the binary algorithm, which would be particularly
attractive if the binary algorithm can be expressed in a branch-free
fashion, like the GCD_1_METHOD == 2 code in gcd_1.c.

Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.