Niels Möller nisse at
Thu May 6 23:26:19 CEST 2010

Torbjorn Granlund <tg at> writes:

> your mod_2, gcc-?.?                32 (assuming first branch is taken)

/usr/bin/gcc on shell, which seems to be 4.2.1.

> I started something at  The frac function are
> missing.  Should we for the benefit of old and embedded processors
> provide div_*_1 loops that actually use a 2/1 division instruction?

I think we should definitely have top-level functions div_*_1 and
div_*_2, which on most platforms would just compute a suitable inverse
(one or two limbs) and call the right _1u or _1n (or _2u or _2n)
function. There's going to be code in several places (at least mpz_gcd,
mpz_gcdext and mpz_jacobi) that will do something like

  if (an == 1)
      b = mpn_div_r_1 (bp, bn, ap[0]);
      gp[0] = mpn_gcd_11 (a, b);
      return 1;
  else if (an == 2)
      mp_limb_t b_reduced[2];
      mpn_div_r_2 (b_reduced, bp, bn, ap);
      return mpn_gcd_22 (gp, ap, b_reduced);
      ... general case, involving initial large division, allocaton
      of scratch space, and calling Lehmer or Schönhage gcd ...

And it's silly to have each of these call sites do algorithm selection
for that division.

One peculiarity of the new jacobi code is that it would also have use
for a function which returns the complete remainder and the least
significant quotient word (it really needs the least significant two
bits of the quotient). Maybe this is too specialized to be worth
supporting, but we can keep it in mind (without thinking deeply about
it, I'd expect such a function can be implemented by calling div_r on
all but the least significant limbs of the input, followed by a div_qr
to produce the least significant quotient limb and the final remainder.

Which reminds me that the divappr family is missing from your table (and
gcd could use a divappr_qr, if there's any reason to believe that would
be more efficient than div_qr).

>   gcd_2 (currently defined statically in mpn/gcd.c)
> Accepting two 2-limb operands?

Exactly. And for assembler implementations it would make sense to have
gcd_11 and gcd_22 as two entry points in the same file, to avoid the
function call from gcd_22 to gcd_11.

About the qh output argument for div_qr_1. What's the status of
returning struct types? On which platforms can one return a struct such

  struct div_qr_value { mp_limb_t qh; mp_limb_t r };

and get it into two registers? Other alternative behaviours include

  1. Return values stored in memory, using pointer provided behind the
     scenes by the caller.
  2. Return value stored at a static location, pointer returned
     (old-fashioned and broken behaviour).

  3. Broken C compiler that doesn't support struct return at all.

Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list