mod_2

Thu May 6 17:27:58 CEST 2010

Torbjorn Granlund <tg at gmplib.org> writes:

> Perhaps, but I think the saving is mainly that you run my code +
> MPN_COPY and your code directly.

Sure, that adds an O(n) cost. I don't understand the loops fully, but I
think the one gcc generates for my code is also a few instructions
shorter. Possible reasons for savings: No fraction logic, and no need to
maintain, adjust and store q after the iteration's final umul.

> Perhaps it is "fair" by some definition, but if we are looking for a
> fast division-by-2limb loop, then it is more useful to compare the loops
> shoulder-to-shoulder.

I'm also trying to argue that div_r_2 can be faster than div_qr_2 (and
we haven't started looking at Montgomery variants).

> That's interesting.  How many cycles per limb does you code take?

I measure it to 32 cycles per limb (at 1000 limbs) vs 34.5 for divrem_2.

> What about first listing the loops, each corresponding to a low-lever
> function, then when we're pretty sure we have the full set, we can start
> worrying what food the functions will need?

div_qr
div_q
div_r  (motivated mainly for saving storage, in particular when the
        dividend is much larger than the divisor)

frac   (do we need separate frac_qr, frac_q, frac_r?)

div_qr_1
div_r_1
frac_1

div_qr_2
div_r_2
frac_2

For bdiv it's messier, with several possible variants for bdiv_r,
bdiv_q, bdiv_qr and redc, so I'll not bother about that at the moment.

Related:

gcd_1 (two single limb operands, unlike current mpn_gcd_1. For
       gcd(bignum, limb), I'd prefer to have the logic to call mod_1
       followed by gcd_1 in C, to simplify assembler implementations.

gcd_2 (currently defined statically in mpn/gcd.c)

gcdext_1 (exists, but I'm not sure what the interface should be)
gcdext_2 (not written)

jacobi_1 (current mpn_jacobi_base, but perhaps allowing b == 1)
jacobi_2 (currently defined statically in jacobi_lehmer.c)

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.