nisse at lysator.liu.se
Thu May 6 17:27:58 CEST 2010
Torbjorn Granlund <tg at gmplib.org> writes:
> Perhaps, but I think the saving is mainly that you run my code +
> MPN_COPY and your code directly.
Sure, that adds an O(n) cost. I don't understand the loops fully, but I
think the one gcc generates for my code is also a few instructions
shorter. Possible reasons for savings: No fraction logic, and no need to
maintain, adjust and store q after the iteration's final umul.
> Perhaps it is "fair" by some definition, but if we are looking for a
> fast division-by-2limb loop, then it is more useful to compare the loops
I'm also trying to argue that div_r_2 can be faster than div_qr_2 (and
we haven't started looking at Montgomery variants).
> That's interesting. How many cycles per limb does you code take?
I measure it to 32 cycles per limb (at 1000 limbs) vs 34.5 for divrem_2.
> What about first listing the loops, each corresponding to a low-lever
> function, then when we're pretty sure we have the full set, we can start
> worrying what food the functions will need?
div_r (motivated mainly for saving storage, in particular when the
dividend is much larger than the divisor)
frac (do we need separate frac_qr, frac_q, frac_r?)
For bdiv it's messier, with several possible variants for bdiv_r,
bdiv_q, bdiv_qr and redc, so I'll not bother about that at the moment.
gcd_1 (two single limb operands, unlike current mpn_gcd_1. For
gcd(bignum, limb), I'd prefer to have the logic to call mod_1
followed by gcd_1 in C, to simplify assembler implementations.
gcd_2 (currently defined statically in mpn/gcd.c)
gcdext_1 (exists, but I'm not sure what the interface should be)
gcdext_2 (not written)
jacobi_1 (current mpn_jacobi_base, but perhaps allowing b == 1)
jacobi_2 (currently defined statically in jacobi_lehmer.c)
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel