Division call in mpn_gcd
tg at gmplib.org
Fri Feb 24 14:51:00 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund <tg at gmplib.org> writes:
> Except that redc does not accept independent sizes. Therefore we need
> to use bdiv_qr.
I think it should be easy to generalize at least redc_1 and redc_2 to
accept independent sizes. Might require some more work for redc_n, which
would need some kind of block-wise processing.
For some reason I have forgotten, we decided to stick to their current
definitions. I suppose we'd basically just need to modify the outer
loop count, and perhaps handle carry-out from addmul_N differently.
> 4. Your point 3. Which thresholds are you talking about?
I'd say REDC_1_TO_REDC_2_THRESHOLD is more-or-less a plain comparison if
the speed of addmul_1 vs addmul_2, and the inversion costs. They are
measured for one size operand, using two size operands ought to mean
that sqrt(un*vn) should be compared to REDC_1_TO_REDC_2_THRESHOLD.
(I.e., un*vn should be compared o REDC_1_TO_REDC_2_THRESHOLD^2.)
This reasoning disregards the constant term of the speed of an addmul_1
or addmul_2 invocation.
Using bdiv_qr is surely an improvement, but I think we really ought to
have some division function which doesn't require storage for the
unwanted quotient. Should that function be bdiv_r, or redc, or even
I suppose the gcd functions are quite tolerant, at least for an initial
reduction like this one.
IIRC, bdiv and redc have slightly different notions on what the
"remainder" is, but I imagine either variant would fine for the gcd
Perhaps this is the reason for keeping redc separate?
When un >> vn, gcd(u, v) shouldn't need O(un) scratch space (btw, this
calls for an initial reduction also in mpz_gcd).
There are a couple of other divisions in the gcd code where the quotient
is similarly unwanted: The initial division in mpn_gcdext,
Really? Doesn't that quotient affect the cofactors?
I don't think we need to finalise redc vs (b)div_r before we modify the
gcd code to use hensel division; If tdiv_qr is good enough now, bdiv_qr
ought to be good enough to be worth as a separate improvement. Sure, it
is cleaner to keep the quotient out-of-the-way (and it will very
slightly simplify the scratch space allocations).
More information about the gmp-devel