# Division call in mpn_gcd

Torbjorn Granlund tg at gmplib.org
Fri Feb 24 14:51:00 CET 2012

```nisse at lysator.liu.se (Niels Möller) writes:

Torbjorn Granlund <tg at gmplib.org> writes:

> Except that redc does not accept independent sizes.  Therefore we need
> to use bdiv_qr.

I think it should be easy to generalize at least redc_1 and redc_2 to
accept independent sizes. Might require some more work for redc_n, which
would need some kind of block-wise processing.

For some reason I have forgotten, we decided to stick to their current
definitions.  I suppose we'd basically just need to modify the outer
loop count, and perhaps handle carry-out from addmul_N differently.

REDC_1_TO_REDC_2_THRESHOLD
REDC_2_TO_REDC_N_THRESHOLD

I'd say REDC_1_TO_REDC_2_THRESHOLD is more-or-less a plain comparison if
the speed of addmul_1 vs addmul_2, and the inversion costs.  They are
measured for one size operand, using two size operands ought to mean
that sqrt(un*vn) should be compared to REDC_1_TO_REDC_2_THRESHOLD.
(I.e., un*vn should be compared o REDC_1_TO_REDC_2_THRESHOLD^2.)

This reasoning disregards the constant term of the speed of an addmul_1

Using bdiv_qr is surely an improvement, but I think we really ought to
have some division function which doesn't require storage for the
unwanted quotient. Should that function be bdiv_r, or redc, or even
div_r?

I suppose the gcd functions are quite tolerant, at least for an initial
reduction like this one.

IIRC, bdiv and redc have slightly different notions on what the
"remainder" is, but I imagine either variant would fine for the gcd
reduction.

Perhaps this is the reason for keeping redc separate?

When un >> vn, gcd(u, v) shouldn't need O(un) scratch space (btw, this
calls for an initial reduction also in mpz_gcd).

Makes sense.

There are a couple of other divisions in the gcd code where the quotient
is similarly unwanted: The initial division in mpn_gcdext,

Really?  Doesn't that quotient affect the cofactors?

I don't think we need to finalise redc vs (b)div_r before we modify the
gcd code to use hensel division; If tdiv_qr is good enough now, bdiv_qr
ought to be good enough to be worth as a separate improvement.  Sure, it
is cleaner to keep the quotient out-of-the-way (and it will very
slightly simplify the scratch space allocations).

--
Torbjörn
```