nisse at lysator.liu.se
Tue Mar 15 14:35:15 CET 2011
Torbjorn Granlund <tg at gmplib.org> writes:
> I looked briefly at your code. Wouldn't it make sense to stay with 3/2
> division, using the 2 x GMP_NUMB_BITS most significant bits from the
> divisor, and the 3 x GMP_NUMB_BITS bits from the partial remainder?
> Thus, feed udiv_qr_3by2 with *exactly* the sequence of parameters it
> would get if we instead had pre-shifted the divisor and dividend.
It makes some sense. I tried that first, and I won't say that it's a bad
way to do it, but it seemed so complicated that I decided to try
something different to get it to work.
The problem is what to do when we have computed (using udiv_qr_3by2)
<n2,n1,n0> = q <d1, d0> + <r1, r0>, and next need to do
cy = mpn_submul_1(np + ..., dp, dn-2, q);
<r1, r0> -= some_shift_or_other(cy);
One of the limbs of the partial remainder was already included in part
in the subtraction leading to r0. Should that limb be included in the
submul_1? Then it gets messy since we end up doing some bits of the
partial remainder twice, and need to make sense of the carry limb from
This is the difficulty I tried to explain earlier, and maybe I'm still
not explaining it clearly. One solution might be to zero the partial
limb of n and d which are extracted into n2, n1, n0, d1, d0, so that
submul_1 doesn't see them. (But to modify d this way would be a gross
Or one could exclude the straddling limb from the submul_1 call and
handle it manually. And I think that is more or less equivalent to the
current approach with udiv_qr_4by3.
BTW, if we stick to udiv_qr_4by3, it can most likely be optimized a bit
by using a very sligtly different inverse, taking the additional
(GMP_LIMB_BITS - shift count) bits of d into account.
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel