bdiv vs redc
Torbjorn Granlund
tg at gmplib.org
Thu Jun 28 18:38:27 CEST 2012
nisse at lysator.liu.se (Niels Möller) writes:
I seriously doubt that it works for sbpi1_bdiv_qr. I think it does work
for sbpi1_bdiv_q. That function doesn't use the trick of delaying the
addition of the carry limbs, it just applies the carry limbs one at a
time with mpn_add_1, like
cy = mpn_addmul_1 (np, dp, dn, q);
mpn_add_1 (np + dn, np + dn, i, cy);
Which is probably costing many cycles.
I would like a structure like redc's, i.e.
iterate nn-dn times
q = ...
hi = mpn_addmul_1 (...)
blah
I mean, a single outer loop for each quotient limb. This could be made
to work if blah is an mpn_add_1, but that's quite expensive. It can
probably not be made to work without applying mpn_addmul_1's carry-out
to the partial remainder.
I think the right way would be to apply it manually, and keep 2-limb
carry, one from mpn_addmul_1 and one from the addition of the return
value to the partial remainder... Something like this:
iterate nn-dn times
q = np[0] * dinv
hi = mpn_addmul_1 (np, dp, dn, q)
hi += cy
cy = hi < cy // can this be true?
hi += np[dn]
cy2 += hi < np[dn]
np[dn] = hi
np += 1;
--
Torbjörn
More information about the gmp-devel
mailing list