bdiv vs redc

Torbjorn Granlund tg at gmplib.org
Thu Jun 28 18:38:27 CEST 2012

nisse at lysator.liu.se (Niels Möller) writes:

I seriously doubt that it works for sbpi1_bdiv_qr. I think it does work
for sbpi1_bdiv_q. That function doesn't use the trick of delaying the
addition of the carry limbs, it just applies the carry limbs one at a

cy = mpn_addmul_1 (np, dp, dn, q);
mpn_add_1 (np + dn, np + dn, i, cy);

Which is probably costing many cycles.

I would like a structure like redc's, i.e.

iterate nn-dn times
q = ...
blah

I mean, a single outer loop for each quotient limb.  This could be made
to work if blah is an mpn_add_1, but that's quite expensive.  It can
to the partial remainder.

I think the right way would be to apply it manually, and keep 2-limb
carry, one from mpn_addmul_1 and one from the addition of the return
value to the partial remainder...  Something like this:

iterate nn-dn times
q = np * dinv
hi = mpn_addmul_1 (np, dp, dn, q)
hi += cy
cy = hi < cy		// can this be true?
hi += np[dn]
cy2 += hi < np[dn]
np[dn] = hi
np += 1;

--
Torbjörn