nisse at lysator.liu.se
Tue Mar 15 19:15:44 CET 2011
Torbjorn Granlund <tg at gmplib.org> writes:
> One important aspect of SB division is keeping the number of live
> variables (limbs, pointers-to-limbs, sizes) as low as possible. Its
> performance will depend quite much on that.
> I fear udiv_qr_4by3 will need one or two more limb variables live.
I think so too. Are you comparing to the normalized case, or to some
udiv_qr_3by2 variant with normalization on the fly?
Clearing the problematic bits in memory might reduce both register
needs, and use more efficient code (one additional iteration of assembly
submul_1 vs operations coded in C with awkward carry propagation.
It might be difficult to completely avoid regressions unless we go to
the trouble to use different strategies for different sizes. Normalizing
on the fly gives a cost of c_1 qn, while normalizing up front gives a
cost of c_2 (nn + dn) = c_2 qn + 2 c_2 dn, where I imagine that c_2 (per
limb cost of assembly shift loop) is going to be smaller than c_1.
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel