Recent changes to mpn_get_str/mpn_set_str
Torbjörn Granlund
tg at gmplib.org
Tue Feb 28 22:04:47 UTC 2017
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> It is not specialised for base 10, and it is not faster than
> current code if only a few limbs must be converted, but around
> 10 limbs it should be a gain.
>
> That's expected, I think.
Well, the graphs in the paper show an earlier gain... but much depends on
the base, on the architecture...
Anyway, I attach a second attempt (a full replacement for the file
mpn/generic/get_str.c). I specialised the code for base==10, refined some
details, and moved some of the internal logic of mpn_div_q directly into
this code.
Now the crossover is around 5 limbs.
The cross-over between old base-case code for base 10 and the new
"base-case" code?
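For context on what a quadratic "base-case" conversion to base 10 does: the textbook method repeatedly divides the whole number by the largest power of 10 that fits in a limb and emits that many digits per division. The sketch below assumes 32-bit limbs stored least significant first, so the divisor is 10^9. The names divrem_1_inplace and to_decimal are hypothetical. This is neither GMP's get_str.c nor Marco's new code, only an illustration of the general technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: divide {up, n} (32-bit limbs, least significant
   first) in place by d and return the remainder. */
static uint32_t divrem_1_inplace (uint32_t *up, size_t n, uint32_t d)
{
  uint64_t r = 0;
  assert (d != 0);
  for (size_t i = n; i-- > 0; )
    {
      uint64_t cur = (r << 32) | up[i];
      up[i] = (uint32_t) (cur / d);   /* fits: r < d, so cur / d < 2^32 */
      r = cur % d;
    }
  return (uint32_t) r;
}

/* Hypothetical base-case conversion of {up, n} to a decimal string.
   Each division by 10^9 (the largest power of 10 below 2^32) yields
   9 digits.  Destroys {up, n}; out needs room for 10 * n + 1 bytes
   (at least 2).  Returns the number of digits written. */
static size_t to_decimal (char *out, uint32_t *up, size_t n)
{
  size_t len = 0;

  while (n > 0 && up[n - 1] == 0)   /* normalize: drop high zero limbs */
    n--;
  if (n == 0)
    {
      strcpy (out, "0");
      return 1;
    }

  while (n > 0)
    {
      uint32_t chunk = divrem_1_inplace (up, n, 1000000000u);
      if (up[n - 1] == 0)            /* quotient shrinks by at most 1 limb */
        n--;
      /* Emit the chunk's digits least significant first, padded to 9
         digits unless it is the most significant chunk. */
      for (int k = 0; k < 9 && (n > 0 || chunk != 0); k++)
        {
          out[len++] = (char) ('0' + chunk % 10);
          chunk /= 10;
        }
    }

  for (size_t i = 0, j = len - 1; i < j; i++, j--)   /* reverse in place */
    {
      char c = out[i];
      out[i] = out[j];
      out[j] = c;
    }
  out[len] = '\0';
  return len;
}
```

Each division touches all n limbs, and there are about n of them, so the cost grows quadratically with the operand size.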
The code I extracted from mpn_div_q gave me the following:
  if (bkn == 1)
    mpn_divrem_1 (y0, 0L, tmp, un + 2, bkh);
  else if (bkn == 2)
    y0 [un + 1] = mpn_divrem_2 (y0, 0L, tmp, un + 3, bk);
  else
    {
      gmp_pi1_t dinv;
      invert_pi1 (dinv, bk[bkn - 1], bk[bkn - 2]);
      y0 [un + 1] = mpn_sbpi1_div_q (y0, tmp, un + bkn + 1, bk, bkn,
                                     dinv.inv32);
    }
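The invert_pi1 call precomputes a reciprocal of the two high divisor limbs so that the schoolbook division can replace hardware divisions with multiplications. As I understand it, the value stored in dinv.inv32 is the "3/2" reciprocal floor((B^3 - 1) / (d1*B + d0)) - B for a normalized divisor (high bit of d1 set). The sketch below evaluates that defining formula directly. It assumes 32-bit limbs (B = 2^32) so that unsigned __int128, a GCC/Clang extension, can hold B^3. The name reciprocal_3by2 is hypothetical, and GMP computes the value far more cheaply than by a 128-bit division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the 3/2 reciprocal floor((B^3 - 1) / (d1*B + d0)) - B
   with B = 2^32, computed by brute force.  For a normalized divisor the
   quotient lies in [B, 2B), so the result fits in one 32-bit limb. */
static uint32_t reciprocal_3by2 (uint32_t d1, uint32_t d0)
{
  const unsigned __int128 B = (unsigned __int128) 1 << 32;
  unsigned __int128 d = ((unsigned __int128) d1 << 32) | d0;
  assert (d1 & 0x80000000u);        /* divisor must be normalized */
  unsigned __int128 q = (B * B * B - 1) / d;
  return (uint32_t) (q - B);
}
```

For example, the smallest normalized divisor (d1 = 0x80000000, d0 = 0) gives 0xFFFFFFFF, and the largest (both limbs 0xFFFFFFFF) gives 0.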
Are divrem_[12] the best functions we currently have for obtaining only
the quotient when dividing by 1 or 2 limbs?
I think so, since the assembly versions of mpn_div_qr_1n_pi2/mpn_div_qr_1u_pi2
are not yet checked for any machine.
With a cutoff point of 5 limbs, will divrem_1/divrem_2 ever get used? I'd
expect the choice to be between mpn_sbpi1_div_q and mpn_dcpi1_div_q.
--
Torbjörn
Please encrypt, key id 0xC8601622