Recent changes to mpn_get_str/mpn_set_str
Torbjörn Granlund
tg at gmplib.org
Fri Jan 27 08:41:40 UTC 2017
nisse at lysator.liu.se (Niels Möller) writes:
tg at gmplib.org (Torbjörn Granlund) writes:
> I'll see if I can brush up the code and make use of it for machines with
> pipelined multiply.
Am I getting this right, that making use of addmul_2 should be
sufficient to get a couple of multiplies running in parallel on those
machines?
I don't think addmul_2 is needed to make a difference.
The real difference presumably will come from the relative speed of
divrem_1 vs addmul_1.
The current code iterated divrem_1 to get a (fractional) limb to be
converted, meanaing we perform O(n^2) divrem_1 steps in total.
If we instead divide the n limbs by a larger power of the base, then we
perform O(n^2) addmul_1 steps as part of the schoolbook division
generating a n-limb fraction. We then pay a cost for O(n^2) mul_1 calls
to gradually "integerify" limbs.
(One can obviously use a decreasing count for mul_1 as fewer limbs
remain, just as we do for divrem_1 today. That means (n^2+n)/2 steps in
each case. I am not sure we can avoid the exactly n^2 addmul_1 steps of
the schoolbook division.)
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list