Recent changes to mpn_get_str/mpn_set_str

Fri Jan 27 08:41:40 UTC 2017

nisse at lysator.liu.se (Niels Möller) writes:

  tg at gmplib.org (Torbjörn Granlund) writes:

  > I'll see if I can brush up the code and make use of it for machines with
  > pipelined multiply.

  Am I getting this right, that making use of addmul_2 should be
  sufficient to get a couple of multiplies running in parallel on those
  machines?

I don't think addmul_2 is needed to make a difference.

The real difference presumably will come from the relative speed of
divrem_1 vs addmul_1.

The current code iterated divrem_1 to get a (fractional) limb to be
converted, meanaing we perform O(n^2) divrem_1 steps in total.

If we instead divide the n limbs by a larger power of the base, then we
perform O(n^2) addmul_1 steps as part of the schoolbook division
generating a n-limb fraction.  We then pay a cost for O(n^2) mul_1 calls
to gradually "integerify" limbs.

(One can obviously use a decreasing count for mul_1 as fewer limbs
remain, just as we do for divrem_1 today.  That means (n^2+n)/2 steps in
each case.  I am not sure we can avoid the exactly n^2 addmul_1 steps of
the schoolbook division.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622