Stack allocation

Mon Jun 9 09:50:29 UTC 2014

nisse at lysator.liu.se (Niels Möller) writes:

  tg at gmplib.org (Torbjörn Granlund) writes:

  > I decided to lower the TMP_SALLOC limit to a bit under 2^15 from the
  > previous 2^16.

  What's the a relative cost of allocation vs simple operations like
  mpn_add_n? For 2^15 limit, that's 512 limbs (on 64-bit). I guess
  overhead of a malloc call might be comparable to an mpn_add_n with n =
  512, but it ought to be a lot faster than, e.g., an n = 256 mpn_mul_n.

It might be the case that malloc's performance vary a lot between
implementations.  I wouldn't be surprised if BSD and GNU malloc are
several times faster than malloc from the various non-free Unices.

I don't expect the free mallocs to need even near time(mpn_add_n(512)).
But perhaps they need 10% of that, which is still too much for GMP.

Fortunately, I don't think we make dynamic allocations for O(n)
operations.

  Would it make sense to lower the limit further to, say, 128 limbs?

Who knows.  I played with that, but it does not decrease stack usage as
much as one might expect (only 20% as measured by the test suite).
Lowering the limit adds gradually more overhead but gives rapidly
diminishing returns in stack use.

  Nice! That seems very reasonable on current desktop and server machines,
  but it might still be a bit large if people use gmp on embedded systems.

Perhaps alloca is not useful there?

There is currently one oddity in that the limit is more limbs on 32-bit
machines than on 64-bit machines.

Torbjörn
Please encrypt, key id 0xC8601622