Problem with the mp_set_memory_functions interface

Niels Möller nisse at lysator.liu.se
Fri Jan 27 23:07:16 CET 2012


I'm about to implement the functions mp_set_memory_functions and
mp_get_memory_functions in mini-gmp. These functions are needed by
guile, since the gc wants to know the amount of allocated memory.

I find this part of the interface,

:     The REALLOCATE_FUNCTION parameter OLD_SIZE and the FREE_FUNCTION
:  parameter SIZE are passed for convenience, but of course they can be
:  ignored if not needed by an implementation.  The default functions
:  using `malloc' and friends for instance don't use them.
   
extremely awkward, bordering to totally broken.

Considering this implementation

  size_t
  mpz_out_str (FILE *stream, int base, const mpz_t x)
  {
    char *str;
    size_t len;
  
    str = mpz_get_str (NULL, base, x);
    len = strlen (str);
    len = fwrite (str, 1, len, stream);
    gmp_free_func (str, len + 1);
    return len;
  }

(gmp_free_func is mini-gmp's name for gmp's __gmp_allocate_func). The
only reason this can work correctly is that mpz_get_str, with a NULL
buffer argument, is specified as allocating space *exactly* enough for
the digits and the NUL terminator.

:  If STR is `NULL', the result string is allocated using the current
:  allocation function (*note Custom Allocation::).  The block will be
:  `strlen(str)+1' bytes, that being exactly enough for the string and
:  null-terminator.
               
So if mpz_get_str does the natural thing of allocating

  mpz_sizeinbase (x, base) + is_negative + 1

bytes, it *must* use realloc to shrink that allocation in case it turns
out that mpz_sizeinbase returned a value which was one off (as it is
allowed to do). Thats really really ugly, it's an overhead that is
caused by the memory allocation interface, and which is totally
unnecessary for almost all users.

And then, looking at GMP's mpz/get_str.c, it apparently fails to handle this case
correctly! It checks for a very related case,

  /* might have a leading zero, skip it */
  str = res_str;
  if (*res_str == 0 && str_size != 1)
    {
      str_size--;
      str++;
      ASSERT (*str != 0);  /* at most one leading zero */
    }

But it does *not* call __gmp_reallocate_func to ensure that the
allocation size equals the length of the string + 1. So in this case, if
the buffer was allocated by mpz_get_str, the allocation size can be one
byte larger than promised in the specification,

I don't have any really good suggestion on what to do about this. I
imagine *most* custom allocators will do just like the default
allocators and ignore those arguments, but it's still a severe interface
change to remove or obsolete them. And we have a bug which will manifest
itself if

1. An application registers its own allocation function, and the custom
   free function really depends on the size argument being correct.

2. The application uses mpz_get_str with NULL buffer, and deallocates it
   according to the procedure suggested by the mpz_get_str
   documentation.

3. The application then calls mpz_get_str with a value and a base, for
   which mpz_sizeinbase returns a value larger than the actual number of
   digits.

One could add a comparison and a realloc call to mpz_get_str to fix this
obscure and hard-to-trig bug. But I think the right way is to allow
mpz_get_str to allocate one or two bytes extra, and fix the memory
allocation interface so that the area still can be reliably deallocated.

If we do something about that, I think we should also consider to
specify whether or not the gmp_allocate functions can be used to
allocate storage for pointers (I think this question was raised some
years ago with regards to GNU CLISP or GNU Common Lisp). Currently, the
rule is that storage is only allocated for storing limbs. But then
there's at least one exception: __gmp_tmp_reentrant_alloc stores the
tmp_reentrant_t next pointer in memory from __gmp_allocate_func. Which
might be perfectly ok from a gc point of view, since this pointer just
chains together objects which are *all* allocated on the stack. So
saying that "there are no pointers which directly or indirectly points at
objects dynamiallly allocated on the heap" may be true.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.



More information about the gmp-devel mailing list