Question about mpz_clear

Michel Bardiaux
Fri, 20 Jun 2003 13:47:08 +0200

Geoff Thorpe wrote:
> Hi Michel,
> On June 19, 2003 12:19 pm, you wrote:
>>I already do that (the trick, not the punishment...) a lot, in a
>>biggish set of libraries where I store reference counts, vector sizes,
>>and RTTI (of my own) as meta-data; so I know the technique well.
> Ah ok, I'm preaching to the converted ... :-)

The foundations of this dev were written in 1986 (in FORTRAN!), so maybe 
a prophet rather than a convert :-)

>>But what I want to achieve is to reduce unnecessary heap fragmentation
>>by packing limbs inside the mpX structures themselves. But at the
>>application level I dont know beforehand that a particular mpX will
>>have short limbs, all I know is that most of them do. So, when the
>>library sees the number of limbs is small, it should allocate the limb
>>space inside the mpX object rather than call malloc. The user-supplied
>>memory management functions seemed a good place to concentrate that
>>logic, but they lack that extra parameter that would tell them 'do this
>>operation *ON BEHALF OF* that mpX object'.
> I did something along these lines a while back - I wanted a heap that 
> behaved "malloc-like" when the size_t parameters were biggish, but 
> behaved better when getting oodles of mallocs/frees for small allocation 
> units. I'm digging from my memory here, but I ended up with some stategy 
> where I had some global pools for various ranges, something like;
> typedef struct st_my_pool_item {
>     void *ptr;
>     unsigned int ref_count, array_size;
>     size_t max_size;
>     struct st_my_pool_item *first, *prev, *next;
> } my_pool_item;
> typedef struct st_my_malloc_pool {
>     my_pool_item *list;
>     size_t max_size;
> } my_malloc_pool;
> static my_malloc_pool my_pools[...];
> #define MY_POOL_IDX(sz) ... /* converts a size_t to a my_pools index */
> #define MY_ITEM_MAX(idx) ... /* converts my_pools index to its max_size */
> So I'd have an array of pools for various ranges of size_t, and those 
> ranges were divided exponentially - I can't remember what I used, but it 
> may well have been something like 2^n, 3*2^(n-1), 2^(n+1), 3*2^n, 
> 2^(n+2),... it was something that made the macro a fast bitwise 
> operation, whatever it was. These pools themselves would maintain a 
> linked list of "pool_item"s, and each pool item consisted of a single 
> pre-allocated block of memory to represent 'array_size' many memory 
> segments each of size 'max_size' plus a header. That header looked 
> something like;
> typedef struct st_my_mem_header {
>     my_pool_item *pool_item;
>     int used;
> } my_mem_header;
> So inside the free (or realloc) callback, this my_mem_header could get you 
> back to the owning my_pool_item. "ref_count" counts how many of the 
> sections of the memory block are owned by the application. If it's less 
> than the maximum, you know you can iterate across them and find one with 
> the my_mem_header->used set to zero that you can use. There's various 
> optimisations I can recall now I think about it, as you could too no 
> doubt, but I won't further waste your time or mine with them (the 
> simplest is to note that pool_item will be an aligned pointer so you 
> could use the LS-bit as a flag for "used" and make the my_mem_header 
> overhead smaller).
> The point was, this stuff worked quite well for small allocation sizes - 
> because all the iterations and list things worked well with onboard 
> memory caching. Also, the thrashing of malloc/free would always try to 
> "allocate" from the pools closest to the beginning of the list (which 
> were the first to be allocated by malloc(3)). You can either free a pool 
> when its ref_count hits zero, or better (if you want to avoid 
> fragmentation from releasing "early" pools) free it if its ref_count hits 
> zero *and* the 'prev' pool is also unused.
> All this becomes a total waste of time at larger sizes, and for that you 
> simply use a NULL "pool_item" pointer in your memory header and that 
> means "this came directly from malloc(3) and has no pool". But I think 
> this type of approach works well in the type of circumstances you're 
> talking about. You essentially end up with an array of pre-allocated 
> stacks, each to cater for a fixed range of allocation sizes by always 
> providing a block of memory of the maximum size of the range. So it 
> starts to perform more like alloca() and less like malloc(). You can also 
> tweak this approach a lot - if you find that reallocation becomes a major 
> problem, you can sacrifice your memory a bit and have larger lower-bounds 
> on the pool ranges and/or wider ranges for each pool - eg. this might 
> help speed up a majority of the mpX resize operations if the 
> reallocations have no work to do.
> Is this type of thing related to how your heap works? If this is all 
> obvious and/or known stuff, forgive me (again). Just thinking out loud in 
> case it's helpful.
> Cheers,
> Geoff

Yes, meeting of minds! we already have all these optimisations and some 
more. The problem is, gprof shows that even with all these 
optimisations, there is still a lot of overhead for the management of 
short limbs, which I would like to avoid by having each mpX object use 
*itself* as pool for allocating its limbs whenever possible.

Michel Bardiaux
Peaktime Belgium S.A.  Bd. du Souverain, 191  B-1160 Bruxelles
Tel : +32 2 790.29.41