help with memory usage

Wed Mar 11 03:45:20 CET 2009

On Mar 10, 2009, at 7:16 PM, Christ Schlacta wrote:

> On Tue, Mar 10, 2009 at 1:38 PM, Marc Glisse <marc.glisse at normalesup.org 
> > wrote:
>> On Tue, 10 Mar 2009, Christ Schlacta wrote:
>>
>>>> I believe you should store the mpz_class directly, not a pointer  
>>>> to it (a
>>>> mpz_class is not much more than a pointer itself). It will save  
>>>> some
>>>> amount
>>>> of overhead and simplify the code (no need for new/delete).
>>>
>>> I tried storing an mpz_class inside the prime_t class, and I
>>> experienced string corruption throughout the program.  when I  
>>> switched
>>> to using a pointer and a new in the constructor, the problem went
>>> away.
>>
>> That only means you didn't do it properly, not that it is a wrong  
>> approach.
> for the record, I'm using C++ and libgmpxx, can you paste a snippett
> that can handle mpz_class in a linked list container object without
> pointers correctly?
>>
>>>> You could also stop relying on the default memory allocator and  
>>>> either
>>>> tell
>>>> gmp to use some other allocation function or allocate the buffers
>>>> yourself
>>>> (for read-only never-deallocate numbers like your prime numbers  
>>>> seem to
>>>> be
>>>> this should not be hard).
>>>>
>>> I have no clue how I would even do that :(
>>
>> If all your numbers fit in say 4 limbs, you could try using  
>> mpz_array_init.
>
> my largest number so far is 1339101911.  I'm not sure if that would
> fit in 4 limbs, but I'm sure I can read the documentation and find
> out.  my major concern is how would this scale as the numbers get
> larger as the program runs, it generates new numbers to add to the end
> of the list.
>
> on an aside, I wrote a new version that doesn't rely on storing the
> whole list in memory, but because it's simpler, it's disk bound :(
> I'll have to write a third version later.

I forgot to mention a second storage scheme for primes that is  
asymptotically more efficient than the first scheme I described, and  
in fact already more efficient by the time you're dealing with numbers  
of the size you mentioned (1339101911). Use an array of chars (8 bits  
each) and just store the deltas between each prime, so you'd start at  
(say) 2, then store 1 (so 2+1=3), 2 (so 3+2=5), 2 (so 5+2=7), 4 (so  
7+4=11) and so on. This is efficient because prime gaps near 2^n grow  
as log(2^n) according to the prime number theorem; since you can store  
gaps up to 2^n in n bits, your storage requirements grow as  
log(log(n)), as opposed to just log(n) for the bitmap solution.  
According to http://www.trnicely.net/gaps/gaplist.html, you'll reach  
the limit of 8-bit integers at 1872851947; however, given that all  
primes (save for 2) are odd, then all gaps (save for the gap between 2  
and 3) are even, so you can store delta/2 and now go up to  
1999066711391. Now if you don't mind losing the random access property  
and being forced to read your list sequentially, you can go much  
further by defining say 0xFF as a escape character to mean the next 2  
bytes will represent a 16-bit prime gap, which is a limit you won't  
reach until you're dealing with primes tens of thousands of digits  
long. If you need random access, then a compromise is to work in  
blocks of (say) 100 primes, storing the starting prime for the 100- 
prime sequence in an mpz_class and then the gaps for the rest of the  
primes. For an appropriate choice of block size, the space wasted with  
redundant information is negligible, and it won't take long to compute  
the value of a prime within that sequence. You can even store the  
sequence starts in an array to perform binary search with, if your  
application calls for it.

Just to give you an idea of the values we're talking about,  
considering your largest number (1999066711391), bitmaps would use  
about 160 MB of RAM, whereas bitmaps would use about 60 MB -- for the  
latter, this is an estimate from the PNT (n/log(n)), so results may be  
off by a few percent. If you want to get real hardcore, you could try  
to find some info about the statistical distribution of prime gaps  
(e.g. http://wwwmaths.anu.edu.au/~brent/pd/rpb021a.pdf), then consider  
prime gaps as a random process and find a compression scheme that is  
matched to the statistics of the samples being generated.

Décio