help with memory usage

Wed Mar 11 05:31:34 CET 2009

I'm just an old man with a bit of enthusiasm for prime numbers, so
I've given myself a few simple goals:
1) use all available CPU power to generate prime numbers
2) find the largest prime number I can and hope it's a new record
3) be able to view the largest prime I've found, the last number I've
tested, and the whole list of primes any time I want.

Given my simple goals, and my current programming level, I think the
bitmap idea is a bit beyond me for now, but I will put it on my list
of skills to learn for prime4 or prime5.  I just built prime2 earlier
today using an std::list, and when I loaded my entire list, I found I
was still using almost the same amount of memory and had the overhead
from the random insert and random read options which I don't need.

prime and prime2 are the names of the tiny prime finding programs I
wrote, and since prime2 is currently disk bound (:() I'm taking a few
days to brush up on new techniques to use in my mini-quest!

That said, the number I pasted earlier was of the "last number I've
tested" variety.  I'm not tracking whether that number is prime or not
yet.

Thank you everyone for the great ideas, and if you have any more, I'd
love to hear them!  I'm going to go now to mull over the bitmap idea
and how to best apply that~

On Tue, Mar 10, 2009 at 7:45 PM, Décio Luiz Gazzoni Filho
<decio at decpp.net> wrote:
>
> On Mar 10, 2009, at 7:16 PM, Christ Schlacta wrote:
>
>> On Tue, Mar 10, 2009 at 1:38 PM, Marc Glisse <marc.glisse at normalesup.org>
>> wrote:
>>>
>>> On Tue, 10 Mar 2009, Christ Schlacta wrote:
>>>
>>>>> I believe you should store the mpz_class directly, not a pointer to it
>>>>> (a
>>>>> mpz_class is not much more than a pointer itself). It will save some
>>>>> amount
>>>>> of overhead and simplify the code (no need for new/delete).
>>>>
>>>> I tried storing an mpz_class inside the prime_t class, and I
>>>> experienced string corruption throughout the program.  when I switched
>>>> to using a pointer and a new in the constructor, the problem went
>>>> away.
>>>
>>> That only means you didn't do it properly, not that it is a wrong
>>> approach.
>>
>> for the record, I'm using C++ and libgmpxx, can you paste a snippett
>> that can handle mpz_class in a linked list container object without
>> pointers correctly?
>>>
>>>>> You could also stop relying on the default memory allocator and either
>>>>> tell
>>>>> gmp to use some other allocation function or allocate the buffers
>>>>> yourself
>>>>> (for read-only never-deallocate numbers like your prime numbers seem to
>>>>> be
>>>>> this should not be hard).
>>>>>
>>>> I have no clue how I would even do that :(
>>>
>>> If all your numbers fit in say 4 limbs, you could try using
>>> mpz_array_init.
>>
>> my largest number so far is 1339101911.  I'm not sure if that would
>> fit in 4 limbs, but I'm sure I can read the documentation and find
>> out.  my major concern is how would this scale as the numbers get
>> larger as the program runs, it generates new numbers to add to the end
>> of the list.
>>
>> on an aside, I wrote a new version that doesn't rely on storing the
>> whole list in memory, but because it's simpler, it's disk bound :(
>> I'll have to write a third version later.
>
> I forgot to mention a second storage scheme for primes that is
> asymptotically more efficient than the first scheme I described, and in fact
> already more efficient by the time you're dealing with numbers of the size
> you mentioned (1339101911). Use an array of chars (8 bits each) and just
> store the deltas between each prime, so you'd start at (say) 2, then store 1
> (so 2+1=3), 2 (so 3+2=5), 2 (so 5+2=7), 4 (so 7+4=11) and so on. This is
> efficient because prime gaps near 2^n grow as log(2^n) according to the
> prime number theorem; since you can store gaps up to 2^n in n bits, your
> storage requirements grow as log(log(n)), as opposed to just log(n) for the
> bitmap solution. According to http://www.trnicely.net/gaps/gaplist.html,
> you'll reach the limit of 8-bit integers at 1872851947; however, given that
> all primes (save for 2) are odd, then all gaps (save for the gap between 2
> and 3) are even, so you can store delta/2 and now go up to 1999066711391.
> Now if you don't mind losing the random access property and being forced to
> read your list sequentially, you can go much further by defining say 0xFF as
> a escape character to mean the next 2 bytes will represent a 16-bit prime
> gap, which is a limit you won't reach until you're dealing with primes tens
> of thousands of digits long. If you need random access, then a compromise is
> to work in blocks of (say) 100 primes, storing the starting prime for the
> 100-prime sequence in an mpz_class and then the gaps for the rest of the
> primes. For an appropriate choice of block size, the space wasted with
> redundant information is negligible, and it won't take long to compute the
> value of a prime within that sequence. You can even store the sequence
> starts in an array to perform binary search with, if your application calls
> for it.
>
> Just to give you an idea of the values we're talking about, considering your
> largest number (1999066711391), bitmaps would use about 160 MB of RAM,
> whereas bitmaps would use about 60 MB -- for the latter, this is an estimate
> from the PNT (n/log(n)), so results may be off by a few percent. If you want
> to get real hardcore, you could try to find some info about the statistical
> distribution of prime gaps (e.g.
> http://wwwmaths.anu.edu.au/~brent/pd/rpb021a.pdf), then consider prime gaps
> as a random process and find a compression scheme that is matched to the
> statistics of the samples being generated.
>
> Décio

-- 
 (\_/)  This is Bunny. Copy and paste Bunny
(='.'=) into your signature to help him gain
(")_(") world domination.