help with memory usage
Décio Luiz Gazzoni Filho
decio at decpp.net
Wed Mar 11 03:45:20 CET 2009
On Mar 10, 2009, at 7:16 PM, Christ Schlacta wrote:
> On Tue, Mar 10, 2009 at 1:38 PM, Marc Glisse <marc.glisse at normalesup.org
> > wrote:
>> On Tue, 10 Mar 2009, Christ Schlacta wrote:
>>
>>>> I believe you should store the mpz_class directly, not a pointer
>>>> to it (a
>>>> mpz_class is not much more than a pointer itself). It will save
>>>> some
>>>> amount
>>>> of overhead and simplify the code (no need for new/delete).
>>>
>>> I tried storing an mpz_class inside the prime_t class, and I
>>> experienced string corruption throughout the program. when I
>>> switched
>>> to using a pointer and a new in the constructor, the problem went
>>> away.
>>
>> That only means you didn't do it properly, not that it is a wrong
>> approach.
> for the record, I'm using C++ and libgmpxx, can you paste a snippett
> that can handle mpz_class in a linked list container object without
> pointers correctly?
>>
>>>> You could also stop relying on the default memory allocator and
>>>> either
>>>> tell
>>>> gmp to use some other allocation function or allocate the buffers
>>>> yourself
>>>> (for read-only never-deallocate numbers like your prime numbers
>>>> seem to
>>>> be
>>>> this should not be hard).
>>>>
>>> I have no clue how I would even do that :(
>>
>> If all your numbers fit in say 4 limbs, you could try using
>> mpz_array_init.
>
> my largest number so far is 1339101911. I'm not sure if that would
> fit in 4 limbs, but I'm sure I can read the documentation and find
> out. my major concern is how would this scale as the numbers get
> larger as the program runs, it generates new numbers to add to the end
> of the list.
>
> on an aside, I wrote a new version that doesn't rely on storing the
> whole list in memory, but because it's simpler, it's disk bound :(
> I'll have to write a third version later.
I forgot to mention a second storage scheme for primes that is
asymptotically more efficient than the first scheme I described, and
in fact already more efficient by the time you're dealing with numbers
of the size you mentioned (1339101911). Use an array of chars (8 bits
each) and just store the deltas between each prime, so you'd start at
(say) 2, then store 1 (so 2+1=3), 2 (so 3+2=5), 2 (so 5+2=7), 4 (so
7+4=11) and so on. This is efficient because prime gaps near 2^n grow
as log(2^n) according to the prime number theorem; since you can store
gaps up to 2^n in n bits, your storage requirements grow as
log(log(n)), as opposed to just log(n) for the bitmap solution.
According to http://www.trnicely.net/gaps/gaplist.html, you'll reach
the limit of 8-bit integers at 1872851947; however, given that all
primes (save for 2) are odd, then all gaps (save for the gap between 2
and 3) are even, so you can store delta/2 and now go up to
1999066711391. Now if you don't mind losing the random access property
and being forced to read your list sequentially, you can go much
further by defining say 0xFF as a escape character to mean the next 2
bytes will represent a 16-bit prime gap, which is a limit you won't
reach until you're dealing with primes tens of thousands of digits
long. If you need random access, then a compromise is to work in
blocks of (say) 100 primes, storing the starting prime for the 100-
prime sequence in an mpz_class and then the gaps for the rest of the
primes. For an appropriate choice of block size, the space wasted with
redundant information is negligible, and it won't take long to compute
the value of a prime within that sequence. You can even store the
sequence starts in an array to perform binary search with, if your
application calls for it.
Just to give you an idea of the values we're talking about,
considering your largest number (1999066711391), bitmaps would use
about 160 MB of RAM, whereas bitmaps would use about 60 MB -- for the
latter, this is an estimate from the PNT (n/log(n)), so results may be
off by a few percent. If you want to get real hardcore, you could try
to find some info about the statistical distribution of prime gaps
(e.g. http://wwwmaths.anu.edu.au/~brent/pd/rpb021a.pdf), then consider
prime gaps as a random process and find a compression scheme that is
matched to the statistics of the samples being generated.
Décio
More information about the gmp-discuss
mailing list