Factorize denominator of rationale
Paul Leyland
paul.leyland at gmail.com
Fri Dec 2 19:00:10 CET 2022
A technique I have used many times in the past is to split the data into
many files.
A search on the term "rainbow tables" might also be informative,
depending on how you generate your output.
Paul
On 30/11/2022 11:05, Øystein Schønning-Johansen wrote:
> Hi! That was actually the initial idea and why I printed out the
> denominator (and numerators) in base 36, such that I get lots of
> consecutive 0s which will probably compress very well with gz or lz.
>
> The question comes when reading the data. It is huge and when reading
> I need random access to some few values. It is actually discrete
> probability distributions, and I need to pick up two and compare them,
> The file consists of thousands of such probability distributions, and
> if I compress, I have to do: read, decompress all, read the two
> distributions, compare and then ditch the compressed data. I think it
> might be an idea to use compression but I then think I have to
> decompress each probability distribution individually such that I can
> do a lookup first and then only decompress the desired distributions.
>
> This has low priority, and I am very thankful for all the input I have
> received. I'm still in the thinking/designing phase. :-)
>
> -Øystein
>
> ons. 30. nov. 2022 kl. 11:46 skrev Paul Leyland <paul.leyland at gmail.com>:
>
> By far the cheapest in terms of human effort would be to leave
> your code
> alone and pipe its output through gz(1) or the like. It could well be
> computationally less expensive too if you have to implement a fancy
> condensed representation.
>
> Is space really that expensive for you?
>
>
> Paul
>
> On 22/11/2022 10:00, Øystein Schønning-Johansen wrote:
> > Hi!
> >
> > I'm working with probabilities of a sequence of dice rolls, and the
> > different probabilities to be exact with rationales. I've got it all
> > working fine.
> >
> > However, I want to write out all probabilities to a file and
> since there's
> > a lot of them, I want to reduce the file size and need the
> output string to
> > be as short as possible. I hence use mpq_canonicalize() and
> write out with
> > base 36. Works, but can I even be better?
> >
> > Since these are consecutive dice rolls, I can always represent the
> > denominator (after canonicalize) as (2^a)*(3^b) and I then only
> have to
> > store the a and the b. So, the question is: How can I find the a
> and the b
> > of such denominators?
> >
> > Best regards,
> > -Øystein
> > _______________________________________________
> > gmp-discuss mailing list
> > gmp-discuss at gmplib.org
> > https://gmplib.org/mailman/listinfo/gmp-discuss
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>
More information about the gmp-discuss
mailing list