Factorize denominator of rationale

Øystein Schønning-Johansen oysteijo at gmail.com
Wed Nov 30 12:05:38 CET 2022


Hi! That was actually the initial idea and why I printed out the
denominator (and numerators) in base 36, such that I get lots of
consecutive 0s which will probably compress very well with gz or lz.

The question comes when reading the data. It is huge and when reading I
need random access to some few values. It is actually discrete probability
distributions, and I need to pick up two and compare them, The file
consists of thousands of such probability distributions, and if I compress,
I have to do: read, decompress all, read the two distributions, compare and
then ditch the compressed data. I think it might be an idea to use
compression but I then think I have to decompress each probability
distribution individually such that I can do a lookup first and then only
decompress the desired distributions.

This has low priority, and I am very thankful for all the input I have
received. I'm still in the thinking/designing phase. :-)

-Øystein

ons. 30. nov. 2022 kl. 11:46 skrev Paul Leyland <paul.leyland at gmail.com>:

> By far the cheapest in terms of human effort would be to leave your code
> alone and pipe its output through gz(1) or the like. It could well be
> computationally less expensive too if you have to implement a fancy
> condensed representation.
>
> Is space really that expensive for you?
>
>
> Paul
>
> On 22/11/2022 10:00, Øystein Schønning-Johansen wrote:
> > Hi!
> >
> > I'm working with probabilities of a sequence of dice rolls, and the
> > different probabilities to be exact with rationales. I've got it all
> > working fine.
> >
> > However, I want to write out all probabilities to a file and since
> there's
> > a lot of them, I want to reduce the file size and need the output string
> to
> > be as short as possible. I hence use mpq_canonicalize() and write out
> with
> > base 36. Works, but can I even be better?
> >
> > Since these are consecutive dice rolls, I can always represent the
> > denominator (after canonicalize) as (2^a)*(3^b) and I then only have to
> > store the a and the b. So, the question is: How can I find the a and the
> b
> > of such denominators?
> >
> > Best regards,
> > -Øystein
> > _______________________________________________
> > gmp-discuss mailing list
> > gmp-discuss at gmplib.org
> > https://gmplib.org/mailman/listinfo/gmp-discuss
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>


More information about the gmp-discuss mailing list