Factorize denominator of rationale

Fri Dec 2 19:00:10 CET 2022

A technique I have used many times in the past is to split the data into 
many files.

A search on the term "rainbow tables" might also be informative, 
depending on how you generate your output.

Paul

On 30/11/2022 11:05, Øystein Schønning-Johansen wrote:
> Hi! That was actually the initial idea and why I printed out the 
> denominator (and numerators) in base 36, such that I get lots of 
> consecutive 0s which will probably compress very well with gz or lz.
>
> The question comes when reading the data. It is huge and when reading 
> I need random access to some few values. It is actually discrete 
> probability distributions, and I need to pick up two and compare them, 
> The file consists of thousands of such probability distributions, and 
> if I compress, I have to do: read, decompress all, read the two 
> distributions, compare and then ditch the compressed data. I think it 
> might be an idea to use compression but I then think I have to 
> decompress each probability distribution individually such that I can 
> do a lookup first and then only decompress the desired distributions.
>
> This has low priority, and I am very thankful for all the input I have 
> received. I'm still in the thinking/designing phase. :-)
>
> -Øystein
>
> ons. 30. nov. 2022 kl. 11:46 skrev Paul Leyland <paul.leyland at gmail.com>:
>
>     By far the cheapest in terms of human effort would be to leave
>     your code
>     alone and pipe its output through gz(1) or the like. It could well be
>     computationally less expensive too if you have to implement a fancy
>     condensed representation.
>
>     Is space really that expensive for you?
>
>
>     Paul
>
>     On 22/11/2022 10:00, Øystein Schønning-Johansen wrote:
>     > Hi!
>     >
>     > I'm working with probabilities of a sequence of dice rolls, and the
>     > different probabilities to be exact with rationales. I've got it all
>     > working fine.
>     >
>     > However, I want to write out all probabilities to a file and
>     since there's
>     > a lot of them, I want to reduce the file size and need the
>     output string to
>     > be as short as possible. I hence use mpq_canonicalize() and
>     write out with
>     > base 36. Works, but can I even be better?
>     >
>     > Since these are consecutive dice rolls, I can always represent the
>     > denominator (after canonicalize) as (2^a)*(3^b) and I then only
>     have to
>     > store the a and the b. So, the question is: How can I find the a
>     and the b
>     > of such denominators?
>     >
>     > Best regards,
>     > -Øystein
>     > _______________________________________________
>     > gmp-discuss mailing list
>     > gmp-discuss at gmplib.org
>     > https://gmplib.org/mailman/listinfo/gmp-discuss
>     _______________________________________________
>     gmp-discuss mailing list
>     gmp-discuss at gmplib.org
>     https://gmplib.org/mailman/listinfo/gmp-discuss
>