Base (2 to 62) limitation for set_str initialization and output

Sam Rawlins sam.rawlins at gmail.com
Wed Dec 12 15:30:37 CET 2012


On Tue, Dec 11, 2012 at 4:27 PM, Sergio Martin <eienburuu at gmail.com> wrote:

> Hi all dear people discussing GMP,
>

Hello Sergio,

>
> I think I get the rationale behind limiting the base by which a number can
> be initialized from, or outputted to a string array (or stdin/stdout, for
> that matter). Correct me if I'm wrong: it is limited by the ammount of
> alphanumeric symbols (including case sensitivity) that can be used to
> represent a number.
>

Correct.

>
> While it makes perfect sense viewed from that angle, I can't help but think
> of the space - memory or secondary storage - efficiency loss for big
> (really big) numbers. Large numbers stored in files, transmitted by
> network, or held in memory will occupy much less space if they are encoded
> in base 256. Using the maximum base 62 representation would occupy ~4-times
> more bytes in memory than the same number represented with a base 256
> would.
>
> Note that I'm just talking about the set_str and output functions, and that
> I know that the internal management of the limbs is perfectly
> memory-efficient (or tends to be, at least). Also, I know that one could
> create it's own conversion program without much problems, but I feel
> convenient that this would be implemented within GMP.
>
> I've seen the mpz_out_raw(..), and mpz_inp_raw(...) functions, and they
> seem to fit for that purpose. However, they both requiere a mpz integer to
> have been already initialized. Had that initialization been done with the
> base 2..62 limitation, the problem would still persist.
>

Initialization is not done with a base (see
http://gmplib.org/manual/Initializing-Integers.html). mpz_inp_raw() and
mpz_out_raw() are exactly what you are looking for:

mpz_t a, b, c;
mpz_init(a); mpz_init(b); mpz_init(c);
FILE *handle;
/* lots of fancy math with a, b, and c */
/* create a file handle */
mpz_out_raw(handle, c);

c is now stored on disk in a compact binary format. Later on, read back
from disk:

mpz_t a, b, c;
mpz_init(a); mpz_init(b); mpz_init(c);
FILE *handle;
/* create a file handle */
mpz_inp_raw(handle, c, handle);
/* continue fancy math with restored c */

See also: http://gmplib.org/manual/I_002fO-of-Integers.html

>
> While I don't have a personal need for it, I'm really interested in
> managing really big numbers, and I couldn't help but think of the space I
> could spare if I could input unsigned integers stored as base 256 files.
>

I'm honestly curious as to what space you are hoping to spare. I did some
math:

One 4096-bit number takes 688 base-62 digits, or 512 base-256 digits, a
savings of 176 bytes. One million of these numbers would save you a grand
total of 176 MB.

One 1048576-bit number takes 176107 base-62 digits, or 131072 base-256
digits, a savings of 45,035 bytes. One million of these numbers would
saving you a grand total of 45 GB when using mpz_out_raw instead of
mpz_out_str.

I did the math with the gmp gem in Ruby:

$ irb -r gmp --simple-prompt
>> GMP::F(2**4096).log / GMP::F(62).log
=> 0.68791819860804378e+3
>> GMP::F(2**4096).log / GMP::F(256).log
=> 0.51200000000000000e+3
>> GMP::F(2**1048576).log / GMP::F(62).log
=> 0.17610705884365921e+6
>> GMP::F(2**1048576).log / GMP::F(256).log
=> 0.13107200000000000e+6
>> 176107 - 131072
=> 45035

>
> Please let me know if this has been thought or discussed before; if it is
> implemented and I missed it from reading the manual; or if I'm wrong in any
> point, please let me know.
>
> Thanks for your time,
>
> Sergio Martin
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>



-- 
Sam Rawlins


More information about the gmp-discuss mailing list