Efficiently importing/exporting GMP floats

Tue Jan 18 17:33:39 CET 2011

       Torbjörn,

thank you for your feedback. I keep gmp-discuss in cc since this might also be
interesting for GMP users.

> From: Torbjorn Granlund <tg at gmplib.org>
> Date: Mon, 17 Jan 2011 13:29:03 +0100
> 
> Paul.Zimmermann at loria.fr writes:
>   
>   not yet, this is one of the things we discussed during our meeting last week.
>   We have designed a portable format, see
>   https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr
>   
>   Please send comments to the MPFR list if any.
>   
> I understand that this is somewhat inspired by GMP's mpz_out_raw and
> mpz_inp_raw, albeit with some differences.

indeed.

> I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
> 32-bit byte count.  (Alas, they should probably not be "fixed", since
> keeping file compatibility is perhaps even more important than keeping
> binary or source compatibility; people might have things stored in files
> that might become unreadable if we update the format.)
> 
> It might therefore make sense to update to a coherent format for GMP and
> MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.

ok. Would mp[z,fr]_out_binary and mp[z,fr]_inp_binary be ok?

> * The format should be byte-oriented, not 32-bit oriented.

agreed.

> * "Network byte order" should be used, i.e., most significant byte first.

here I'm not sure. This is contrary to the GMP representation for arrays of
limbs (least significant limb first). On a little-endian machine, storing the
least significant byte first would allow memcpy() to be used directly to store
or read the significand. What would be the advantages of most significant byte
first, or the drawbacks of least significant byte first?
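
To make the byte-order question concrete, here is a minimal sketch in C of the
two orderings. It is only an illustration: limb_t is a hypothetical stand-in
for mp_limb_t, assumed to be 64 bits wide, and the function names are invented
for this example. Both routines are portable; the point is that on a
little-endian host the first one produces exactly the bytes memcpy() would.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit limb type; real GMP uses mp_limb_t. */
typedef uint64_t limb_t;

/* Store n limbs (least significant limb first) as bytes, least
   significant byte first.  Works on any host byte order. */
static void store_lsb_first(unsigned char *out, const limb_t *limbs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < sizeof(limb_t); b++)
            out[i * sizeof(limb_t) + b] = (unsigned char)(limbs[i] >> (8 * b));
}

/* Store the same limbs most significant byte first ("network byte
   order"): the limb array has to be walked from the top down. */
static void store_msb_first(unsigned char *out, const limb_t *limbs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < sizeof(limb_t); b++)
            out[i * sizeof(limb_t) + b] = (unsigned char)
                (limbs[n - 1 - i] >> (8 * (sizeof(limb_t) - 1 - b)));
}

/* Return 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a host where host_is_little_endian() returns 1, the output of
store_lsb_first() is byte-for-byte identical to a memcpy() of the limb array,
whereas store_msb_first() always needs the explicit loop.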

> * No reasonable size limitations should be made by the format.

I don't think our proposal has any such limitation.

> * If possible, some compatibility between mpz, mpf, and mpfr formats should
>   be created.

if we design the format together, this should be easy.

> * Perhaps we should support printing to memory buffers as well as files.

good idea.

> Floats:
> | exponent | size | digits |
> 
> Integers:
> | size | digits |
> 
> 
> The size property needs to allow sizes up to the largest possible size we
> can ever store.  Always making the size a 64-bit integer would in my
> opinion be a mistake, as it wastes space.  We should therefore have a 'size
> for the size property', supporting 1-byte, 2-byte, up to perhaps 8-byte
> sizes.  We also need to be able to represent zero, presumably by having the
> size property be zero.
> 
> The first byte of the size property should contain:
> 
> sign_bit:  1
> size_size: 3    /* how many bytes is the size property, excluding the size4 bits? */
> size4:     4    /* most significant 4 bits of the size property */
> 
> For 0, all 8 bits should be zero (except for -0, where the sign bit will be
> set).
> 
> If size_size = 0, the size property is encoded just in the 4 bits of
> 'size4', allowing the 'digits' part to be up to 15 bytes.

ok.
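
As an illustration of the first byte described above, the following C sketch
packs and unpacks the three fields. The field widths come from the proposal;
the bit positions (sign_bit in the top bit, then size_size, then size4 in the
low four bits) and the function names are assumptions made for this example.

```c
#include <stdint.h>

/* Assumed layout of the first byte, most significant bit first:
   sign_bit (1 bit) | size_size (3 bits) | size4 (4 bits).
   The proposal gives only the widths; the positions are a guess. */
static uint8_t pack_first_byte(unsigned sign, unsigned size_size,
                               unsigned size4)
{
    return (uint8_t)(((sign & 1u) << 7) | ((size_size & 7u) << 4)
                     | (size4 & 15u));
}

/* Split a first byte back into its three fields. */
static void unpack_first_byte(uint8_t byte, unsigned *sign,
                              unsigned *size_size, unsigned *size4)
{
    *sign      = byte >> 7;
    *size_size = (byte >> 4) & 7u;
    *size4     = byte & 15u;
}
```

With this layout, zero is the byte 0x00, -0 is 0x80, and a positive number
with 15 bytes of digits and size_size = 0 starts with 0x0f.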

> If size_size is 1...7, the size property will be encoded in 8+4 = 12 to
> 7*8+4 = 60 bits.  (And remember that these refer to the # of bytes in the
> 'digits' part.)
> 
> This is a compact format for small "bignums", as such numbers are the most
> common ones.  It should also be easy and fast to read/write.
> 
> For float formats, the exponent property should be encoded in the same
> spirit as the size property.  Its interpretation should be analogous to
> IEEE-754, i.e., allow NaNs and Infs.  It should be base 2 wrt the mantissa,
> of course.  (I know people off and on request a bignum exponent, which this
> format will not allow.  If such requests are really to be taken as serious
> propositions with practical use, then perhaps one should allow for a more
> variable size exponent.)

it would be convenient to share routines to read/write the size and exponent
fields (if possible). The main differences are (1) that the exponent is signed
(maybe we can reuse sign_bit and store the absolute value of the exponent,
instead of storing it in two's complement representation?) and (2) that we
need to represent NaN and Inf (most likely in the size field).
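
A shared routine along these lines could look like the following C sketch. It
is written under several assumptions: the field is stored as sign plus absolute
value, the bytes after the first are in most-significant-first order, and the
bit positions in the first byte are as guessed from the field widths. The names
write_field and read_field are hypothetical, and NaN and Inf are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a sign and a magnitude below 2^60 into buf (at least 8 bytes).
   Returns the number of bytes written (1 to 8). */
static size_t write_field(unsigned char *buf, int negative, uint64_t mag)
{
    unsigned extra = 0;              /* size_size: bytes after the first */
    while (extra < 7 && (mag >> (8 * extra + 4)) != 0)
        extra++;
    /* First byte: sign, byte count, top 4 bits of the magnitude. */
    buf[0] = (unsigned char)((negative ? 0x80u : 0u) | (extra << 4)
                             | ((mag >> (8 * extra)) & 0x0fu));
    /* Remaining bytes, most significant first (an assumption). */
    for (unsigned i = 0; i < extra; i++)
        buf[1 + i] = (unsigned char)(mag >> (8 * (extra - 1 - i)));
    return 1 + (size_t)extra;
}

/* Read a field written by write_field.  Returns the bytes consumed. */
static size_t read_field(const unsigned char *buf, int *negative,
                         uint64_t *mag)
{
    unsigned extra = (buf[0] >> 4) & 7u;
    uint64_t m = buf[0] & 0x0fu;
    for (unsigned i = 0; i < extra; i++)
        m = (m << 8) | buf[1 + i];
    *negative = buf[0] >> 7;
    *mag = m;
    return 1 + (size_t)extra;
}
```

The same pair of routines would then serve for the size (where the sign is
that of the number) and for the exponent (where the sign is that of the
exponent), since neither needs two's complement.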

> This format could then be supported by common GMP routines, used by both
> GMP and MPFR.

do you plan to have the same format for both mpz and mpf (with an extra bit
telling if the number is mpz or mpf)? Also the semantics of mpf and mpfr are
not the same (limb-based exponent vs bit-based exponent), thus even if we use
the same format, I'm not sure reading in MPFR a number stored by GMP/mpf could
give the same numerical value.
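
To illustrate the difference, here is a small C sketch that converts a
limb-based exponent into a bit-based one. It assumes 64-bit limbs, an mpf-style
value of the form 0.L1 L2 ... * (2^64)^exp with a nonzero top limb L1, and an
MPFR-style value f * 2^e with 1/2 <= f < 1. The function names are invented
for this example.

```c
#include <stdint.h>

#define LIMB_BITS 64   /* assumed limb width */

/* Number of leading zero bits of a nonzero 64-bit limb. */
static int leading_zeros(uint64_t x)
{
    int n = 0;
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Bit exponent e such that value = f * 2^e with 1/2 <= f < 1, given
   the limb exponent and the (nonzero) most significant limb. */
static long bit_exponent(long limb_exp, uint64_t top_limb)
{
    return limb_exp * LIMB_BITS - leading_zeros(top_limb);
}
```

For instance, the integer 1 stored with limb exponent 1 and top limb 1 has bit
exponent 1 (since 1 = 0.5 * 2^1): the 63 leading zero bits of the top limb
carry no information and would have to be shifted out on the MPFR side.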

Paul