Efficiently importing/exporting GMP floats

Mon Jan 17 13:29:03 CET 2011

Paul.Zimmermann at loria.fr writes:

  not yet, this is one of the things we discussed during our meeting last week.
  We have designed a portable format, see
  https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr

  Please send comments to the MPFR list if any.

I understand that this is somewhat inspired by GMP's mpz_out_raw and
mpz_inp_raw, albeit with some differences.

I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
32-bit byte count.  (Alas, they should probably not be "fixed", since
keeping file compatibility is perhaps even more important than keeping
binary or source compatibility; people might have things stored in files
that might become unreadable if we update the format.)

It might therefore make sense to update to a coherent format for GMP and
MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.

* The format should be byte-oriented, not 32-bit oriented.

* "Network byte order" should be used, i.e., most significant byte first.

* The format should impose no practical size limitations.

* If possible, some compatibility between mpz, mpf, and mpfr formats should
  be created.

* Perhaps we should support printing to memory buffers as well as files.

Floats:
| exponent | size | digits |

Integers:
| size | digits |

The size property needs to allow sizes up to the largest size we can ever
store.  Always making the size a 64-bit integer would in my opinion be a
mistake, as it wastes space.  We should therefore have a 'size for the size
property', supporting 1 byte, 2 bytes, up to perhaps 8 bytes.  We also need
to be able to represent zero, presumably by having the size property be
zero.

The first byte of the size property should contain:

sign_bit:  1
size_size: 3    /* how many bytes is the size property, excluding the size4 bits? */
size4:     4    /* most significant 4 bits of the size property */

For 0, all 8 bits should be zero (except for -0, where the sign bit will be
set).

If size_size = 0, the size property is encoded just in the 4 bits of
'size4', allowing the 'digits' part to be up to 15 bytes.

If size_size is 1...7, the size property will be encoded in 8+4 = 12 to
7*8+4 = 60 bits.  (Remember that the size property counts the number of
bytes in the 'digits' part.)
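
A minimal sketch of how the size header described above might be written.
The bit positions within the first byte are my assumption (sign in bit 7,
size_size in bits 6..4, size4 in bits 3..0), as is the function name
encode_size_header; the trailing size bytes are stored most significant
first, following the "network byte order" point above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the size header for a number whose 'digits'
   part is ndigits bytes long.  'out' must have room for 8 bytes.
   Returns the number of header bytes written (1 to 8).
   Assumed layout of the first byte: sign(1) | size_size(3) | size4(4). */
size_t encode_size_header(uint8_t *out, int negative, uint64_t ndigits)
{
    /* At most 4 + 7*8 = 60 bits are available for the size. */
    assert(ndigits < ((uint64_t)1 << 60));

    /* Smallest size_size such that ndigits fits in 4 + 8*size_size bits. */
    unsigned size_size = 0;
    while (size_size < 7 && (ndigits >> (4 + 8 * size_size)) != 0)
        size_size++;

    /* size4 holds the most significant 4 bits of the size property. */
    uint8_t size4 = (uint8_t)((ndigits >> (8 * size_size)) & 0x0F);
    out[0] = (uint8_t)((negative ? 0x80 : 0x00) | (size_size << 4) | size4);

    /* Remaining size bytes, most significant byte first. */
    for (unsigned i = 0; i < size_size; i++)
        out[1 + i] = (uint8_t)(ndigits >> (8 * (size_size - 1 - i)));

    return 1 + (size_t)size_size;
}
```

With this layout, zero is the single byte 0x00 (0x80 for -0), a positive
number with 15 digit bytes has the one-byte header 0x0F, and a negative
number with 16 digit bytes needs the two-byte header 0x90 0x10.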

This is a compact format for small "bignums", as such numbers are the most
common ones.  It should also be easy and fast to read/write.
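
To give a feel for how cheap reading could be, here is a sketch of parsing
one integer record (| size | digits |) from a memory buffer.  The names
raw_int, decode_raw_int and raw_int_magnitude_u64 are hypothetical, and the
bit positions in the first byte are assumed (sign in bit 7, size_size in
bits 6..4, size4 in bits 3..0); nothing here is an existing GMP interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one decoded integer record. */
struct raw_int {
    int negative;           /* sign bit from the first byte */
    uint64_t ndigits;       /* number of bytes in the 'digits' part */
    const uint8_t *digits;  /* points into the input buffer, MSB first */
};

/* Parse one record from buf[0..len).  Returns the number of bytes
   consumed, or 0 if the buffer is too short to hold the record. */
size_t decode_raw_int(const uint8_t *buf, size_t len, struct raw_int *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 1)
        return 0;

    unsigned size_size = (buf[0] >> 4) & 0x07;
    size_t header = 1 + (size_t)size_size;
    if (len < header)
        return 0;

    /* size4 supplies the top 4 bits; the following bytes are MSB first. */
    uint64_t ndigits = buf[0] & 0x0F;
    for (unsigned i = 0; i < size_size; i++)
        ndigits = (ndigits << 8) | buf[1 + i];

    if (ndigits > len - header)
        return 0;

    out->negative = (buf[0] & 0x80) != 0;
    out->ndigits = ndigits;
    out->digits = buf + header;
    return header + (size_t)ndigits;
}

/* Fold the digits into a uint64_t; only meaningful when ndigits <= 8. */
uint64_t raw_int_magnitude_u64(const struct raw_int *r)
{
    assert(r->ndigits <= 8);
    uint64_t v = 0;
    for (uint64_t i = 0; i < r->ndigits; i++)
        v = (v << 8) | r->digits[i];
    return v;
}
```

Under these assumptions the three bytes 0x82 0x01 0x00 decode as a negative
number with two digit bytes and magnitude 256.  A real reader would hand the
digits to the bignum layer rather than fold them into a machine word.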

For float formats, the exponent property should be encoded in the same
spirit as the size property.  Its interpretation should be analogous to
IEEE-754, i.e., allow NaNs and Infs.  It should be base 2 wrt the mantissa,
of course.  (I know people now and then request a bignum exponent, which
this format will not allow.  If such requests are really to be taken as
serious propositions with practical use, then perhaps one should allow for a
more variable-size exponent.)

This format could then be supported by common GMP routines, used by both
GMP and MPFR.

-- 
Torbjörn