Efficiently importing/export GMP floats

Mon Jan 17 15:01:33 CET 2011

I do not see how network byte order is in any way desirable, as it necessarily forces byte swapping on any machine whose native byte order differs.

Since the most common case is likely a single machine writing and later reading back its own data, why not reserve a single bit for endianness in the size byte you propose, and then use the full lower 6 bits to indicate the size byte count? You'd keep your one-byte zero form and lose your one-byte up-to-15 form, yet gain up to 63 bytes for size AND preserve the native endianness of the machine.

Requiring conversions both in and out on some architectures seems wholly undesirable. Conversion should only be required when the stored byte order happens to differ from the machine's.

It should also be possible to force the setting: if I know I'm generating data for a different, known-endian architecture, I can force it to be written out that way regardless of what machine I'm on.
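A minimal sketch of that variant, to make the trade-off concrete. The bit positions (bit 7 for the sign, bit 6 for the byte order, bits 5..0 for the count of size bytes) and the names `host_order`, `put_size` and `get_size` are assumptions made for illustration only; nothing here is an existing GMP or MPFR interface, and the sketch only handles sizes that fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order flag stored in bit 6 of the header byte. */
enum byte_order { ORDER_LITTLE = 0, ORDER_BIG = 1 };

/* Detect the host byte order at run time. */
enum byte_order host_order(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1 ? ORDER_LITTLE : ORDER_BIG;
}

/* Write the header byte followed by the size field in the requested
   byte order.  Returns the number of bytes written (1..9). */
size_t put_size(unsigned char *out, uint64_t size, int negative,
                enum byte_order order)
{
    assert(out != NULL);

    /* Number of bytes needed for the size; 0 encodes the value zero. */
    unsigned count = 0;
    for (uint64_t s = size; s != 0; s >>= 8)
        count++;

    out[0] = (unsigned char)((negative ? 0x80u : 0u)
                             | (order == ORDER_BIG ? 0x40u : 0u)
                             | count);
    for (unsigned i = 0; i < count; i++) {
        unsigned shift = 8 * (order == ORDER_BIG ? count - 1 - i : i);
        out[1 + i] = (unsigned char)(size >> shift);
    }
    return 1 + (size_t)count;
}

/* Read a header written by put_size, whatever byte order it used.
   Returns the number of bytes consumed. */
size_t get_size(const unsigned char *in, uint64_t *size, int *negative)
{
    assert(in != NULL && size != NULL && negative != NULL);

    unsigned count = in[0] & 0x3Fu;
    assert(count <= 8);  /* this sketch stops at 64-bit sizes */

    enum byte_order order = (in[0] & 0x40u) ? ORDER_BIG : ORDER_LITTLE;
    uint64_t value = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned shift = 8 * (order == ORDER_BIG ? count - 1 - i : i);
        value |= (uint64_t)in[1 + i] << shift;
    }
    *negative = (in[0] & 0x80u) != 0;
    *size = value;
    return 1 + (size_t)count;
}
```

A writer would normally pass `host_order()` and could pass a fixed order to force the output for a known target. For comparison, the existing `mpz_export` and `mpz_import` functions already take an `endian` argument (1 for most significant byte first, -1 for least significant first, 0 for the host's native order).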

- Rick

-----Original message-----
From: Torbjorn Granlund <tg at gmplib.org>
To: gmp-discuss at gmplib.org, mpfr at mpfr.org
Sent: Mon, Jan 17, 2011 12:41:18 GMT+00:00
Subject: Re: Efficiently importing/export GMP floats

Paul.Zimmermann at loria.fr writes:

  not yet, this is one of the things we discussed during our meeting last week.
  We have designed a portable format, see
  https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr

  Please send comments to the MPFR list if any.

I understand that this is somewhat inspired by GMP's mpz_out_raw and
mpz_inp_raw, albeit with some differences.

I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
32-bit byte count.  (Alas, they should probably not be "fixed", since
keeping file compatibility is perhaps even more important than keeping
binary or source compatibility; people might have things stored in files
that might become unreadable if we update the format.)

It might therefore make sense to update to a coherent format for GMP and
MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.

* The format should be byte-oriented, not 32-bit oriented.

* "Network byte order" should be used, i.e., most significant byte first.

* No reasonable size limitations should be made by the format.

* If possible, some compatibility between mpz, mpf, and mpfr formats should
  be created.

* Perhaps we should support printing to memory buffers as well as files.
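
As an illustration of the byte-oriented, most-significant-byte-first points above, the following sketch stores and loads an integer field in a memory buffer using shifts only, so the bytes produced do not depend on the host's byte order. The names `store_be` and `load_be` are hypothetical helpers for this example, not GMP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `nbytes` bytes of `value` into `buf`, most significant
   byte first ("network byte order"). */
void store_be(unsigned char *buf, uint64_t value, size_t nbytes)
{
    assert(buf != NULL && nbytes <= 8);
    for (size_t i = 0; i < nbytes; i++)
        buf[i] = (unsigned char)(value >> (8 * (nbytes - 1 - i)));
}

/* Load an `nbytes`-byte big-endian field from `buf`. */
uint64_t load_be(const unsigned char *buf, size_t nbytes)
{
    assert(buf != NULL && nbytes <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < nbytes; i++)
        value = (value << 8) | buf[i];
    return value;
}
```

Because the helpers work on a caller-supplied buffer, the same code could back both a file writer and a print-to-memory interface.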

Floats:
| exponent | size | digits |

Integers:
| size | digits |

The size property needs to allow sizes up to the largest possible size we
can ever store.  Always making the size a 64-bit integer would in my
opinion be a mistake, as it wastes space.  We should therefore have a 'size
for the size property', supporting 1-byte, 2-byte, up to perhaps 8-byte
sizes.  We also need to be able to represent zero, presumably by having the
size property be zero.

The first byte of the size property should contain:

sign_bit:  1
size_size: 3    /* how many bytes is the size property, excluding the size4 bits? */
size4:     4    /* most significant 4 bits of the size property */

For 0, all 8 bits should be zero (except for -0, where the sign bit will be
set).

If size_size = 0, the size property is encoded just in the 4 bits of
'size4', allowing the 'digits' part to be up to 15 bytes.

If size_size is 1...7, the size property will be encoded as 8+4 = 12 to
7*8+4 = 60 bits.  (And remember that these refer to # of bytes in the
'digits' part.)
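
The size encoding described above could be sketched as follows. The exact bit positions are an assumption (sign in bit 7, size_size in bits 6..4, size4 in bits 3..0), chosen to match the order of the field listing, and the trailing size bytes are written most significant first. `encode_size` and `decode_size` are hypothetical names, not existing GMP routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode the sign and the byte count of the 'digits' part.
   Returns the number of header bytes written (1..8). */
size_t encode_size(unsigned char *out, uint64_t size, int negative)
{
    assert(out != NULL);
    assert(size < ((uint64_t)1 << 60));  /* at most 7*8+4 = 60 bits */

    /* Smallest size_size such that size fits in 4 + 8*size_size bits. */
    unsigned size_size = 0;
    while (size_size < 7 && (size >> (4 + 8 * size_size)) != 0)
        size_size++;

    /* Top 4 bits of the size go into the first byte. */
    unsigned size4 = (unsigned)((size >> (8 * size_size)) & 0x0Fu);
    out[0] = (unsigned char)((negative ? 0x80u : 0u)
                             | (size_size << 4) | size4);

    /* Remaining size_size bytes, most significant first. */
    for (unsigned i = 0; i < size_size; i++)
        out[1 + i] = (unsigned char)(size >> (8 * (size_size - 1 - i)));

    return 1 + (size_t)size_size;
}

/* Decode a header written by encode_size.
   Returns the number of header bytes consumed. */
size_t decode_size(const unsigned char *in, uint64_t *size, int *negative)
{
    assert(in != NULL && size != NULL && negative != NULL);

    unsigned size_size = (in[0] >> 4) & 0x07u;
    uint64_t value = in[0] & 0x0Fu;
    for (unsigned i = 0; i < size_size; i++)
        value = (value << 8) | in[1 + i];

    *negative = (in[0] & 0x80u) != 0;
    *size = value;
    return 1 + (size_t)size_size;
}
```

Under these assumptions, zero is the single byte 0x00, a 15-byte positive number has the one-byte header 0x0F, and a 300-byte negative number has the two-byte header 0x91 0x2C.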

This is a compact format for small "bignums", as such numbers are the most
common ones.  It should also be easy and fast to read/write.

For float formats, the exponent property should be encoded in the same
spirit as the size property.  Its interpretation should be analogous to
IEEE-754, i.e., allow NaNs and Infs.  It should be base 2 wrt the mantissa,
of course.  (I know people off and on request a bignum exponent, which this
format will not allow.  If these requests are really to be taken as serious
propositions with practical use, then perhaps one should allow for a more
variable size exponent.)

This format could then be supported by common GMP routines, used by both
GMP and MPFR.

-- 
Torbjörn