Efficiently importing/export GMP floats
Torbjorn Granlund
tg at gmplib.org
Mon Jan 17 13:29:03 CET 2011
Paul.Zimmermann at loria.fr writes:
> not yet, this is one of the things we discussed during our meeting last week.
> We have designed a portable format, see
> https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr
> Please send comments to the MPFR list if any.
I understand that this is somewhat inspired by GMP's mpz_out_raw and
mpz_inp_raw, albeit with some differences.
I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
32-bit byte count. (Alas, they should probably not be "fixed", since
keeping file compatibility is perhaps even more important than keeping
binary or source compatibility; people might have things stored in files
that might become unreadable if we update the format.)
It might therefore make sense to update to a coherent format for GMP and
MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.
* The format should be byte-oriented, not 32-bit oriented.
* "Network byte order" should be used, i.e., most significant byte first.
* The format should impose no practical size limitations.
* If possible, some compatibility between mpz, mpf, and mpfr formats should
be created.
* Perhaps we should support printing to memory buffers as well as files.
Floats:
| exponent | size | digits |
Integers:
| size | digits |
The size property needs to allow sizes up to the largest possible size we
can ever store. Always making the size a 64-bit integer would, in my
opinion, be a mistake, as it wastes space. We should therefore have a 'size
for the size property', supporting 1-byte, 2-byte, up to perhaps 8-byte
sizes. We also need to be able to represent zero, presumably by having the
size property be zero.
The first byte of the size property should contain:

    sign_bit:  1  /* sign of the number */
    size_size: 3  /* how many bytes is the size property, excluding the size4 bits? */
    size4:     4  /* most significant 4 bits of the size property */

For 0, all 8 bits should be zero (except for -0, where the sign bit will be
set).
If size_size = 0, the size property is encoded just in the 4 bits of
'size4', allowing the 'digits' part to be up to 15 bytes.
If size_size is 1...7, the size property will be encoded in 8+4 = 12 to
7*8+4 = 60 bits. (Remember that the size property counts the number of
bytes in the 'digits' part.)
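
To make the header rules above concrete, here is a minimal sketch in C. It
is not an existing GMP or MPFR routine; the function names
encode_size_header and decode_size_header are hypothetical. It also assumes
two things the proposal does not spell out: that sign_bit is the top bit of
the first byte, followed by size_size and then size4, and that the extra
size bytes follow in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode sign and digit-byte count into `out`
 * (which must have room for up to 8 bytes).
 * Assumed first-byte layout: bit 7 = sign_bit, bits 6..4 = size_size,
 * bits 3..0 = size4 (most significant 4 bits of the size).
 * Returns the number of header bytes written, or 0 if nbytes needs
 * more than 60 bits. */
size_t encode_size_header(uint8_t *out, int negative, uint64_t nbytes)
{
    assert(out != NULL);
    if (nbytes >> 60)
        return 0;                       /* does not fit in 7*8+4 bits */

    /* Smallest size_size such that nbytes fits in 8*size_size + 4 bits. */
    unsigned size_size = 0;
    while (size_size < 7 && (nbytes >> (8 * size_size + 4)) != 0)
        size_size++;

    out[0] = (uint8_t)((negative ? 0x80u : 0u)
                       | (size_size << 4)
                       | ((nbytes >> (8 * size_size)) & 0x0Fu));

    /* Remaining size bytes, most significant byte first (assumption). */
    for (unsigned i = 0; i < size_size; i++)
        out[1 + i] = (uint8_t)(nbytes >> (8 * (size_size - 1 - i)));

    return 1 + (size_t)size_size;
}

/* Hypothetical helper: inverse of encode_size_header.
 * Returns the number of header bytes consumed, or 0 if fewer than the
 * required number of bytes are available. */
size_t decode_size_header(const uint8_t *in, size_t avail,
                          int *negative, uint64_t *nbytes)
{
    assert(in != NULL && negative != NULL && nbytes != NULL);
    if (avail < 1)
        return 0;

    unsigned size_size = (in[0] >> 4) & 0x07u;
    if (avail < 1u + size_size)
        return 0;                       /* truncated header */

    uint64_t n = in[0] & 0x0Fu;         /* size4 */
    for (unsigned i = 0; i < size_size; i++)
        n = (n << 8) | in[1 + i];

    *negative = in[0] >> 7;
    *nbytes = n;
    return 1 + (size_t)size_size;
}
```

Under these assumptions a 15-byte number needs a single header byte (0x0F),
while a 16-byte number needs two (0x10 0x10).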
This is a compact format for small "bignums", as such numbers are the most
common ones. It should also be easy and fast to read/write.
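
As an illustration of how compact small numbers become, the following
sketch writes a complete integer record (| size | digits |) into a memory
buffer for a magnitude that fits in a uint64_t. The helper name
write_small_int_record is hypothetical, and the bit positions within the
first byte (sign_bit on top, then size_size, then size4) are an assumption,
as is the use of most-significant-byte-first order for the digits, taken
from the "network byte order" bullet. Since at most 8 digit bytes are
needed here, size_size is always 0 and the header is a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write sign + magnitude as one integer record.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t write_small_int_record(uint8_t *buf, size_t cap,
                              int negative, uint64_t magnitude)
{
    assert(buf != NULL);

    /* Count digit bytes, dropping leading zero bytes. */
    size_t ndigits = 0;
    for (uint64_t m = magnitude; m != 0; m >>= 8)
        ndigits++;

    if (cap < 1 + ndigits)
        return 0;

    /* ndigits <= 8 < 16, so it fits in size4 and size_size stays 0. */
    buf[0] = (uint8_t)((negative ? 0x80u : 0u) | ndigits);

    /* Digits, most significant byte first. */
    for (size_t i = 0; i < ndigits; i++)
        buf[1 + i] = (uint8_t)(magnitude >> (8 * (ndigits - 1 - i)));

    return 1 + ndigits;
}
```

With this layout, 0 is the single byte 0x00, 0x1234 becomes 0x02 0x12 0x34,
and -255 becomes 0x81 0xFF.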
For float formats, the exponent property should be encoded in the same
spirit as the size property. Its interpretation should be analogous to
IEEE-754, i.e., allow NaNs and Infs. It should be base 2 wrt the mantissa,
of course. (I know people request a bignum exponent off and on, which this
format will not allow. If such requests are really to be taken as serious
propositions with practical use, then perhaps one should allow for a more
variable-size exponent.)
This format could then be supported by common GMP routines, used by both
GMP and MPFR.
--
Torbjörn