Efficiently importing/exporting GMP floats

Wed Jan 19 17:59:04 CET 2011

[Resend, since wrong mpfr address again.  Sorry.]

  > It might therefore make sense to update to a coherent format for GMP and
  > MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.

  ok. Would mp[z,fr]_out_binary and mp[z,fr]_inp_binary be ok?

Yes, I think those names make sense.

  > * "Network byte order" should be used, i.e., most significant byte first.

  here I'm not sure. This is contrary to the GMP representation for arrays of
  limbs (least significant limb first). On a little endian machine, using least
  significant bytes first would allow direct use of memcpy() to store or read
  the significand. What would be the advantages of most significant byte first,
  or the drawbacks of least significant byte first?

Network byte order, aka big-endian, is the standard portable byte order.

While I am very keen to optimise GMP, I would probably never consider
looping over memcpy invocations for copying raw limb data to a stdio
buffer.  If somebody convinces me that there are a lot of I/O bound GMP
applications out there, I will not start with using memcpy, but
carefully write a tight, assembly-optimisable output loop.  
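
To illustrate what most-significant-byte-first output means for limb data stored least significant limb first, here is a minimal sketch. It assumes 64-bit limbs without nail bits; the helper name limbs_to_be_bytes is hypothetical and not part of GMP. It is a plain portable loop, not the tuned one alluded to above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GMP function.
   Serialises n 64-bit limbs (least significant limb first, as GMP
   stores them) into out[], most significant byte first.
   out must have room for 8 * n bytes.  Returns the byte count. */
static size_t limbs_to_be_bytes(unsigned char *out,
                                const uint64_t *limbs, size_t n)
{
    size_t k = 0;
    /* Walk the limbs from most significant to least significant. */
    for (size_t i = n; i-- > 0; ) {
        uint64_t w = limbs[i];
        /* Emit the bytes of each limb, high byte first.  Shifting
           makes the result independent of host endianness. */
        for (int shift = 56; shift >= 0; shift -= 8)
            out[k++] = (unsigned char)(w >> shift);
    }
    return k;
}
```

Because the bytes are extracted by shifting, the output is the same on little-endian and big-endian hosts, which is the point of fixing a byte order in the format. For real mpz_t values, the existing mpz_export function can already produce this byte layout.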

  > * No reasonable size limitations should be made by the format.

  I don't think our proposal has any such limitation.

Indeed.  I was thinking of GMP's mpz_*_raw functions.

  it would be convenient to share routines to read/write the size and exponent
  fields (if possible). The main differences are (1) that the exponent is signed
  (maybe we can reuse sign_bit and store the absolute value of the exponent,
  instead of storing it in two's-complement representation?) and (2) that we need
  to represent NaN and Inf (most likely in the size field).

Isn't it better to use the exp field for various special values?  At
least, that would allow us to share more code, since it would be more
similar to the mpz binary format.

It would be nice if the 'size' and 'digits' parts were the same across
GMP formats and MPFR.
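
One way to picture special values living in the exp field is the sketch below. Everything in it is an assumption made for illustration: the fixed 8-byte field width, the two's-complement encoding, the reserved codes, and all names. Nothing here has been agreed on. The idea is only that the 'size' and 'digits' parts stay untouched, so an mpz reader and a float reader could share them.

```c
#include <stdint.h>

/* Hypothetical reserved exponent codes; no such values are fixed
   anywhere in this discussion. */
#define EXP_CODE_NAN  INT64_MIN
#define EXP_CODE_INF  (INT64_MIN + 1)

typedef enum { VAL_REGULAR, VAL_INF, VAL_NAN } val_kind;

/* Store a signed exponent in a fixed 8-byte field, most
   significant byte first (two's-complement bit pattern). */
static void put_exp_field(unsigned char out[8], int64_t e)
{
    uint64_t u = (uint64_t)e;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(u >> (56 - 8 * i));
}

/* Read the field back without relying on implementation-defined
   unsigned-to-signed conversion. */
static int64_t get_exp_field(const unsigned char in[8])
{
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | in[i];
    if (u & (UINT64_C(1) << 63))
        return -(int64_t)(~u) - 1;
    return (int64_t)u;
}

/* Decide from the exponent field alone whether the value is special. */
static val_kind classify_exp(int64_t e)
{
    if (e == EXP_CODE_NAN) return VAL_NAN;
    if (e == EXP_CODE_INF) return VAL_INF;
    return VAL_REGULAR;
}
```

In such a layout the sign of an infinity would still come from the ordinary sign information, and a reader that does not understand special values only needs to reject the reserved codes.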

  > This format could then be supported by common GMP routines, used by both
  > GMP and MPFR.

  do you plan to have the same format for both mpz and mpf (with an extra bit
  telling if the number is mpz or mpf)?

I am not very happy with such self typing (ASN.1 anybody? :-)

My thinking is that the exponent field will exist just for the float
formats, but that one may perhaps teach alternative mpz routines to read
a value with exponent and let it truncate numbers with negative
(effective) exponent to an integer.
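
A small model of that truncating read, assuming the stored value is an integer significand times 2^e and using a single 64-bit word in place of a multi-limb number. The function name trunc_scaled is hypothetical. With real mpz_t operands the corresponding existing operations would be mpz_tdiv_q_2exp for negative e and mpz_mul_2exp for positive e.

```c
#include <stdint.h>

/* Hypothetical single-word model, not a GMP function.
   Computes the magnitude of mag * 2^e truncated toward zero.
   The sign is assumed to be carried separately, so a right shift
   of the magnitude is exactly truncation toward zero.
   Returns 0 on success, -1 if the result does not fit in 64 bits. */
static int trunc_scaled(uint64_t mag, long e, uint64_t *out)
{
    if (e < 0) {
        /* Negative exponent: drop the fractional bits. */
        *out = (e <= -64) ? 0 : mag >> (unsigned)(-e);
        return 0;
    }
    if (e >= 64) {
        /* Only zero survives a shift this large in one word. */
        if (mag != 0)
            return -1;
        *out = 0;
        return 0;
    }
    /* Positive exponent: make sure no set bits are shifted out. */
    if (e > 0 && (mag >> (64 - e)) != 0)
        return -1;
    *out = mag << e;
    return 0;
}
```

For example, a significand of 13 with exponent -2 represents 3.25 and reads back as 3, while the same significand with exponent 3 reads back as 104.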

  Also the semantics of mpf and mpfr are
  not the same (limb-based exponent vs bit-based exponent), thus even if we use
  the same format, I'm not sure reading in MPFR a number stored by GMP/mpf could
  give the same numerical value.

I was thinking that we should stick to an exponent format that suits
mpfr, something that will not be terribly expensive to handle in mpf
(except that NaNs and those fancier things will not be understood, but I
expect it to be rare to store such values to a file).  Actually
transferring data between mpf and mpfr will probably be very rare;
limiting the number of formats people will need to be bothered with is a
virtue in itself, and also we can share more code.

To make my idea clear: I think mpf_out_binary and mpfr_out_binary should
generate the exact same bits when the numbers are equal.
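
For the exponent part, a rough sketch of turning a limb-based exponent into a bit-based one. It assumes 64-bit limbs, the mpf convention that the value is a fraction in base 2^64 scaled by (2^64)^limb_exp with a nonzero top limb, and the mpfr convention of a significand in [1/2, 1). The helper names are hypothetical, and the significand itself would also have to be shifted left by the same number of leading zero bits to get identical output.

```c
#include <stdint.h>

#define LIMB_BITS 64  /* assumed limb width for this sketch */

/* Count leading zero bits of a nonzero 64-bit word. */
static int leading_zeros(uint64_t x)
{
    int n = 0;
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: bit-based exponent for a significand
   normalised to [1/2, 1), given a limb-based exponent and the
   (nonzero) most significant limb. */
static long bit_exponent(long limb_exp, uint64_t top_limb)
{
    return limb_exp * LIMB_BITS - leading_zeros(top_limb);
}
```

Under these assumptions the value 1 is held as a single limb 1 with limb exponent 1, and the helper gives bit exponent 1, matching 1 = 0.5 * 2^1.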

Does this make sense to you?  (Perhaps we will not actually implement
the mpf routines in the end, since mpf is not all that eagerly
developed.)

-- 
Torbjörn