Efficiently importing/export GMP floats

Mon Jan 17 15:54:40 CET 2011

BSWAP or an equivalent exists on x86 and some other architectures, but not on all of them. And even on x86 it requires at minimum a load and a store.

I am currently processing 1min and 30sec resolution geodetic topography and bathymetry data sets. Converting the endianness takes a few seconds on the 1min set and about 12 seconds on the 30sec set; these are 440MB and 1.8GB data sets, processed on a single thread, which is also how GMP runs.

It's an architecture-biased processing cost that is avoided entirely by the convention I proposed (an endianness bit in the size byte).
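
To make the cost under discussion concrete, here is a minimal sketch of converting a buffer between host order and big-endian ("network") order without relying on a BSWAP instruction. It assumes 32-bit samples, which the thread does not state, and the names swap32, host_is_little_endian and to_from_big_endian32 are hypothetical, not GMP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word using only shifts and masks.
   A compiler may turn this into a single byte-swap instruction where the
   target has one, but that is up to the compiler. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Return 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Convert n words in place between host order and big-endian order.
   The same call converts in either direction; on a host that is not
   little-endian the buffer is left untouched. */
static void to_from_big_endian32(uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (!host_is_little_endian())
        return;
    for (size_t i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

Whatever the per-word cost, each word has to be loaded, swapped and stored once, so the total work grows linearly with the size of the data set on hosts whose native order differs from the file's order.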

- Rick

-----Original message-----
From: Joerg Arndt <arndt at jjj.de>
To: gmp-discuss at gmplib.org
Sent: Mon, Jan 17, 2011 14:25:57 GMT+00:00
Subject: Re: Efficiently importing/export GMP floats

* foxmuldrster at yahoo.com <foxmuldrster at yahoo.com> [Jan 17. 2011 15:11]:
> I do not see where network byte order is in any way desirable, as it
> necessarily mandates manipulation on any machine with an alternate
> native endian format.

Are you kidding?
Note the "overhead" is usually a single machine instruction (bswap).

> 
> Since it is most often the case that a single
> machine is used to write / read its own data for later processing,
> why not reserve a single bit in the size byte you propose to
> indicate endianness?

Making the overhead come back, conditionally...

> And then use the full lower 6 bits to indicate size byte
> count? You'll keep your one byte zero form, lose your one byte
> up-to-15 form, yet gain up to 63 bytes for size, AND preserve the
> native endianness of the machine.
> 
> Requiring conversions both in and out on some architectures seems
> wholly undesirable.

Undesirable as in _networking_ ?

> It should only be required whenever it happens
> to be different.

You do not know in general if it's required.

You can always patch away that single instruction,
saving you about 1 second per stored gigabyte.

> 
> It should also be a forcible setting, meaning if I know I'm
> generating data for an alternate or known endian architecture, I can
> force it to write it out that way regardless of what machine I'm on.

Hell, no!

> 
> - Rick
> 

cheers,  jj

> -----Original message-----
> From: Torbjorn Granlund <tg at gmplib.org>
> To: gmp-discuss at gmplib.org, mpfr at mpfr.org
> Sent: Mon, Jan 17, 2011 12:41:18 GMT+00:00
> Subject: Re: Efficiently importing/export GMP floats
> 
> Paul.Zimmermann at loria.fr writes:
>   
>   not yet, this is one of the things we discussed during our meeting last week.
>   We have designed a portable format, see
>   https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr
>   
>   Please send comments to the MPFR list if any.
>   
> I understand that this is somewhat inspired by GMP's mpz_out_raw and
> mpz_inp_raw, albeit with some differences.
> 
> I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
> 32-bit byte count.  (Alas, they should probably not be "fixed", since
> keeping file compatibility is perhaps even more important than keeping
> binary or source compatibility; people might have things stored in files
> that might become unreadable if we update the format.)
> 
> It might therefore make sense to update to a coherent format for GMP and
> MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.
> 
> * The format should be byte-oriented, not 32-bit oriented.
> 
> * "Network byte order" should be used, i.e., most significant byte first.
> 
> * No reasonable size limitations should be made by the format.
> 
> * If possible, some compatibility between mpz, mpf, and mpfr formats should
>   be created.
> 
> * Perhaps we should support printing to memory buffers as well as files.
> 
> Floats:
> | exponent | size | digits |
> 
> Integers:
> | size | digits |
> 
> 
> The size property needs to allow sizes up to the largest possible size we
> can ever store.  Always making the size a 64-bit integer would in my
> opinion be a mistake, as it wastes space.  We should therefore have a 'size
> for the size property', supporting 1 byte, 2 bytes, up to perhaps 8 bytes.
> We also need to be able to represent zero, presumably by having the
> size property be zero.
> 
> The first byte of the size property should contain:
> 
> sign_bit:  1
> size_size: 3    /* how many bytes is the size property, excluding the size4 bits? */
> size4:     4    /* most significant 4 bits of the size property */
> 
> For 0, all 8 bits should be zero (except for -0, where the sign bit will be
> set).
> 
> If size_size = 0, the size property is encoded just in the 4 bits of
> 'size4', allowing the 'digits' part to be up to 15 bytes.
> 
> If size_size is 1...7, the size property will be encoded in 8+4 = 12 to
> 7*8+4 = 60 bits.  (And remember that these refer to # of bytes in the
> 'digits' part.)
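
The size header described in the quoted proposal can be sketched as below. This is only an illustration under stated assumptions: the bit positions (sign in bit 7, size_size in bits 6..4, size4 in bits 3..0), the big-endian order of the trailing size bytes, and the choice of always emitting the shortest header are assumptions the proposal does not spell out, and the function names size_header_encode and size_header_decode are hypothetical, not GMP or MPFR API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the size header for a 'digits' part of nbytes bytes.
   out must have room for 8 bytes; nbytes must fit in 60 bits.
   Returns the number of header bytes written (1..8). */
static size_t size_header_encode(unsigned char *out, int negative,
                                 uint64_t nbytes)
{
    unsigned size_size = 0;

    assert(nbytes < ((uint64_t)1 << 60));
    /* Assumption: pick the smallest size_size such that nbytes fits in
       4 + 8*size_size bits. */
    while ((nbytes >> (4 + 8 * size_size)) != 0)
        size_size++;

    /* First byte: sign bit, size_size, then the top 4 bits of the size. */
    out[0] = (unsigned char)((negative ? 0x80u : 0u) | (size_size << 4) |
                             ((nbytes >> (8 * size_size)) & 0x0Fu));
    /* Remaining size bytes, most significant first. */
    for (unsigned i = 0; i < size_size; i++)
        out[1 + i] = (unsigned char)(nbytes >> (8 * (size_size - 1 - i)));
    return 1 + (size_t)size_size;
}

/* Read a header written by size_header_encode.
   Returns the number of header bytes consumed (1..8). */
static size_t size_header_decode(const unsigned char *in, int *negative,
                                 uint64_t *nbytes)
{
    unsigned size_size = (in[0] >> 4) & 0x07u;
    uint64_t n = in[0] & 0x0Fu;

    for (unsigned i = 0; i < size_size; i++)
        n = (n << 8) | in[1 + i];
    *negative = (in[0] & 0x80u) != 0;
    *nbytes = n;
    return 1 + (size_t)size_size;
}
```

Under these assumptions zero encodes as the single byte 0x00 (0x80 for -0), a 15-byte 'digits' part still needs only a one-byte header, and a 16-byte part is the first to need two.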
> 
> This is a compact format for small "bignums", as such numbers are the most
> common ones.  It should also be easy and fast to read/write.
> 
> For float formats, the exponent property should be encoded in the same
> spirit as the size property.  Its interpretation should be analogous to
> IEEE-754, i.e., allow NaNs and Infs.  It should be base 2 wrt the mantissa,
> of course.  (I know people off and on request a bignum exponent, which this
> format will not allow.  If these are really to be taken as serious
> propositions with practical use, then perhaps one should allow for a more
> variable size exponent.)
> 
> This format could then be supported by common GMP routines, used by both
> GMP and MPFR.
> 
> -- 
> Torbjörn
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss