Efficiently importing/export GMP floats
foxmuldrster at yahoo.com
Mon Jan 17 15:54:40 CET 2011
BSWAP or its equivalents exist on x86 and some other architectures, but not on all of them. Even on x86 it requires at least a load and a store.
I am currently processing geodetic topography and bathymetry data sets at 1-minute and 30-second resolution, 440MB and 1.8GB respectively. Converting the endianness takes a few seconds on the 1-minute set and about 12 seconds on the 30-second set, using a single thread, as GMP does.
It's an architecture-biased form of processing expense that's entirely done away with by this convention.
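For readers without a BSWAP-style instruction in mind, the conversion being discussed can be sketched in portable C as below. This is only an illustration of the per-word work involved, assuming 32-bit samples; the names swap32 and swap32_buffer are hypothetical and come from neither GMP nor MPFR.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit byte reversal using shifts and masks only.
   Compilers may or may not turn this into a single instruction. */
static uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Reverse the byte order of every element of a buffer, in place.
   Each element costs one load, one swap, and one store. */
static void swap32_buffer(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

Whether this cost matters depends on how it compares with the I/O time for the same data, which is the point under dispute in this thread.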
- Rick
-----Original message-----
From: Joerg Arndt <arndt at jjj.de>
To: gmp-discuss at gmplib.org
Sent: Mon, Jan 17, 2011 14:25:57 GMT+00:00
Subject: Re: Efficiently importing/export GMP floats
* foxmuldrster at yahoo.com <foxmuldrster at yahoo.com> [Jan 17. 2011 15:11]:
> I do not see where network byte order is in any way desirable, as it
> necessarily mandates manipulation on any machine with an alternate
> native endian format.
Are you kidding?
Note the "overhead" is usually a single machine instruction (bswap).
>
> Since a single machine will most often write and read its own data
> for later processing, why not reserve a single bit in the size byte
> you propose to indicate endianness?
Making the overhead come back, conditionally...
> And then use the full lower 6 bits to indicate size byte
> count? You'll keep your one byte zero form, lose your one byte
> up-to-15 form, yet gain up to 63 bytes for size, AND preserve the
> native endianness of the machine.
>
> Requiring conversions both in and out on some architectures seems
> wholly undesirable.
Undesirable as in _networking_ ?
> It should only be required whenever it happens
> to be different.
You do not know in general if it's required.
You can always patch away that single instruction,
saving you about 1 second per stored gigabyte.
>
> It should also be a forcible setting, meaning if I know I'm
> generating data for an alternate or known endian architecture, I can
> force it to write it out that way regardless of what machine I'm on.
Hell, no!
>
> - Rick
>
cheers, jj
> -----Original message-----
> From: Torbjorn Granlund <tg at gmplib.org>
> To: gmp-discuss at gmplib.org, mpfr at mpfr.org
> Sent: Mon, Jan 17, 2011 12:41:18 GMT+00:00
> Subject: Re: Efficiently importing/export GMP floats
>
> Paul.Zimmermann at loria.fr writes:
>
> not yet, this is one of the things we discussed during our meeting last week.
> We have designed a portable format, see
> https://gforge.inria.fr/scm/viewvc.php/trunk/src/out_raw.c?view=markup&root=mpfr
>
> Please send comments to the MPFR list if any.
>
> I understand that this is somewhat inspired by GMP's mpz_out_raw and
> mpz_inp_raw, albeit with some differences.
>
> I consider mpz_out_raw/mpz_inp_raw to be somewhat obsolete due to their
> 32-bit byte count. (Alas, they should probably not be "fixed", since
> keeping file compatibility is perhaps even more important than keeping
> binary or source compatibility; people might have things stored in files
> that might become unreadable if we update the format.)
>
> It might therefore make sense to update to a coherent format for GMP and
> MPFR, using function names not to be confused with mpz_out_raw/mpz_inp_raw.
>
> * The format should be byte-oriented, not 32-bit oriented.
>
> * "Network byte order" should be used, i.e., most significant byte first.
>
> * No reasonable size limitations should be made by the format.
>
> * If possible, some compatibility between mpz, mpf, and mpfr formats should
> be created.
>
> * Perhaps we should support printing to memory buffers as well as files.
>
> Floats:
> | exponent | size | digits |
>
> Integers:
> | size | digits |
>
>
> The size property needs to allow sizes up to the largest possible size we
> can ever store. Making the size being a 64-bit integer always would in my
> opinion be a mistake, as it wastes space. We should therefore have a 'size
> for the size property', supporting 1 byte, 2 bytes, up to perhaps 8 byte
> sizes. We also need to be able to represent zero, presumably by having the
> size property be zero.
>
> The first byte of the size property should contain:
>
> sign_bit: 1
> size_size: 3 /* how many bytes is the size property, excluding the size4 bits? */
> size4: 4 /* most significant 4 bits of the size property */
>
> For 0, all 8 bits should be zero (except for -0, where the sign bit will be
> set).
>
> If size_size = 0, the size property is encoded just in the 4 bits of
> 'size4', allowing the 'digits' part to be up to 15 bytes.
>
> If size_size is 1...7, the size property will be encoded as 8+4=12 to 7*8+4 =
> 60 bits. (And remember that these refer to # of bytes in the 'digits'
> part.)
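[Editorial sketch, not part of the original message.] The size header quoted above can be illustrated with a small encoder and decoder. The proposal does not say where the three fields sit within the first byte, so the layout here is an assumption: sign in bit 7, size_size in bits 6-4, size4 in bits 3-0, with the remaining size bytes following most significant byte first. The function names encode_size and decode_size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the sign and the digit-byte count n into out[] (at least 8 bytes).
   Assumed first-byte layout: bit 7 = sign, bits 6-4 = size_size,
   bits 3-0 = size4 (top 4 bits of n). Returns the number of bytes
   written (1..8), or 0 if n needs more than 60 bits. */
static size_t encode_size(unsigned char *out, int negative, uint64_t n)
{
    unsigned size_size = 0;

    if (n >> 60)
        return 0;
    /* Pick the smallest size_size such that n fits in 4 + 8*size_size bits. */
    while ((n >> (8 * size_size)) > 0xF)
        size_size++;

    out[0] = (unsigned char)((negative ? 0x80 : 0) | (size_size << 4) |
                             (n >> (8 * size_size)));
    /* Remaining size bytes, most significant first. */
    for (unsigned i = 0; i < size_size; i++)
        out[1 + i] = (unsigned char)(n >> (8 * (size_size - 1 - i)));
    return 1 + size_size;
}

/* Decode a header written by encode_size; returns the bytes consumed. */
static size_t decode_size(const unsigned char *in, int *negative, uint64_t *n)
{
    unsigned size_size = (in[0] >> 4) & 0x7;
    uint64_t v = in[0] & 0xF;

    *negative = (in[0] >> 7) & 1;
    for (unsigned i = 0; i < size_size; i++)
        v = (v << 8) | in[1 + i];
    *n = v;
    return 1 + size_size;
}
```

Under these assumptions a digit count of up to 15 bytes takes a single header byte, a count of 16 to 4095 takes two, and zero is the single byte 0x00 (0x80 for -0).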
>
> This is a compact format for small "bignums", as such numbers are the most
> common ones. It should also be easy and fast to read/write.
>
> For float formats, the exponent property should be encoded in the same
> spirit as the size property. Its interpretation should be analogous to
> IEEE-754, i.e., allow NaNs and Infs. It should be base 2 wrt the mantissa,
> of course. (I know people off and on request a bignum exponent, which this
> format will not allow. If it is really to be taken as serious propositions
> with practical use, then perhaps one should allow for a more variable size
> exponent.)
>
> This format could then be supported by common GMP routines, used by both
> GMP and MPFR.
>
> --
> Torbjörn
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss