Force # of data bytes in raw integer format, and omit size information?

Daniel Goldman dgoldman at ehdp.com
Sun Feb 22 08:20:09 CET 2009


I'm writing large numbers of raw GMP integers to index files.
The numbers currently range from 0 to about 10^25.

I started out using mpz_out_raw. If a number has fewer than the
maximum number of data bytes, I pad it after writing (with zero
bytes) to give a constant record size. That lets me do binary
searches on the index files (fseek, mpz_inp_raw).

I'd like to know the best way to write all the numbers with the
same number of data bytes (based on size of largest number), and
omit the redundant size information (same for each number). That
would save a huge 4 bytes per number. Of course, I would also need
to read the data bytes back into an mpz_t number.

mpz_out_raw seems to write the data in the fewest bytes needed for
the value, not in the amount of space the mpz_t variable occupies.
Also, it always writes the size information.

I thought mpz_export might do what I want, but I had a hard time
understanding it. I've carefully studied all the integer functions,
and mpz_import and mpz_export seem to be the hardest to understand.

So I was ready to ask if someone could provide an example of using
mpz_export (or other method) to write out a constant number of data
bytes, for both small and large integers, and not include the 4 bytes
of size information.

Scanning through the discussion group, I found a related post by
Ken Smith ("base 256", 12/13/06). Ken provided some promising code:

const char* line = "1a2b3c90effe487db13f45c9872b309e47234a";

mpz_t line_as_num;
mpz_init_set_str(line_as_num, line, 16);

size_t line_as_bytes_len = 0;
void* line_as_bytes =
   mpz_export(NULL, &line_as_bytes_len, 1, 1, 1, 0, line_as_num);
...
free(line_as_bytes); // OK while GMP uses its default (malloc-based) allocator
mpz_clear(line_as_num);

Modifying his example, I came up with the following, which seems
to work fine. I haven't done any speed tests, but I expect it's
quite fast. Speed only matters for the read part.

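// Write same number of data bytes for mpzNum, no size information

#define DATA_BYTES_IN_RECORD 16

unsigned char unsignedCharBfr[DATA_BYTES_IN_RECORD];
size_t bytesWritten, bytesToPad, byteNdx;

// mpz_export writes only the magnitude (the sign is dropped), most
// significant byte first, and does no bounds checking, so mpzNum must
// fit in DATA_BYTES_IN_RECORD bytes. For mpzNum == 0 it writes nothing
// and sets bytesWritten to 0, so the record is all padding.
mpz_export (unsignedCharBfr, &bytesWritten, 1, 1, 1, 0, mpzNum);
bytesToPad = DATA_BYTES_IN_RECORD - bytesWritten;
for (byteNdx = 0; byteNdx < bytesToPad; byteNdx++) { // leading zero padding
   fputc (0, fp);
   }
fwrite (unsignedCharBfr, 1, bytesWritten, fp); // data bytes

// Read data bytes back into mpzNum (leading zero bytes don't change the value)

fread (unsignedCharBfr, 1, DATA_BYTES_IN_RECORD, fp);
mpz_import (mpzNum, DATA_BYTES_IN_RECORD, 1, 1, 1, 0, unsignedCharBfr);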

Is there a better way to do this? Do those with more experience see
any pitfalls in this method?

Thanks,
Daniel Goldman

PS - Exactly what is a "nail"?


More information about the gmp-discuss mailing list