GMP and 64-bit systems
librik at panix.com
Sun Jun 1 10:03:20 CEST 2008
Hi there. I'd like to make some comments spurred by the recent
complaints about GMP's compatibility with 64-bit Windows.
The main issue people have with GMP on Win64 is really a more general
problem of non-standard C coding practice. In fact, this practice can
be cleaned up systematically, which will improve the quality of GMP's
code and thereby make it work better with a wider variety of 64-bit
interfaces.
The problem is: GMP's code implicitly assumes the LP64 model of
64-bit C types. This is not the only 64-bit model; LLP64 and ILP64
are alternatives.
The solution is: Use the existing "mp_size_t" type consistently
throughout the GMP source code and public interfaces to refer to a
number of limbs, bytes, or bits. Then typedef "mp_size_t" to a 32-bit
or 64-bit fundamental C type. This is already what's done with the
"mp_limb_t" type and the "LONG_LONG_LIMB" & "GMP_SHORT_LIMB"
preprocessor macros.
mp_size_t is already used in many places in the GMP code for this
purpose. But it's not everywhere yet -- many functions and structs
still use a bare "long int". Until all the longs are eliminated,
the code is not fully portable.
What's the issue with 64-bit C types? Here's a quick background document
for you, the "Aspen paper" describing the three models and why most Unix
systems standardized on LP64.
http://www.unix.org/version2/whatsnew/lp64_wp.html
A summary for the impatient: there are three ways to extend C types to
a world where pointers and size_t's are 64 bits wide.
* LP64 defines "int" as 32 bits and "long" as 64 bits.
* LLP64 defines "int" and "long" as 32 bits, and an additional type
(usually called "long long") as 64 bits.
* ILP64 defines "int" and "long" as 64 bits.
ALL THREE ARE EQUALLY VALID APPROACHES.
The Aspen document looks at existing Unix source code and the parameters
of POSIX standard function calls, and decides that LP64 is the best choice
for portability based on these specific constraints.
Similarly, the people who extended Microsoft Windows to 64 bits looked
at their existing code base and the parameters of Windows API function
calls, and chose LLP64 as the best model in that case.
(Old Crays were ILP64 systems, but, in my experience, the main use of
that model now is in 64-bit extensions of Fortran libraries.)
Since I spend a lot of my work time cleaning up people's less-than-
portable 64-bit source code, please allow me a short moment of stupid,
unfair, irrational ranting:
** If you use the "long" type as an integer guaranteed to hold a pointer
or a memory size, you are a BAD C PROGRAMMER. I don't care if it's
what you're used to. It's NOT CORRECT. Please stop! **
The only acceptable use of "long" is when you need a variable that's
guaranteed to be at least 32 bits wide (which "int" and "short" are
not), or when you have to talk to an operating system API function.
Otherwise it is best to avoid it, because "it does not mean what you
think it means."
A conclusion: any integer type intended to be 32 bits on 32-bit
systems and 64 bits on 64-bit systems cannot be a basic C type. It
must be a typedef type, whose identity is controlled by an #ifdef in
some header file.
Often "size_t" is an (unsigned) example of such a type.
In C99, the latest revision of the C standard (not widely adopted, alas),
<stdint.h> offers a typedef type which is wide enough to hold any object
pointer; it's called "intptr_t" (the unsigned version is "uintptr_t").
The standard makes it optional, though.
But most of the time it is simply better to write it yourself.
That gives the user or the autoconf system more control over the
library's ABI.
GMP 4.2 already has a parameterized typedef type which controls the
length of a basic limb. It's called mp_limb_t. It can be set to
unsigned int, unsigned long, or unsigned long long by the use of
preprocessor macros.
GMP also has such a type which is supposed to represent the number of
limbs in an mpn sequence. It's called mp_size_t. It is not yet
settable to int, long, or long long by preprocessor macros, but that's
not hard to fix.
What's harder to fix is that mp_size_t isn't used everywhere it should.
Most GMP functions, including the public API, still assume that they
can use "long" where they mean "mp_size_t". Also, mp_size_t (or maybe
just size_t) needs to be used when there's a count of bits or bytes.
Luckily, all existing GMP binaries (except for Win64) have mp_size_t ==
long. So if the word "long" is replaced by "mp_size_t" in a function's
parameters or return value, the new GMP library will be binary compatible
with the old one.
This isn't the only 64-bit limitation in GMP. As Torbjorn has pointed
out, the _mp_size and _mp_alloc fields of the mpz_struct are currently
ints and not mp_size_t's. Therefore, no mpz integer can be larger than
2^31 - 1 limbs, even on 64-bit computers. But making that change really
would break backward binary compatibility. Still, I believe it should
be done!
Finally, I need to apologize to Torbjorn and other people on this list.
He and I discussed the need for an mp_size_t rewrite many months ago.
I promised to work on it. And then I got busy and overworked, and went
radio silent, and never followed through. I had hoped that all these
backward-incompatible changes could wait for GMP 5.0, but it seems as
though a crisis has come to a head. If I had done the work earlier,
perhaps the rancor of the last few days might have been averted. I
dropped the ball, and this is my fault. I hope that whatever happens,
the mp_size_t cleanup will proceed, which would answer most Win64
people's objections.
- David Librik
librik at panix.com