GMP and 64-bit systems

Sun Jun 1 11:30:17 CEST 2008

Hello,

* librik at panix.com <librik at panix.com> [Jun 01. 2008 18:25]:
> Hi there.  I'd like to make some comments spurred by the recent
> complaints about GMP's compatibility with 64-bit Windows.
> 
> 
> The main issue people have with GMP on Win64 is really a more general
> problem of non-standard C coding practice.  In fact, this practice can
> be cleaned up systematically, which will improve the quality of GMP's
> code and thereby make it work better with a wider variety of 64-bit
> interfaces.
> 
> The problem is:  GMP's code implicitly assumes the LP64 model of
> 64-bit C types.  This is not the only 64-bit model; LLP64 and ILP64
> are alternatives.

I'd think that LP64 is assumed by very many software projects, with
the notable exception of software that is exclusively or mainly
developed on/for Windows.  Can someone please comment on this?

What 64-bit archs (of any importance) are not using LP64?

> 
> The solution is:  Use the existing "mp_size_t" type consistently
> throughout the GMP source code and public interfaces to refer to a
> number of limbs, bytes, or bits.  Then typedef "mp_size_t" to a 32-bit
> or 64-bit fundamental C type.  This is already what's done with the
> "mp_limb_t" type and the "LONG_LONG_LIMB" & "GMP_SHORT_LIMB"
> preprocessor macros.
> 
> mp_size_t is already used in many places in the GMP code for this
> purpose.  But it's not everywhere yet -- many functions and structs
> still use a bare "long int".  Until all the longs are eliminated,
> the code is not fully portable.

Making software portable is a good thing indeed, fully agreed.
Making it really, really fully portable
-- causes significant extra work
   (note that a char doesn't even need to be eight bits!),
-- makes it vastly more complex and can render testing
   a real challenge,
-- may cost performance or features.

> 
> 
> What's the issue with 64-bit C types?  Here's a quick background document
> for you, the "Aspen paper" describing the three models and why most Unix
> systems standardized on LP64.
>     http://www.unix.org/version2/whatsnew/lp64_wp.html
> A summary for the impatient:  there are three ways to extend C types to
> a world where pointers and size_t's are 64 bits wide.
> * LP64 defines "int" as 32 bits and "long" as 64 bits.
> * LLP64 defines "int" and "long" as 32 bits, and an additional type
>   (usually called "long long") as 64 bits.
> * ILP64 defines "int" and "long" as 64 bits.
> ALL THREE ARE EQUALLY VALID APPROACHES.

I dare to disagree.  A model in which the type long is a machine word
and, when used as an array index, can reach all addressable memory
is what I call a sane model.

Making long==int _and_ smaller than 64 bits (on a 64-bit arch) seems
to be a concession to the portability of very suboptimal code.
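
For anyone who wants to check which model a given compiler uses, here
is a minimal sketch.  The enum and function names are made up for
illustration, and it assumes 8-bit chars, so that sizeof counts
octets.

```c
#include <stddef.h>

/* The three 64-bit models from the quoted summary, plus plain 32-bit. */
enum data_model { MODEL_OTHER, MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_ILP64 };

/* Classify a model from the sizes (in bytes) of int, long and pointers. */
enum data_model classify_model(size_t int_bytes, size_t long_bytes, size_t ptr_bytes)
{
    if (ptr_bytes == 8) {
        if (int_bytes == 4 && long_bytes == 8) return MODEL_LP64;
        if (int_bytes == 4 && long_bytes == 4) return MODEL_LLP64;
        if (int_bytes == 8 && long_bytes == 8) return MODEL_ILP64;
    }
    if (ptr_bytes == 4 && int_bytes == 4 && long_bytes == 4)
        return MODEL_ILP32;
    return MODEL_OTHER;
}

/* The model of whatever compiler builds this file. */
enum data_model this_model(void)
{
    return classify_model(sizeof(int), sizeof(long), sizeof(void *));
}

/* Printable name for a model. */
const char *model_name(enum data_model m)
{
    switch (m) {
    case MODEL_ILP32: return "ILP32";
    case MODEL_LP64:  return "LP64";
    case MODEL_LLP64: return "LLP64";
    case MODEL_ILP64: return "ILP64";
    default:          return "other";
    }
}
```

Printing model_name(this_model()) from a small main() should tell you
which of the models you are on.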

> 
> The Aspen document looks at existing Unix source code and the parameters
> of POSIX standard function calls, and decides that LP64 is the best choice
> for portability based on these specific constraints.
> 
> Similarly, the people who extended Microsoft Windows to 64 bits looked
> at their existing code base and the parameters of Windows API function
> calls, and chose LLP64 as the best model in that case.
> 
> (Old Crays were ILP64 systems, but, in my experience, the main use of
> that model now is in 64-bit extensions of Fortran libraries.)
> 
> 
> Since I spend a lot of my work time cleaning up people's less-than-
> portable 64-bit source code, please allow me a short moment of stupid,
> unfair, irrational ranting:
> ** If you use the "long" type as an integer guaranteed to hold a pointer
>    or a memory size, you are a BAD C PROGRAMMER.  I don't care if it's
>    what you're used to.  It's NOT CORRECT.  Please stop! **

Casting pointers to and from integer types is bad in the first place.
Still, a model where long cannot hold a full address is IMHO not sane.

> 
> The only acceptable use of "long" is when you need a variable that's
> guaranteed to be longer than a "short", or when you have to talk to
> an operating system API function.  Otherwise it is best to avoid it,
> because "it does not mean what you think it means."
> 

With a sane model, long==generic-machine-word.

> 
> A conclusion:  any integer type intended to be 32 bits on 32-bit
> systems and 64 bits on 64-bit systems cannot be a basic C type.  It
> must be a typedef type, whose identity is controlled by an #ifdef in
> some header file.

Yes, that's the price of full adherence to the standard.  I
suggest sticking to LP64 (and saying so in the documentation!) and,
when it's not there, bailing out with an error (or warning),
or falling back to a safe but potentially slow code branch.
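
A rough sketch of what such a check could look like, using only
<limits.h> and <stdint.h> from C99.  The names word_t, WORD_T_IS_LONG
and byte_sum are hypothetical and not part of GMP.  It prefers long,
falls back to long long where long is too narrow (as on LLP64), and
refuses to compile if neither fits.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Pick an index type that can reach every byte a size_t can describe. */
#if ULONG_MAX >= SIZE_MAX
typedef unsigned long word_t;        /* long is wide enough (e.g. LP64) */
#define WORD_T_IS_LONG 1
#elif ULLONG_MAX >= SIZE_MAX
typedef unsigned long long word_t;   /* fallback branch (e.g. LLP64) */
#define WORD_T_IS_LONG 0
#else
#error "no standard integer type can index all addressable memory"
#endif

/* Sum n bytes, indexing with word_t rather than a bare long. */
unsigned long long byte_sum(const unsigned char *p, size_t n)
{
    unsigned long long s = 0;
    for (word_t i = 0; i < n; i++)
        s += p[i];
    return s;
}
```

Code that really depends on long being a machine word could test
WORD_T_IS_LONG and emit a warning or select the slow branch there.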

> 
> [...]
> 
> This isn't the only 64-bit limitation in GMP.  As Torbjorn has pointed
> out, the _mp_size and _mp_alloc fields of the mpz_struct are currently
> ints and not mp_size_t's.  Therefore, no mpz integer can be larger than
> 2^31 limbs, even on 64-bit computers.  But making that change really
> would break backward binary compatibility.  Still, I believe it should
> be done!
> 

Definitely.
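
To put a number on that limit, here is a small sketch.  It assumes
the limb count is kept in a signed int whose sign doubles as the sign
of the number, so at most INT_MAX limbs are usable; the function name
is made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Largest bit length of an integer whose limb count must fit in an
   int, given the number of bits per limb. */
uint64_t max_bits_with_int_size(unsigned limb_bits)
{
    return (uint64_t)INT_MAX * limb_bits;
}
```

With a 32-bit int and 64-bit limbs this comes to just under 2^37
bits, that is, about 16 GiB for a single number.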

> 
> Finally, I need to apologize to Torbjorn and other people on this list.
> He and I discussed the need for an mp_size_t rewrite many months ago.
> I promised to work on it.  And then I got busy and overworked, and went
> radio silent, and never followed through.  I had hoped that all these
> backward-incompatible changes could wait for GMP 5.0, but it seems as
> though a crisis has come to a head.  If I had done the work earlier,
> perhaps the rancor of the last few days might have been averted.  I
> dropped the ball, and this is my fault.  I hope that whatever happens,
> the mp_size_t cleanup will proceed, which would answer most Win64
> people's objections.
> 
> 
> - David Librik
> librik at panix.com

cheers,   jj

P.S.: I was quite disappointed that the type long long was not used
for "two machine words" but stayed at 64 bits; this also reeks of a
concession to questionable code.