GMP and 64-bit systems

Torbjorn Granlund tg at swox.com
Sun Jun 1 11:13:21 CEST 2008


<librik at panix.com> writes:

  The main issue people have with GMP on Win64 is really a more general
  problem of non-standard C coding practice.  In fact, this practice can
  be cleaned up systematically, which will improve the quality of GMP's
  code and thereby make it work better with a wider variety of 64-bit
  interfaces.
  
  The problem is:  GMP's code implicitly assumes the LP64 model of
  64-bit C types.  This is not the only 64-bit model; LLP64 and ILP64
  are alternatives.

It supports more than LP64.  It supports for example ILP32 with 64-bit
limbs (e.g. MIPS n32) and ILP64 (e.g. Cray).
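
As a rough illustration of the models named above, the sketch below
classifies a data model from the widths of int, long and pointers.
The function names are hypothetical and not part of GMP.  Note that
limb width is a separate choice, which is how ILP32 with 64-bit limbs
is possible.

```c
#include <limits.h>
#include <stddef.h>

/* Map the widths (in bits) of int, long and pointers to the
   conventional name of the C data model.  */
const char *
data_model_name (size_t int_bits, size_t long_bits, size_t ptr_bits)
{
  if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
    return "ILP32";
  if (int_bits == 32 && long_bits == 32 && ptr_bits == 64)
    return "LLP64";             /* long stays 32 bits */
  if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
    return "LP64";              /* long and pointers are 64 bits */
  if (int_bits == 64 && long_bits == 64 && ptr_bits == 64)
    return "ILP64";             /* int is 64 bits as well */
  return "other";
}

/* Classify the system this code is compiled for.  */
const char *
host_data_model (void)
{
  return data_model_name (sizeof (int) * CHAR_BIT,
                          sizeof (long) * CHAR_BIT,
                          sizeof (void *) * CHAR_BIT);
}
```

Code that stores a limb count in a bare "long" behaves the same under
LP64 and ILP64 but loses range under LLP64, which is the case the
quoted text is concerned with.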
  
  The solution is:  Use the existing "mp_size_t" type consistently
  throughout the GMP source code and public interfaces to refer to a
  number of limbs, bytes, or bits.

That wouldn't work.  For 32-bit machines, bits need to use a 32-bit
unsigned type.  Also, public interfaces' types should not change.

  Then typedef "mp_size_t" to a 32-bit
  or 64-bit fundamental C type.  This is already what's done with the
  "mp_limb_t" type and the "LONG_LONG_LIMB" & "GMP_SHORT_LIMB"
  preprocessor macros.
  
  mp_size_t is already used in many places in the GMP code for this
  purpose.  But it's not everywhere yet -- many functions and structs
  still use a bare "long int".  Until all the longs are eliminated,
  the code is not fully portable.
  
I don't think it is a question of "not yet".  If long int is used
someplace, it is usually because mp_size_t was not considered to be
conceptually right there.
  
  What's harder to fix is that mp_size_t isn't used everywhere it should.
  Most GMP functions, including the public API, still assume that they
  can use "long" where they mean "mp_size_t".  Also, mp_size_t (or maybe
  just size_t) needs to be used when there's a count of bits or bytes.
  
The proper type for bytes is surely size_t.  I believe that's what GMP
uses.  mp_size_t is for counts of limbs; we don't use size_t there
because of the small benefit that mp_size_t is a signed type, which
simplifies loops down to 0.
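
The point about signed counts and loops down to 0 can be sketched as
follows.  The typedefs and function names are hypothetical stand-ins,
not GMP's own definitions.

```c
#include <stddef.h>

typedef long limb_count_t;      /* stand-in for a signed limb count */
typedef unsigned long limb_t;   /* stand-in for a limb */

/* Index of the most significant nonzero limb, or -1 if all n limbs
   are zero.  With a signed index the natural loop terminates, since
   i can become negative.  */
limb_count_t
high_nonzero_limb (const limb_t *p, limb_count_t n)
{
  limb_count_t i;
  for (i = n - 1; i >= 0; i--)
    if (p[i] != 0)
      return i;
  return -1;
}

/* The same search with size_t.  "i >= 0" would always be true for an
   unsigned type, so the loop has to test before decrementing, and the
   result is a 1-based position with 0 meaning "all zero".  */
size_t
high_nonzero_limb_unsigned (const limb_t *p, size_t n)
{
  size_t i;
  for (i = n; i-- > 0; )
    if (p[i] != 0)
      return i + 1;
  return 0;
}
```

Both versions work, but the unsigned one needs the less obvious loop
form and cannot use -1 as a "not found" result.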

  This isn't the only 64-bit limitation in GMP.  As Torbjorn has pointed
  out, the _mp_size and _mp_alloc fields of the mpz_struct are currently
  ints and not mp_size_t's.  Therefore, no mpz integer can be larger than
  2^31 limbs, even on 64-bit computers.  But making that change really
  would break backward binary compatibility.  Still, I believe it should
  be done!

Something needs to be done about it.

One solution would be borrowing bits from _mp_alloc to _mp_size, using
a more crude allocation for larger sizes.  Drawbacks:

(1) We have inlined references to _mp_size into user binaries, so such
binaries will be limited to 2G limbs.  (No GMP version has created
inlines of _mp_alloc.)

(2) Some added overhead, but if coded carefully it should be tiny.
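
One possible reading of "a more crude allocation for larger sizes" is
to store the allocation count in fewer bits by rounding it up to a
value of the form m * 2^e, which frees the remaining bits for the
size.  The sketch below is only an illustration of that idea: the
16-bit code, the 10/6 split and the function names are assumptions,
and nothing in the message says GMP would do it this way.

```c
#include <stdint.h>

#define MANT_BITS 10                        /* assumed mantissa width */
#define MANT_MAX  ((1u << MANT_BITS) - 1)

/* Encode a limb allocation count n (assumed to be below 2^62) into
   16 bits, rounding up so that the decoded value is never smaller
   than n.  */
uint16_t
coarse_alloc_encode (uint64_t n)
{
  unsigned e = 0;
  while (n > MANT_MAX)
    {
      n = (n >> 1) + (n & 1);   /* ceil (n / 2) without overflow */
      e++;
    }
  return (uint16_t) ((e << MANT_BITS) | (unsigned) n);
}

/* Recover the (possibly rounded-up) allocation count.  */
uint64_t
coarse_alloc_decode (uint16_t code)
{
  uint64_t m = code & MANT_MAX;
  unsigned e = code >> MANT_BITS;
  return m << e;
}
```

Counts up to 1023 are stored exactly; larger counts are rounded up by
well under 1%, which is the "crude" part.  The extra shift and mask on
each access correspond to the overhead mentioned in (2).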

On the other hand, using mp_size_t for these fields would completely
break binary compatibility and make the structures considerably
larger.  Some people have large vectors of mpz_t where the elements
are actually quite small ("small bignums").
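
The size cost can be made concrete with a small sketch.  The structs
below are stand-ins that mirror the field types discussed in this
thread; the "wide" layout uses long as a placeholder for a 64-bit
mp_size_t on an LP64 system and is purely hypothetical.

```c
#include <stddef.h>

typedef unsigned long limb_t;   /* stand-in for mp_limb_t */

/* Layout as discussed: int counts plus a pointer to the limbs.  */
struct mpz_narrow
{
  int alloc;
  int size;
  limb_t *d;
};

/* Hypothetical layout with wider count fields.  */
struct mpz_wide
{
  long alloc;
  long size;
  limb_t *d;
};

/* Header bytes (excluding limb data) for a vector of n elements.  */
size_t
narrow_vector_bytes (size_t n)
{
  return n * sizeof (struct mpz_narrow);
}

size_t
wide_vector_bytes (size_t n)
{
  return n * sizeof (struct mpz_wide);
}
```

On a typical LP64 system the two structs would be expected to take 16
and 24 bytes, so a vector of small bignums would grow by half in
header space alone.  The exact figures depend on the platform's type
sizes and padding.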

We should take a pragmatic approach to the type improvements in GMP.

The primary thing for GMP 5 in this respect is allowing much larger
operands in mpz_t (and perhaps higher precision in mpf_t) on 64-bit
systems.

Supporting Windoze's 64-bit ABI is something we might want to do, but
it is not a central goal of the GNU project.  We should not break
binary compatibility for its sake.  In particular, "unsigned long" bit
counts in user-visible functions should not change.

If we decide to support Windoze's 64-bit ABI, we do not need to worry
about actual 64-bit limb counts internally, since the limitations of
mpz_t would make that almost pointless.  (We already support larger
numbers at the mpn level for existing systems, but this has not been
well-tested and is probably used very little.)

For GMP 5, if we decide to extend mpz_t's range (which we should), we
will need to add alternative interfaces for some functions, if we
decide greater bit counts are actually needed at that level.

There is one tricky thing about this, and it is whether to assume
"long long" exists.  When I started working on GMP, it certainly
wasn't ubiquitous.  Unfortunately, I don't think we can assume it is
now, since it was not specified until C99.

We want there to be one GMP ABI, so we cannot define a user visible
function with "long long" unless we decide to de-support C89
environments.

Furthermore, we cannot use "long long" on systems just because one
compiler on that system supports it.  If we put long long in gmp.h on
such a system, people using the other compiler will get surprises.
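
The hazard can be sketched with a header-style fragment that picks a
type depending on what the compiler claims to support.  The macro,
typedef and function names are hypothetical.  The point is that the
width of the resulting type, and so the ABI of any function declared
with it, can differ between two compilers (or two modes of one
compiler) on the same system.

```c
#include <limits.h>

/* C99 and later guarantee "long long"; a C89 compiler may not have
   it.  __STDC_VERSION__ is the standard way to ask.  */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define HAVE_STD_LONG_LONG 1
#else
#define HAVE_STD_LONG_LONG 0
#endif

#if HAVE_STD_LONG_LONG
typedef unsigned long long wide_count_t;   /* at least 64 bits */
#else
typedef unsigned long wide_count_t;        /* may be only 32 bits */
#endif

/* Width in bits of the type this translation unit ended up with.  */
unsigned int
wide_count_bits (void)
{
  return (unsigned int) (sizeof (wide_count_t) * CHAR_BIT);
}
```

On a system where "long" is 32 bits, a caller built in C89 mode and a
library built in C99 mode would disagree about wide_count_t, which is
the kind of surprise described above.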

  Finally, I need to apologize to Torbjorn and other people on this list.
  He and I discussed the need for an mp_size_t rewrite many months ago.
  I promised to work on it.  And then I got busy and overworked, and went
  radio silent, and never followed through.  I had hoped that all these
  backward-incompatible changes could wait for GMP 5.0, but it seems as
  though a crisis has come to a head.  If I had done the work earlier,
  perhaps the rancor of the last few days might have been averted.  I
  dropped the ball, and this is my fault.  I hope that whatever happens,
  the mp_size_t cleanup will proceed, which would answer most Win64
  people's objections.
  
Be careful, you might become the new target for the "SAGE rage".  Do
you deny the rumours that your inaction on LLP64 is related to
the fact that you failed to extort a large sum of money from somebody
who offered to help?  :-)

Seriously though: I don't think we should allow the SAGE rage or
their out-of-respect-for-Microsoft fork to disturb the GMP project.
We should keep carefully designing GMP's internal and external
interfaces.  I think I'll start using the motto: Better late than
sorry.  :-)

-- 
Torbjörn

