How large are arbitrarily large integers? (Was Re: Bitfield subtlety)

Torbjörn Granlund tg at gmplib.org
Thu Mar 29 16:21:02 UTC 2018


nisse at lysator.liu.se (Niels Möller) writes:

  I think one relevant question is on how large numbers we expect to be
  able to complete any computation with worse complexity than O(n) in
  reasonable time. But when we integrate fft multiplication with decent
  locality, the limit will be cpu power, not available RAM, I'd hope.

A plain multiply will take a good while already for 2^40-bit operands.

  To me, 0.05% allocation overhead sounds reasonable. Is there any
  other good reason to steal less than 16 bits?

Yes.  The mantissa size determines the smallest operand for which we
need to add some bookkeeping overhead for the float conversion; sizes
below that will just store the exact integral allocation.

I considered several possibilities of the form m*b^e, where m is the
mantissa and e is the exponent, both stored in a k-bit allocation
field, and b is the implicit base, which I limited to 2, 4, 8, or 16.
Of course, only those m and e sizes and bases b which result in similar
allocation range and _mp_size range are worth considering.

For some reason, I also limited the choices so that either 8 or 16 bits
are added to the _mp_size field.  That might perhaps give a small
speedup in some cases, but the result is that only one balanced
possibility remains:

b = 2
mantissa bits = 10
exponent bits = 6

Let's abbreviate this as (2,10,6).

This gives an allocation range of almost 2^73 limbs ((2^10-1)*2^63),
which is of course silly, but (2,11,5) would result in a range of just
2^42 limbs ((2^11-1)*2^31).

Perhaps something like (2,14,5) would make the most sense.  This would
give an allocation range of 2^45 limbs and a size range of 2^44 limbs,
while invoking the hairy code only for allocations above 16383 limbs
(the largest value exactly representable in a 14-bit mantissa).
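
For the demo code below, assume macros along these lines.  The values
shown correspond to the (2,10,6) choice, and the names are of course
just placeholders for this sketch, not anything that exists in GMP
today:

#define LOGBASE  1                        // log2 of the base b = 2
#define MANTBITS 10                       // mantissa field width
#define EXPBITS  6                        // exponent field width
#define MANTMASK ((1ul << MANTBITS) - 1)  // 1023, largest exactly stored allocation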

Assuming we use a bitfield (which we cannot, but this is demo code),
reading the allocation would look something like this:

size_t
readalloc (mpz_struct* r)
{
  // Decode m * b^e: the mantissa shifted left by exponent * log2(base).
  return (size_t) (r->_mp_alloc & MANTMASK) << ((r->_mp_alloc >> MANTBITS) * LOGBASE);
}

While writing would look e.g. like this:

void
writealloc (mpz_struct* r, size_t s)
{
  if (s <= MANTMASK)
    {
      r->_mp_alloc = s;
    }
  else
    {
      // Could use count_leading_zeros here, if it's efficient.
      // For now, looping here is not terrible, as it adds log(n) overhead,
      // i.e., less overhead for smaller operands.
      size_t e = 0;
      do
	{
	  s = 1 + ((s - 1) >> LOGBASE);
	  e += 1;  // could add 1<<MANTBITS here to avoid shift below
	}
      while (s > MANTMASK);
      r->_mp_alloc = s + (e << MANTBITS);
    }
}
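
To make the rounding concrete: with the hypothetical (2,10,6) constants
above, writealloc (r, 5000) iterates 5000 -> 2500 -> 1250 -> 625 with
e = 3, and readalloc then returns 625 << 3 = 5000 exactly, while
writealloc (r, 5001) goes 2501 -> 1251 -> 626, so readalloc returns
626 << 3 = 5008, i.e., the stored allocation is rounded up slightly.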


The structure used is this one:

typedef struct {
  mp_limb_t* _mp_d;
  int64_t  _mp_size  : 40;                // signed limb count, 8 bits wider than today's int
  uint64_t _mp_alloc : MANTBITS+EXPBITS;  // float-encoded allocation
} mpz_struct;
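
A quick sanity check of the intended round-trip property -- readalloc
never reports less than what was passed to writealloc, and reports it
exactly below MANTMASK -- could look like this.  This is just throwaway
test code, assuming the pieces above are pasted into one file in a
compilable order, with mp_limb_t available (e.g. from gmp.h):

#include <assert.h>
#include <stdio.h>

int
main (void)
{
  mpz_struct x;
  x._mp_d = NULL;                        // not used by readalloc/writealloc
  for (size_t s = 1; s < ((size_t) 1 << 20); s++)
    {
      writealloc (&x, s);
      size_t a = readalloc (&x);
      assert (a >= s);                   // never report less than requested
      assert (s > MANTMASK || a == s);   // exact for small allocations
    }
  printf ("round-trip ok\n");
  return 0;
}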


Checking if reallocation is needed will add minuscule overhead.  Since
readalloc(x) can never be smaller than the raw mantissa
x->_mp_alloc & MANTMASK, the cheap first comparison settles the common
case, and readalloc's shifting is only performed when that test fails:

  if (needed > (x->_mp_alloc & MANTMASK) && needed > readalloc(x))
    realloc(...)
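
For illustration only, a hypothetical helper (not actual GMP code, and
with error handling omitted) could tie the pieces together like this.
Note that the number of limbs to allocate is taken from the rounded-up
field, so that the stored allocation never overstates what was actually
allocated:

#include <stdlib.h>

void
ensure_alloc (mpz_struct* x, size_t needed)
{
  if (needed > (x->_mp_alloc & MANTMASK) && needed > readalloc (x))
    {
      writealloc (x, needed);            // store the rounded-up allocation
      size_t newalloc = readalloc (x);   // limbs to actually allocate
      // Error handling omitted.
      x->_mp_d = realloc (x->_mp_d, newalloc * sizeof (mp_limb_t));
    }
}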


-- 
Torbjörn
Please encrypt, key id 0xC8601622

