Has there been historical work done to investigate small integer optimization?

Sun Feb 11 17:20:48 CET 2024

On Sun, 11 Feb 2024, Stefan Koch wrote:

> There is a fair amount of recent activity in the c++ world to optimize for
> speed by minimizing processor cache misses.  One example is for in the
> std::string class there is a concept of "short string optimization".  What
> this does is that for shorter strings the data is stored in the string
> class itself without need for dynamic memory allocation.   This has two
> advantages: it does not need to make a call to dynamic memory allocation,
> and the string data is in the same location as the meta data, and so only
> in one cache fetch.  Both of which are potentially slow.
>
> When I was looking at the implementation the _mp_d member of __mpz_struct
> is 8 bytes, which is also the size of a single limb entry.  (I suspect that
> this is common on recent systems, but it is probably not be universal.)  My
> thought was that if _mp_alloc is one, then we have one limb needed.  In
> that case, the space of _mp_d could serve as the limb itself instead of the
> pointer to the limb.  That would eliminate dynamic memory allocation for
> all integers under 2^64.
>
> I was coming at this from a computer algebra system standpoint.  I think
> (but I really don't know), that much of the arithmetic that is done is done
> with integers under 2^64, and so those systems may be much faster if gmp
> had a small integer optimization.
>
> This approach will add a check when accessing the limb data for all
> accesses but will save accessing potentially non cpu cache data for small
> integers.  It may turn out that the small delay is not so noticeable for
> large numbers, and the optimization for small numbers makes a large
> difference for small numbers.
>
> A second thought is to do the arithmetic if the sources are and result will
> fit in a single limb with inline functions or macros. That would eliminate
> the function call entirely for small numbers.  It would make the code
> longer, and add additional overhead for when the integers are not short,
> but is it not clear how much of an impact that is for lager numbers.
>
> Anyway, I know you guys take performance very seriously, so I was wondering
> if you had experience with this?
> 1. Is there real potential for performance improvements in cas systems if
> gmp has a small integer optimization?
> 2. Have you looked at an approach like this in the past?  and if so what
> have you found?
> 3. If you think this is something worth pursuing, do you have any advice?

Several software already wrap GMP using this kind of strategy: a "number" is 
either a 63-bit integer or a pointer to a GMP big integer (this is 
determined by one bit), operations on 63-bit integers are inline.

What is efficient depends on your use case. If you want to use the same 
integer type everywhere, you should definitely optimize for the small 
case. If you use separate types for small vs big, and big integers are 
always big, then you should optimize for the big case.

There is also an intermediate size where numbers occupy just a few limbs, 
where allocation can still dominate computation, and a small-string-like 
optimization can help significantly (see Boost.Multiprecision's cpp_int). 
Again, a one-size-fits-all seems optimistic.

-- 
Marc Glisse