Has there been historical work done to investigate small integer optimization?

Stefan Koch stefan.koch.micro at gmail.com
Sun Feb 11 16:30:40 CET 2024

There is a fair amount of recent activity in the c++ world to optimize for
speed by minimizing processor cache misses.  One example is for in the
std::string class there is a concept of "short string optimization".  What
this does is that for shorter strings the data is stored in the string
class itself without need for dynamic memory allocation.   This has two
advantages: it does not need to make a call to dynamic memory allocation,
and the string data is in the same location as the meta data, and so only
in one cache fetch.  Both of which are potentially slow.

When I was looking at the implementation the _mp_d member of __mpz_struct
is 8 bytes, which is also the size of a single limb entry.  (I suspect that
this is common on recent systems, but it is probably not be universal.)  My
thought was that if _mp_alloc is one, then we have one limb needed.  In
that case, the space of _mp_d could serve as the limb itself instead of the
pointer to the limb.  That would eliminate dynamic memory allocation for
all integers under 2^64.

I was coming at this from a computer algebra system standpoint.  I think
(but I really don't know), that much of the arithmetic that is done is done
with integers under 2^64, and so those systems may be much faster if gmp
had a small integer optimization.

This approach will add a check when accessing the limb data for all
accesses but will save accessing potentially non cpu cache data for small
integers.  It may turn out that the small delay is not so noticeable for
large numbers, and the optimization for small numbers makes a large
difference for small numbers.

A second thought is to do the arithmetic if the sources are and result will
fit in a single limb with inline functions or macros. That would eliminate
the function call entirely for small numbers.  It would make the code
longer, and add additional overhead for when the integers are not short,
but is it not clear how much of an impact that is for lager numbers.

Anyway, I know you guys take performance very seriously, so I was wondering
if you had experience with this?
1. Is there real potential for performance improvements in cas systems if
gmp has a small integer optimization?
2. Have you looked at an approach like this in the past?  and if so what
have you found?
3. If you think this is something worth pursuing, do you have any advice?


More information about the gmp-devel mailing list