Has there been historical work done to investigate small integer optimization?

Sun Feb 11 17:32:34 CET 2024

Stefan Koch <stefan.koch.micro at gmail.com> writes:

> Anyway, I know you guys take performance very seriously, so I was wondering
> if you had experience with this?
> 1. Is there real potential for performance improvements in cas systems if
> gmp has a small integer optimization?

For usecases where almost all integers are small, I would expect a
small-integer optimization could pay off. For historic examples, look at
lisp systems (or any other languages where there's a single integer type
that can grow as large as needed, internally represented as either a
"fixnum" or "bignum" depending on size). Sparc processors (for 32-bit)
even had some special instructions that could in theory make the
implementation more efficient.

> 2. Have you looked at an approach like this in the past?  and if so what
> have you found?

I'm not aware of any substantial work along these lines for GMP.

> 3. If you think this is something worth pursuing, do you have any advice?

I doubt it is useful for the main mpz_t type in GMP. For numbers that
are large enough, say 100 limbs or more, the runtime overhead of an
extra branch on each operation to make the fixnum/bignum distinction is
likely not that costly. But performance for numbers of just a few limbs
(e.g., sizes relevant to public key cryptography) is also quite
important, and for those sizes I suspect the cost of the extra check
will be significant. In the context of GMP development, "small number
optimizations" usually means optimizations for for numbers of size up to
5-10 limbs or so.

So if you want to pursue a fixnum/bignum small-number optimization, I'd
suggest doing it as a separate library interface, defining a type
intended to represent numbers that are usually small, but uses gmp when
numbers grow too large for the fixnum representation. One classic way to
implement it would be to use one or two of the low bits of a computer
word as tag bits. If tag is zero, the rest of the bits represents a
signed fixnum. If tag is non-zero, you get a pointer to a heap-allocated
bignum by clearing the tag bits (which is fine, because pointers usually
require some alignment anyway).

To reason about potential gains, you'd need to keep in mind the pretty
high cost of a mis-predicted branch, compared to the cost of a memory
access, for relevant assumptions on cachedness.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.