GMP with small numbers

Sun Jun 18 12:45:03 CEST 2006

It seems to me your approach doesn't cut wood. An mpz number basically
is a struct with a pointer to an array of limbs (i.e. digits in a base
2^32 or 2^64) and a pointer to a lenght variable indicating the lenght
for the forementioned array (and the sign of the number, negative
length, indicates a negative number). so in case of numbers smaller
than 2^32, a one limb number, there is no difference between an mpz
operation and a "normal" operation apart for checking the length of
course!

Your approach involves overloading (therfor everytime an operation
gets called, there is a lookup of the appropriate function,
overloading doesn't come for free, there's no such thing as a free
lunch) and then you want to do an overflow check. That's two checks!!!
Your perceived speed gain is by now long gone!

Just a general remark. Did you have a decent look at GMP, if you did,
you would see the level of sophistication and the consideration of
every possible speed improvement (implemented or to be implemented).
I'm quite sure the issue you raise has already been extensively
considered (in version 0.1 beta), decided upon and burried.

Of course, if you really have a large block of (one limb) operations
to be performed, you could improve upon GMP, particularly using
SSE/SSE2/SSE3 instructions and proper latency scheduling, but I think
you should then try to seperate this (branch to) early on (I mean not
by overloading) in order to avoid branch mispredictions (with a
resulting high cach miss rate).

But instead of inventing the wheel yourself, why not use a linear
algebre package for those operations, BLAS/LAPACK and for AMD
Athlon64: ACML (a partial implementation of BLAS/LAPACK) or ATLAS
(self-optimizing implementation of BLAS/LAPACK). All the above is
available without payment, and easy to find using Google.