GMP with small numbers

Sun Jun 18 12:59:30 CEST 2006

On 18/06/06, Degski <degski at gmail.com> wrote:
> It seems to me your approach doesn't cut wood. An mpz number basically
> is a struct with a pointer to an array of limbs (i.e. digits in a base
> 2^32 or 2^64) and a pointer to a lenght variable indicating the lenght
> for the forementioned array (and the sign of the number, negative
> length, indicates a negative number). so in case of numbers smaller
> than 2^32, a one limb number, there is no difference between an mpz
> operation and a "normal" operation apart for checking the length of
> course!
>

OK, thanks a lot for the information on this.

> Your approach involves overloading (therfor everytime an operation
> gets called, there is a lookup of the appropriate function,
> overloading doesn't come for free, there's no such thing as a free
> lunch) and then you want to do an overflow check. That's two checks!!!
> Your perceived speed gain is by now long gone!
>

Yes, It seems you're right. Well, less work for me... just need to
learn GMP, then! :-)

By the way, since I'm programming in C++ I'm considering to use GMP
C++ interface. Is there any kind of efficiency issue by using the
interface instead of the C functions directly, i.e., is it known to be
slower?

> Just a general remark. Did you have a decent look at GMP, if you did,
> you would see the level of sophistication and the consideration of
> every possible speed improvement (implemented or to be implemented).
> I'm quite sure the issue you raise has already been extensively
> considered (in version 0.1 beta), decided upon and burried.
>

Nope, surely didn't. I might imagine there's a lot of work and effort
in it but I just didn't check it! :-)

> Of course, if you really have a large block of (one limb) operations
> to be performed, you could improve upon GMP, particularly using
> SSE/SSE2/SSE3 instructions and proper latency scheduling, but I think
> you should then try to seperate this (branch to) early on (I mean not
> by overloading) in order to avoid branch mispredictions (with a
> resulting high cach miss rate).
>
> But instead of inventing the wheel yourself, why not use a linear
> algebre package for those operations, BLAS/LAPACK and for AMD
> Athlon64: ACML (a partial implementation of BLAS/LAPACK) or ATLAS
> (self-optimizing implementation of BLAS/LAPACK). All the above is
> available without payment, and easy to find using Google.
>

Thanks, I'll look into this. Thanks for your extensive reply and
references. I'll look into them.

Cheers,
-- 
Paulo Jorge Matos - pocm at sat inesc-id pt
Web: http://sat.inesc-id.pt/~pocm
Computer and Software Engineering
INESC-ID - SAT Group