GCL, GMP linkage

Sat Dec 20 08:18:28 CET 2003

Camm Maguire <camm at enhanced.com> writes:
>
>> Camm Maguire wrote:
>>
>> >One thing I've been mulling over regarding GCL built binaries is an
>> >automatic selection of the fastest gmp routines possible for the
>> >general architecture at runtime, i.e. sse1, sse2, etc.

The next gmp will have a "fat binary" scheme of runtime selection in
certain low-level routines.  We hope it will do some good for a
generic i386 build, i.e. when someone doesn't want to make a fully
optimized cpu-specific build.
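
To make the idea concrete, here is a rough sketch of runtime selection
through a function pointer that is resolved on first call.  This is not
gmp's actual code: limb_t, add_n, add_n_generic, add_n_sse2 and
cpu_has_sse2 are hypothetical names invented for illustration, the
"sse2" variant is only a stand-in that reuses the generic loop, and the
cpu test assumes the GCC/Clang builtins __builtin_cpu_init and
__builtin_cpu_supports on x86.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limb type for this sketch (not gmp's mp_limb_t). */
typedef unsigned long limb_t;
typedef limb_t (*add_n_fn)(limb_t *rp, const limb_t *ap,
                           const limb_t *bp, size_t n);

/* Portable fallback: rp[] = ap[] + bp[] over n limbs, returns carry out. */
static limb_t add_n_generic(limb_t *rp, const limb_t *ap,
                            const limb_t *bp, size_t n)
{
    limb_t carry = 0;
    assert(rp != NULL && ap != NULL && bp != NULL);
    for (size_t i = 0; i < n; i++) {
        limb_t s = ap[i] + carry;      /* add incoming carry */
        limb_t c1 = s < carry;         /* 1 if that addition wrapped */
        rp[i] = s + bp[i];
        carry = c1 | (rp[i] < s);      /* at most one of the two can wrap */
    }
    return carry;
}

/* Stand-in for a tuned variant.  A real one would be cpu-specific
   assembly; here it just reuses the generic loop. */
static limb_t add_n_sse2(limb_t *rp, const limb_t *ap,
                         const limb_t *bp, size_t n)
{
    return add_n_generic(rp, ap, bp, n);
}

/* Assumption: GCC/Clang x86 builtins.  Elsewhere, report "no sse2". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

static limb_t add_n_init(limb_t *rp, const limb_t *ap,
                         const limb_t *bp, size_t n);

/* Starts out pointing at the resolver; overwritten on first call. */
static add_n_fn add_n_ptr = add_n_init;

/* First call lands here: pick a variant once, then forward the call. */
static limb_t add_n_init(limb_t *rp, const limb_t *ap,
                         const limb_t *bp, size_t n)
{
    add_n_ptr = cpu_has_sse2() ? add_n_sse2 : add_n_generic;
    return add_n_ptr(rp, ap, bp, n);
}

/* Public entry point: one indirect call per use after the first. */
limb_t add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    return add_n_ptr(rp, ap, bp, n);
}
```

The cost of such a scheme is one indirect call per routine, which is
why it only makes sense for routines that do a reasonable amount of
work per call.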

> I agree that the case of many modest sized numbers is all that really
> matters.   I basically just looked at Vadim's latest maxima benchmark,
> ratsimp((x+y+z)^500)$, using a gmp with generic x86 code and one with
> tuned SSE2 extensions.  And I could not see a difference, even when
> the GBC was set up so as not to be the bottleneck.  I'd be most
> interested if anyone else has encountered a real case where SSE or
> SSE2 gmp speeds up bignum processing.

The oddities of the pentium-4 arch mean sse2 there is a big help to
the low-level operations.  That's the integer sse2; the floating point
sse and sse2 haven't found any use thus far.

Of course, how far low-level speedups translate into the final result
always depends on all sorts of overheads.  I guess breaking out
the profiler might show where the time is going.  Followups to
gmp-devel if you find anything really gross.