Opteron speed

Fri Nov 14 22:21:38 CET 2003

Kevin Ryde <user42 at zip.com.au> writes:

> > The reason I ask is that we have a very intensive modelling program
> > which we've just switched over to using GMP (rather than calls to
> > sinl, sqrtl etc) and have noticed a *huge* speed decrease,
> 
> All the various overheads make a slowdown fairly inevitable on
> smallish operands.  If you only need quads or something then there
> might be a fixed-precision library that could go faster.

It might be possible to write basecases that do optimal multiplication
for each size, up to a limit of 4-8 or so limbs, using something like
hardcoded karatsuba on limbs or halflimbs. (This is not my idea, I got
it from Torbjörn).

A hack like this reminds me of an optimized qsort implementation
somebody at my university wrote, which used an optimal decision tree
for each array size from 2 up to some small limit.

Are there any hard theoretic results that give lower bounds for the
number of k x k -> 2k multiplications that are needed for multiplying
two n-bit numbers? That would be analogous to (but likely more
complicated than) the result that says that the worst case for any
general comparison-based sorting algorithm needs at least ceil(lg n!)
comparisons.

Regards,
/Niels