GPGPU support
Torbjorn Granlund
tg at gmplib.org
Tue Oct 12 09:12:03 CEST 2010
Morten Gulbrandsen <Morten.Gulbrandsen at rwth-Aachen.DE> writes:
> My question is: will many cores or parallel computing help with
> gmp-chudnovsky? Just for comparison? Why is the number of cores not
> reported? I assume all results were 64-bit, since going from 32 bits
> to 64 bits gives a speedup of a factor of four.
For the rare GMP application where extremely large precision is used,
one may get about linear speedup in the number of cores. However, GMP
is not ready for that yet. Most GMP applications can parallelise at
their own level, with great speedup.
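
A minimal sketch of what application-level parallelism can look like, assuming the workload consists of many independent big-number operations. It uses Python's built-in integers rather than GMP, and the names modpow and modpow_many are hypothetical helpers invented for this illustration. Each task is handed to a separate worker, so no single multiplication has to be parallelised internally.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def modpow(task):
    """Compute base**exponent mod modulus for one (base, exponent, modulus) task."""
    base, exponent, modulus = task
    return pow(base, exponent, modulus)


def modpow_many(tasks, executor_factory=ProcessPoolExecutor):
    """Run independent modular exponentiations on a pool of workers.

    Assumption: the tasks share no state, so they can be distributed freely.
    executor_factory is a hypothetical knob for this sketch; the default uses
    one process per core so that CPU-bound work can actually run in parallel.
    """
    with executor_factory() as pool:
        # map() preserves the input order in its results.
        return list(pool.map(modpow, tasks))
```

Calling modpow_many([(2, 10, 1000), (3, 4, 7)]) distributes the two tasks over the pool. With ProcessPoolExecutor the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork worker processes; passing ThreadPoolExecutor runs the same code with threads.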
We could parallelise the current FFT code, but that would not scale very
well, since it is what I call a large ring FFT. The coefficient ring is
so large that even a single coefficient will not fit in L1 cache, for
huge operands.
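
To get a feel for the coefficient sizes involved, here is a rough sketch based on a textbook model of a Schönhage-Strassen style FFT over the ring of integers mod 2^n + 1. This is an assumption made for illustration, not GMP's tuned parameter selection. In the model an operand is split into 2^k pieces, each coefficient needs about twice the piece size plus k bits, and n is rounded up to a multiple of 2^(k-1) so that 2 is a root of unity of sufficient order. The function name and the 32 KiB L1 size are likewise assumptions.

```python
def ssa_coefficient_bits(operand_bits, k):
    """Approximate size in bits of one FFT coefficient (textbook model, k >= 1).

    operand_bits: size of each operand in bits
    k:            log2 of the transform length (the operand is cut into 2**k pieces)
    """
    pieces = 1 << k
    piece_bits = -(-operand_bits // pieces)   # ceiling division
    min_bits = 2 * piece_bits + k             # room for a sum of 2**k piece products
    step = 1 << (k - 1)                       # 2 has order 2n mod 2**n + 1, so 2**k must divide 2n
    return -(-min_bits // step) * step        # round up to a multiple of step


L1_BYTES = 32 * 1024  # assumed L1 data cache size, typical but not universal

for operand_bits, k in [(1 << 30, 15), (1 << 36, 18)]:
    coeff_bytes = ssa_coefficient_bits(operand_bits, k) // 8
    print(operand_bits, k, coeff_bytes, coeff_bytes <= L1_BYTES)
```

Under these assumptions a 2^30-bit operand with k = 15 gives coefficients that still fit in the assumed L1 size, while a 2^36-bit operand with k = 18 does not, which is consistent with the remark about huge operands.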
Linear speedup requires a very high cache hit rate in the FFT; I'd
say that a figure to aim for is 99%.
I am not aware of any meaningful parallelisation of any operation but
multiplication. Indirectly, most superlinear, i.e. \omega(n), operations
will become parallelised, since they depend on multiplication.
E.g. addition can be parallelised with small theoretical overhead. In
practice, on modern cached computers, a parallel addition will run
several times slower than a sequential addition.
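
One known way to parallelise addition in theory is a carry-select scheme, sketched below. This is an illustration chosen here, not a technique named in the message. The limbs are cut into chunks, each chunk is added twice (once assuming carry-in 0, once assuming carry-in 1), and a short sequential pass then picks the right variant per chunk. The chunk additions are independent and could run on separate cores; in this sketch they simply run one after another. All names are hypothetical.

```python
LIMB_BITS = 64
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_chunk(a, b, carry):
    """Plain sequential limb-by-limb addition; returns (sum_limbs, carry_out)."""
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> LIMB_BITS
    return out, carry


def carry_select_add(a, b, chunk_len):
    """Add two equal-length limb vectors using speculative per-chunk sums."""
    # Phase 1: independent per chunk, so this loop is the parallelisable part.
    speculative = []
    for i in range(0, len(a), chunk_len):
        ca, cb = a[i:i + chunk_len], b[i:i + chunk_len]
        speculative.append((add_chunk(ca, cb, 0), add_chunk(ca, cb, 1)))
    # Phase 2: one cheap sequential step per chunk selects the correct variant.
    result, carry = [], 0
    for with_carry0, with_carry1 in speculative:
        limbs, carry = with_carry1 if carry else with_carry0
        result.extend(limbs)
    return result, carry
```

The sketch also hints at the practical cost: every limb is read and added twice and the speculative results are written to memory before being selected, whereas a sequential addition streams through the operands once.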
--
Torbjörn
More information about the gmp-discuss
mailing list