GPGPU support

Torbjorn Granlund tg at gmplib.org
Tue Oct 12 09:12:03 CEST 2010


Morten Gulbrandsen <Morten.Gulbrandsen at rwth-Aachen.DE> writes:

  My question is: will many cores or parallel computing help with
  gmp-chudnovsky? Just for comparison? Why is the number of cores not
  reported? I assume all results were 64 bits, as an increase from 32 bits
  to 64 bits will give a speedup by a factor of four.
  
For the rare GMP application where extremely large precision is used,
one may get about linear speedup in the number of cores.  However, GMP
is not ready for that yet.  Most GMP applications can parallelise at
their own level, with great speedup.
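
A minimal sketch of what parallelising at the application level might
look like.  It uses Python's built-in integers rather than GMP, and the
names modpow and run_batch are hypothetical, chosen for illustration
only.  The assumption is that the application has many independent
big-number jobs, so each job stays sequential and the jobs are spread
over the cores:

```python
from concurrent.futures import ProcessPoolExecutor


def modpow(job):
    """One independent big-number job: base**exponent mod modulus."""
    base, exponent, modulus = job
    return pow(base, exponent, modulus)


def run_batch(jobs, executor):
    """Run independent jobs on the given executor, preserving job order.

    Each job is computed sequentially; the parallelism lives entirely
    at this level, outside the arithmetic itself.
    """
    return list(executor.map(modpow, jobs))


if __name__ == "__main__":
    modulus = (1 << 521) - 1  # an illustrative large modulus
    jobs = [(base, 1 << 2000, modulus) for base in range(2, 18)]
    # One worker process per core by default.
    with ProcessPoolExecutor() as pool:
        results = run_batch(jobs, pool)
    print(len(results), "results computed")
```

Since the jobs share no data, no change to the arithmetic library is
needed for this kind of speedup.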

We could parallelise the current FFT code, but that would not scale very
well, since it is what I call a large ring FFT.  The coefficient ring is
so large that even a single coefficient will not fit in L1 cache, for
huge operands.

Linear speedup requires a very high cache hit rate in the FFT; I'd
say that a figure to aim for is 99%.

I am not aware of any meaningful parallelisation of any operation but
multiplication.  Indirectly, most \omega(n) operations will become
parallelised, since they depend on multiplication.

E.g. addition can be parallelised with small theoretical overhead.  In
practice, on modern computers with caches, a parallel addition will run
several times slower than a sequential addition.
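
One way to see why the theoretical overhead of parallel addition is
small is the sketch below.  It is an illustration under stated
assumptions, not GMP code: numbers are lists of 64-bit limbs, least
significant first, and the names add_block, increment and parallel_add
are hypothetical.  Each block is added independently with carry-in 0,
and a short sequential pass then feeds the carries from block to block.
The threads here only show the structure; in CPython they will not make
it faster.

```python
from concurrent.futures import ThreadPoolExecutor

LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into `count` limbs, low limb first."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(x << (LIMB_BITS * i) for i, x in enumerate(limbs))


def add_block(block):
    """Add two equal-length limb lists assuming carry-in 0."""
    a, b = block
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS
    return out, carry


def increment(limbs):
    """Add 1 to a limb list in place; return the carry out."""
    for i, x in enumerate(limbs):
        if x != LIMB_MASK:
            limbs[i] = x + 1
            return 0
        limbs[i] = 0  # an all-ones limb wraps and passes the carry on
    return 1


def parallel_add(a, b, nblocks=4):
    """Return (sum limbs, carry out) for equal-length limb lists a and b."""
    assert len(a) == len(b)
    size = max(1, -(-len(a) // nblocks))  # ceiling division
    blocks = [(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(add_block, blocks))
    # Sequential fix-up: a carry into a block usually stops at its
    # first limb, so this pass is cheap compared to the block additions.
    out, carry = [], 0
    for limbs, block_carry in results:
        if carry:
            block_carry |= increment(limbs)
        out.extend(limbs)
        carry = block_carry
    return out, carry
```

For example, parallel_add(to_limbs(x, 8), to_limbs(y, 8)) returns the
eight limbs of x + y modulo 2**512 together with the final carry.  The
extra work is only the fix-up pass, but every block still has to be
read from and written to memory once, and the sketch does nothing to
reduce that traffic.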

-- 
Torbjörn

