GMP on the Cell processor
Décio Luiz Gazzoni Filho
decio at decpp.net
Wed Apr 18 23:15:03 CEST 2007
On Apr 18, 2007, at 1:43 PM, Linas Vepstas wrote:
>> If you own a PS3 you can install Linux and have access to 6 SPEs because
>
> Particularly remarkable is the "protein folding at home" project, which
> Sony ported to the PS3. This is a massive numerical simulation, where
> the goal is to minimize the quantum mechanical energy of an arrangement
> of thousands of atoms (the protein). What's remarkable are the
> statistics coming out of this:
>
> http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
>
> There are 32K PS3s running the thing, delivering almost half a
> petaflop. Compare this to about 200K Intel boxes delivering
> 200 teraflops (1/5 of a petaflop). So the possibilities are there.
> (Although, to first order, it's really just number of cores times clock
> speed.)

I've written SPU assembly code for the distributed.net RC5-72 key
cracking effort. As I recall I was getting 23 or 24 Mkeys/s per SPU,
which is the best single-processor rate of the entire contest (G5s at
2.7 GHz and [supposedly overclocked] G4s at 1.9 GHz get a little bit
more than 20 Mkeys/s). Couple that with the fact that there are 6
SPEs (plus one PPE) and that is one fast sucker. Also, this algorithm
basically uses only instructions from one of Cell's pipes, so we're
getting an IPC a little bit above 1. If, e.g., 32-bit rotate
instructions were executed on one pipe while arithmetic/logic
instructions were executed on the other, I could probably achieve
a 50% speedup, perhaps even more.
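
To make the instruction-mix point concrete, here is a minimal Python sketch of the RC5 block operation with 32-bit words and 12 rounds (the parameters used by RC5-72). It only counts the word-level operations in the encryption loop; it is not the SPU assembly discussed above, and it omits the key schedule, which a real key search also has to run for every key. The table `DEMO_TABLE` is an arbitrary placeholder, not a real expanded key, and all function names are made up for illustration.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF
ROUNDS = 12  # RC5-72 uses 32-bit words and 12 rounds

# Placeholder round-key table with 2 * (ROUNDS + 1) words.
# These values are arbitrary; a real implementation derives them from the key.
DEMO_TABLE = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(2 * (ROUNDS + 1))]


def rotl32(x, n):
    """Rotate a 32-bit word left by the low 5 bits of n."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit word right by the low 5 bits of n."""
    return rotl32(x, 32 - (n & 31))


def rc5_encrypt(a, b, s):
    """Encrypt one (a, b) block and tally the word operations used."""
    ops = Counter()
    a = (a + s[0]) & MASK32
    b = (b + s[1]) & MASK32
    ops["add"] += 2
    for i in range(1, ROUNDS + 1):
        # Each half-round is one xor, one data-dependent rotate, one add.
        a = (rotl32(a ^ b, b) + s[2 * i]) & MASK32
        b = (rotl32(b ^ a, a) + s[2 * i + 1]) & MASK32
        ops["xor"] += 2
        ops["rotate"] += 2
        ops["add"] += 2
    return a, b, ops


def rc5_decrypt(a, b, s):
    """Invert rc5_encrypt; used here only as a self-check."""
    for i in range(ROUNDS, 0, -1):
        b = rotr32((b - s[2 * i + 1]) & MASK32, a) ^ a
        a = rotr32((a - s[2 * i]) & MASK32, b) ^ b
    return (a - s[0]) & MASK32, (b - s[1]) & MASK32


if __name__ == "__main__":
    a, b, ops = rc5_encrypt(0x01234567, 0x89ABCDEF, DEMO_TABLE)
    print(dict(ops))
```

Every half-round is a dependent chain of xor, rotate, and add, so roughly a third of the work is rotates. If rotates and the add/xor operations all issue from the same pipe, as described above, that pipe is the bottleneck no matter how idle the other one is.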

To stay a little bit on topic, I don't think the lack of fully
pipelined double-precision hardware would be detrimental to Cell's GMP
performance, since as I understand it everything is done with integer
ops anyway, and Cell's support for that is superb (in fact, I have to
concede that the SPE instruction set has surpassed Altivec as my
favorite instruction set -- it's just that good). I agree it'd be a
very difficult port though, particularly if the working set exceeded
the local store's 256 KB limit (which is shared between data and
code). GMP would probably have to expose memory management internals
to the programmer in order to get maximum performance. I hope future
iterations of Cell increase the size of the local store, though I
think certain addressing modes are limited to 18 bits, and those
addressing modes wouldn't be usable on a larger local store.
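
As a rough illustration of both points, the Python sketch below adds two multi-precision numbers limb by limb using only integer operations, and checks whether a given set of operands would fit in a 256 KB local store alongside the code. It is loosely modeled on what a routine like GMP's `mpn_add_n` does, but the names `add_n` and `fits_in_local_store`, the 32-bit limb size, and the code-size figures are all assumptions made for the example, not anything taken from GMP or from a Cell port.

```python
LIMB_BITS = 32                     # assumed limb width for this sketch
LIMB_MASK = (1 << LIMB_BITS) - 1
LOCAL_STORE_BYTES = 256 * 1024     # shared between code and data


def add_n(x, y):
    """Add two equal-length little-endian limb lists.

    Returns (sum_limbs, carry_out). Only integer add, mask, and shift
    are used; no floating point is involved.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same number of limbs")
    out = []
    carry = 0
    for xl, yl in zip(x, y):
        t = xl + yl + carry
        out.append(t & LIMB_MASK)  # low 32 bits become the result limb
        carry = t >> LIMB_BITS     # anything above becomes the carry
    return out, carry


def fits_in_local_store(n_limbs, n_operands, code_bytes):
    """Check whether n_operands arrays of n_limbs limbs fit next to the code."""
    data_bytes = n_limbs * n_operands * (LIMB_BITS // 8)
    return code_bytes + data_bytes <= LOCAL_STORE_BYTES


if __name__ == "__main__":
    print(add_n([0xFFFFFFFF, 0xFFFFFFFF], [1, 0]))
    # Two sources plus one destination, with a hypothetical 32 KB of code.
    print(fits_in_local_store(16384, 3, 32 * 1024))
    print(fits_in_local_store(32768, 3, 32 * 1024))
```

Note that 256 KB is exactly 2**18 bytes, so an 18-bit address covers the current local store and nothing beyond it. Operands that do not fit would have to be streamed in and out in pieces, which is where the memory management question comes in.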
Décio