GMP on the Cell processor

Décio Luiz Gazzoni Filho decio at decpp.net
Wed Apr 18 23:15:03 CEST 2007


On Apr 18, 2007, at 1:43 PM, Linas Vepstas wrote:

>> If you own a PS3 you can install Linux and have access to 6 SPEs because
>
> Particularly remarkable is the "protein folding at home" project, which
> Sony ported to the PS3. This is a massive numerical simulation, where
> the goal is to minimize the quantum mechanical energy of an arrangement
> of thousands of atoms (the protein). What's remarkable are the
> statistics coming out of this:
>
> http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
>
> There are 32K PS3s running the thing, delivering almost half a
> petaflop. Compare this to about 200K Intel boxes delivering
> 200 teraflops (1/5 of a petaflop). So the possibilities are there.
> (Although, to first order, it's really just number of cores times clock
> speed.)
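
Dividing the quoted totals by the machine counts gives the per-machine rates. This assumes "32K" means roughly 32,000 consoles and treats "almost half a petaflop" as an upper bound of 0.5 PFLOPS:

$$
\frac{0.5\times 10^{15}\ \text{FLOPS}}{32\,000\ \text{PS3s}} \approx 1.6\times 10^{10}\ \text{FLOPS} \approx 16\ \text{GFLOPS per PS3},
\qquad
\frac{2\times 10^{14}\ \text{FLOPS}}{200\,000\ \text{PCs}} = 10^{9}\ \text{FLOPS} = 1\ \text{GFLOPS per PC}.
$$

So each PS3 was contributing roughly fifteen times as much as each Intel box.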

I've written SPU assembly code for the distributed.net RC5-72 key
cracking effort. As I recall, I was getting 23 or 24 Mkeys/s per SPU,
the best single-processor rate in the entire contest (G5s at 2.7 GHz
and [supposedly overclocked] G4s at 1.9 GHz get a little more than
20 Mkeys/s). Couple that with the fact that there are 6 SPEs (plus
one PPE) and that is one fast sucker. Also, this algorithm basically
uses instructions from only one of the SPU's two pipelines, so we're
getting an IPC a little bit above 1. If, e.g., 32-bit rotates were
executed on one pipe while arithmetic/logic instructions were executed
on the other, I could probably achieve a 50% speedup, perhaps even
more.
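
The RC5-72 workload is RC5 with 32-bit words, 12 rounds and a 9-byte (72-bit) key. A key search has to rerun the key expansion for every candidate key, and that inner loop consists of nothing but 32-bit adds and rotates, which is why it piles onto a single SPU pipe. What follows is a minimal portable C sketch written for illustration, not the SPU assembly described above; the function and macro names (`rc5_setup`, `rc5_encrypt`, `rc5_decrypt`, `RC5_T`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RC5-32/12/b: 32-bit words, 12 rounds, b-byte key (b = 9 for RC5-72). */
#define RC5_ROUNDS 12
#define RC5_T (2 * (RC5_ROUNDS + 1))   /* 26 expanded-key words */
#define RC5_P32 0xB7E15163u
#define RC5_Q32 0x9E3779B9u

static inline uint32_t rotl32(uint32_t x, uint32_t n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

static inline uint32_t rotr32(uint32_t x, uint32_t n) {
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Key expansion; assumes b <= 16. A key search reruns this for every key. */
static void rc5_setup(uint32_t S[RC5_T], const uint8_t *key, size_t b) {
    uint32_t L[4] = {0};
    size_t c = b ? (b + 3) / 4 : 1;
    for (size_t k = 0; k < b; k++)            /* little-endian byte packing */
        L[k / 4] |= (uint32_t)key[k] << (8 * (k % 4));

    S[0] = RC5_P32;
    for (size_t k = 1; k < RC5_T; k++)
        S[k] = S[k - 1] + RC5_Q32;

    uint32_t A = 0, B = 0;
    size_t i = 0, j = 0;
    for (size_t k = 0; k < 3 * RC5_T; k++) {  /* 3 * max(T, c), since c <= 4 < T */
        A = S[i] = rotl32(S[i] + A + B, 3);   /* 2 adds, 1 fixed rotate   */
        uint32_t t = A + B;                   /* 1 add                    */
        B = L[j] = rotl32(L[j] + t, t);       /* 1 add, 1 variable rotate */
        i = (i + 1) % RC5_T;
        j = (j + 1) % c;
    }
}

/* Encrypt one 64-bit block held as two 32-bit words, in place. */
static void rc5_encrypt(const uint32_t S[RC5_T], uint32_t blk[2]) {
    uint32_t A = blk[0] + S[0], B = blk[1] + S[1];
    for (int r = 1; r <= RC5_ROUNDS; r++) {
        A = rotl32(A ^ B, B) + S[2 * r];
        B = rotl32(B ^ A, A) + S[2 * r + 1];
    }
    blk[0] = A;
    blk[1] = B;
}

/* Inverse of rc5_encrypt: undo the rounds in reverse order. */
static void rc5_decrypt(const uint32_t S[RC5_T], uint32_t blk[2]) {
    uint32_t A = blk[0], B = blk[1];
    for (int r = RC5_ROUNDS; r >= 1; r--) {
        B = rotr32(B - S[2 * r + 1], A) ^ A;
        A = rotr32(A - S[2 * r], B) ^ B;
    }
    blk[1] = B - S[1];
    blk[0] = A - S[0];
}
```

Counting one step of the key-expansion loop gives two adds and a fixed rotate to produce A, one add for t = A + B, and one more add plus a variable rotate to produce B. That is 4 adds and 2 rotates per step, ignoring loads, stores and index bookkeeping. Suppose rotates could issue on the second pipe, and enough independent keys were interleaved to hide latency. Each step would then be bounded by its 4 adds rather than all 6 operations: 6/4 = 1.5x, in line with the 50% estimate above.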

To stay a little bit on topic, I don't think the lack of fully
pipelined double-precision hardware would be detrimental to Cell's GMP
performance, since, as I understand it, GMP does everything with
integer ops anyway, and Cell's support for those is superb (in fact, I
have to concede that the SPE instruction set has surpassed AltiVec as
my favorite instruction set; it's just that good). I agree it'd be a
very difficult port, though, particularly if the working set exceeded
the local store's 256 KB limit (which is shared between code and
data). GMP would probably have to expose its memory-management
internals to the programmer to get maximum performance. I hope future
iterations of Cell increase the size of the local store, though I
think certain addressing modes are limited to 18 bits, and those
addressing modes wouldn't be usable with a larger local store.
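
The integer-only nature of GMP's inner loops shows up in even its simplest primitive, multi-limb addition. The minimal sketch below is a hypothetical portable analogue of GMP's `mpn_add_n`, assuming 32-bit limbs. The name `add_n_u32` is invented for illustration; it is not GMP code and not an SPU port. The carry chain uses only adds and unsigned compares.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of mpn_add_n for 32-bit limbs (least significant
   limb first): rp[0..n) = ap[0..n) + bp[0..n); returns the carry out (0 or 1). */
static uint32_t add_n_u32(uint32_t *rp, const uint32_t *ap,
                          const uint32_t *bp, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = ap[i] + carry;
        uint32_t c1 = s < carry;     /* wrapped while adding the incoming carry */
        uint32_t r = s + bp[i];
        uint32_t c2 = r < s;         /* wrapped while adding bp[i] */
        rp[i] = r;
        carry = c1 | c2;             /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

The same 32-bit-limb assumption gives a rough sense of the local-store squeeze. A schoolbook n-by-n multiply holds two n-limb inputs and a 2n-limb product, i.e. 16n bytes. Since 256 KB is 2^18 = 262,144 bytes (which is also why an 18-bit address spans it exactly), n is capped at 16,384 limbs, about 524,288 bits per operand, before any code, stack or scratch space is counted. Anything larger would have to be streamed in and out of main memory by DMA.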

Décio


More information about the gmp-discuss mailing list