CUDA, OPENCL AND GMP. TOWARDS A HYBRID GMP?

Timothee Ewart Timothee.Ewart at unige.ch
Thu Apr 25 21:06:30 CEST 2013


Dear all,

Let me add a small contribution. I have written some multiprecision arithmetic in assembly for the Kepler GK110 GPU (Karatsuba and schoolbook algorithms).
Running a schoolbook algorithm on a single GPU thread is useless and not efficient at all; you need a parallel algorithm for your operation (forget add/sub).
As Torbjorn remarked, this can only work for very large numbers, where the GPU threads can work independently. I think Torbjorn was thinking of FFT multiplication.
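
To make the single-thread problem concrete, here is a minimal schoolbook multiplication sketch in plain C (not the GK110 assembly mentioned above). It assumes 32-bit limbs stored least significant first; the names `limb_t` and `schoolbook_mul` are hypothetical and chosen only for this illustration. The point is the inner loop: every step needs the carry produced by the previous one, so a single thread walks the whole chain serially.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t; /* hypothetical 32-bit limb, least significant first */

/* r[0..an+bn-1] = a[0..an-1] * b[0..bn-1]; r must not overlap a or b. */
void schoolbook_mul(limb_t *r, const limb_t *a, size_t an,
                    const limb_t *b, size_t bn)
{
    for (size_t i = 0; i < an + bn; i++)
        r[i] = 0;

    for (size_t j = 0; j < bn; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < an; i++) {
            /* 32x32 -> 64-bit multiply-add; the sum cannot overflow 64 bits */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (limb_t)t;
            carry = t >> 32; /* the next step depends on this carry */
        }
        r[j + an] = (limb_t)carry;
    }
}
```

Calling `schoolbook_mul(r, a, 2, a, 2)` with both limbs of `a` set to 0xFFFFFFFF squares 2^64 - 1. A parallel variant would have to split the partial products across threads and resolve the carries in a separate pass, which only pays off for large operands.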

As for peak performance on the GPU, forget NVIDIA's numbers; they are calibrated for floating point. At present, for 32-bit integers, the peak for fused multiply/add operations
on an NVIDIA Tesla K20X GPU is:

n_SMX · f · μ · r = 14 SMX · 732·10^6 cycle/s · 32 instr/(SMX · cycle) · 2 IOP/instr ≈ 655 GIOP/s, compared to 3.95 TFLOP/s for float.
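
The product above can be checked with a few lines of C. This is only a sketch of the arithmetic, using the four factors quoted in the formula; the function name `peak_giops` and its parameter names are hypothetical.

```c
/* Peak integer throughput in GIOP/s from the four factors in the formula above:
   number of SMX units, clock in Hz, instructions per SMX per cycle,
   and integer operations per instruction (2 for a multiply/add). */
double peak_giops(int n_smx, double clock_hz,
                  int instr_per_smx_cycle, int ops_per_instr)
{
    return n_smx * clock_hz * instr_per_smx_cycle * ops_per_instr / 1e9;
}
```

With the K20X values, `peak_giops(14, 732e6, 32, 2)` evaluates to 655.872, which matches the 655 GIOP/s figure and is roughly one sixth of the quoted float peak.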

Also, in terms of assembly, it is easier to maintain x86/power64 code than GPU code, as the specs change every 6 months for NVIDIA.
For OpenCL, I am not sure you can manage the carry bit.
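
Where a language gives no access to the hardware carry flag, the carry can still be recovered in software by comparing the wrapped sum with an operand. The sketch below is plain C, not OpenCL, and assumes 32-bit limbs stored least significant first; `add_n_portable` is a hypothetical name. Whether a given OpenCL compiler turns such code into a real add-with-carry instruction is not something this sketch can promise, and the comparisons cost extra instructions if it does not.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 32-bit limbs (least significant first).
   Returns the carry out of the top limb (0 or 1). */
uint32_t add_n_portable(uint32_t *r, const uint32_t *a,
                        const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + b[i]; /* wraps modulo 2^32 */
        uint32_t c1 = s < a[i];   /* wrapped iff the sum is below an operand */
        uint32_t t = s + carry;
        uint32_t c2 = t < s;      /* adding the incoming carry may wrap too */
        r[i] = t;
        carry = c1 | c2;          /* c1 and c2 are never both set */
    }
    return carry;
}
```

For example, adding {1, 0} to {0xFFFFFFFF, 0xFFFFFFFF} yields {0, 0} with a carry out of 1.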

Best

tim
On 25 Apr 2013, at 20:45, Allan Menezes <amenezes007 at sympatico.ca> wrote:

> Dear Torbjorn, Todd,
>
> I brought up this question long ago: using specifically CUDA and NVIDIA GPUs
> to accelerate some GMP functions. As a compromise, one can create a hybrid GMP library where some
> GMP functions are executed on the host machine's CPU, some on the GPU, and some perhaps on a
> combination of both. For example, there exists a GMP port of some mpf functions in CUMP (google
> cump and cuda). But my much earlier suggestion was to use the par4all software (google par4all)
> to carefully convert the generic subdirectory of C routines in the GMP library
> into CUDA routines which can be compiled with nvcc. Par4all takes as input a C program and outputs
> a parallelized .cu file. On the other hand, for AMD graphics cards, or NVIDIA with OpenCL, one could use
> SnuCL (google snucl opencl).
>
> Par4all would only compile C source files into .cu files, which could still be under the Creative Commons
> licensing paradigm, and the end user with a suitable configure could compile the hybrid GMP library
> on his/her/other machine using one's own installed CUDA software. Hence the branch Hybrid GMP library source as distributed
> would be suitably open source.
>
> Notably, a good high-end GPU to test it on would be the GeForce TITAN, for about $1000 CDN, which boasts a double-precision
> peak of 1.3 teraflops and 4.5 teraflops single precision.
>
> Allan


