[MPFR] GMP+MPFR to GPU?

Wed Dec 1 10:02:26 CET 2010

Niels,

I have contacted John Stone at University of Illinois in
Urbana-Champaign, and Mian Lu at Hong Kong University.

Mian Lu has double-double and quad-double implementations (about 31 and
64 significant decimal digits) that are still works in progress (see
gnuprec on Google Code).  They already operate correctly on the GPU at
those precisions.  He has said that extending his design to arbitrary
precision would be difficult, but should be possible at some point.  He
is willing to consider making a start by porting his GPU-accelerated
fixed-precision code as an offshoot or branch (as GMP-64, MPFR-64 and
MPC-64 projects), supporting those precisions first with the full
user-visible function set of those libraries, and moving to an
arbitrary-precision model later.
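
For readers unfamiliar with the term, a double-double value is usually
stored as the unevaluated sum of two doubles.  The following is only a
minimal sketch of that idea in plain C, not code from gnuprec; the names
dd_t, two_sum and dd_add are hypothetical, and it assumes strict IEEE 754
double arithmetic (no -ffast-math, no extended-precision intermediates).

```c
/* Hypothetical type: the value is the exact sum hi + lo, with |lo| much
   smaller than |hi|. */
typedef struct { double hi, lo; } dd_t;

/* Knuth's two-sum: returns s and e such that s + e == a + b exactly
   (barring overflow), where s is the rounded sum. */
static dd_t two_sum(double a, double b)
{
    double s  = a + b;
    double bb = s - a;                       /* part of b absorbed into s */
    double e  = (a - (s - bb)) + (b - bb);   /* rounding error of a + b  */
    dd_t r = { s, e };
    return r;
}

/* Simple double-double addition: add the high parts exactly, fold in
   the low parts, then renormalise so that hi carries the leading bits.
   This is the short variant; its error bound is weaker than that of
   the longer, fully accurate algorithm. */
static dd_t dd_add(dd_t x, dd_t y)
{
    dd_t   s  = two_sum(x.hi, y.hi);
    double lo = s.lo + (x.lo + y.lo);
    double hi = s.hi + lo;
    lo = lo - (hi - s.hi);                   /* fast two-sum remainder */
    dd_t r = { hi, lo };
    return r;
}
```

Adding 1.0 and 1e-20 this way keeps both terms: hi is 1.0 and lo is
1e-20, whereas a plain double addition would return 1.0 and lose the
small term.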

John Stone has a close working relationship with NVIDIA for CUDA
support, and may be able to provide guidance as well.  I wrote an
article about the NCSA supercomputers for TG Daily back in 2007 which
helped that facility get a grant for their current supercomputers, both
of which use heavy GPU acceleration for high TFLOPS throughput.

I will pull together some information this week and next, and present a
PDF paper on my findings to GMP and MPFR.  If anybody has something
positive to contribute in the meantime, please contact me via email.

Thank you. 

- Rick C. Hodgin

On Wed, 2010-12-01 at 09:46 +0100, Niels Möller wrote:
> "Rick C. Hodgin" <foxmuldrster at yahoo.com> writes:
> 
> > Is there a developer's manual about the internal structure of the GMP
> > and/or MPFR code?  Or is there a #irc channel where I could ask
> > questions?
> 
> Start by reading the section about the mpn interface in the GMP manual.
> Questions can be asked on this list.
> 
> To see if GPU support is promising, you may want to start by
> implementing the mpn_addmul_1 function for the GPU, and see what
> performance you get. Or if you have some particular application in
> mind, profile it to see which of the low-level mpn_* functions is most
> important to you. I suspect mpn_addmul_1 (or mpn_addmul_2, if
> implemented) is the main workhorse for most GMP applications.
> 
> I don't know anything about how access to the GPU works; if it's
> available for ordinary users, if some operating system support is
> needed, etc. I imagine integration can be a bit tricky. A short summary
> about how it works would be nice.
> 
> The current thinking about parallel processing is that dividing up the
> work for parallel processing is usually best done at some level a bit
> above the low-level arithmetic. It should be better to do several large
> operations in parallel than, e.g., let the low-level Toom-2/Karatsuba
> algorithm spawn subproblems to multiple CPUs.
> 
> For a SIMD architecture (GPUs are usually like this, right?)
> optimization considerations may be a bit different than for the usual
> GMP architectures. It might make some sense to try to implement
> mpn_addmul_k for some unusually large k, letting each CPU compute A *
> b_j where A is a bignum shared by all CPUs, and b_j is a single word
> with a different value for each CPU.
> 
> Regards,
> /Niels
>
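
As a starting point for the experiment Niels suggests, the sketch below
shows in portable C what mpn_addmul_1 computes, and how the "one word
b_j per processor" idea could be laid out.  It is an illustration only
and does not use GMP itself: limb_t, ref_addmul_1 and lanes_mul are
hypothetical names, limbs are assumed to be 32 bits wide, and the outer
loop over j merely stands in for independent GPU threads.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, not GMP's mp_limb_t */

/* rp[0..n-1] += ap[0..n-1] * b, least significant limb first.
   Returns the carry-out limb, mirroring the documented behaviour of
   mpn_addmul_1. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 product plus two 32-bit addends always fits in 64 bits */
        uint64_t t = (uint64_t)ap[i] * b + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* One "lane" per word b[j]: row j (n + 1 limbs, stored at
   rows + j * (n + 1)) receives the product A * b[j].  A is shared and
   read-only; every lane writes only its own row, so the iterations are
   independent of each other. */
static void lanes_mul(limb_t *rows, const limb_t *ap, size_t n,
                      const limb_t *b, size_t k)
{
    for (size_t j = 0; j < k; j++) {      /* on a GPU: j = thread index */
        limb_t *row = rows + j * (n + 1);
        memset(row, 0, (n + 1) * sizeof *row);
        row[n] = ref_addmul_1(row, ap, n, b[j]);
    }
}
```

Combining the k rows into one result (each shifted by its limb position,
as a real mpn_addmul_k would do) is left out here; that accumulation
step needs carries to travel between rows, so it is likely where the
cost of coordinating lanes would show up.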