multi-threaded multiplication speed-up

Pierrick Gaudry gaudry at
Mon Jan 15 08:18:25 CET 2007

On Thu, Jan 11, 2007 at 06:01:54PM -0500, Jason Martin wrote:
> Hi All,
> Out of curiosity, I recently tried a naive multi-threaded
> multiplication scheme (using pthreads) and got a speed-up of about 25%
> for extremely large operands (over 2000 limbs).  I was just using
> simple "textbook" style multiplication (break the larger operand in
> half, do two multiplies using mpn_mul, then shift/add back together).
> I got similar results on a Woodcrest running OS X and a dual core
> Opteron running Linux.
> I suspect that a quad-core CPU in which all the cores share the L2
> cache could really benifit from a Karatsuba-style multi-threaded
> multiply since the three multiplies could run in parallel without
> having to go off-chip for memory accesses.
> Is anyone working on this already?  Does anyone care?
> --jason

Hi Jason,

I had students working on a multi-threaded version of Karatsuba. On a
bi-proc Opteron they got a range where they were faster than the
multiplication of GMP (GMP was using FFT for that range, I don't remember
the version number).

Now that there are two experiments tending to prove that GMP can take
advantage of multithreaded computation I think this would make sense to
start a real project on that. But then, of course, FFT is the first
algorithm to start to parallelize, since for large sizes parallelization
will be more profitable.

If you start working on that, I'll be happy to help if needed (I have
been working recently on the FFT code, so I am now familiar with it).


More information about the gmp-discuss mailing list