Using extra cores in gmp?
Torbjorn Granlund
tg at gmplib.org
Sun Mar 30 15:25:59 UTC 2014
Ronald Bruck Jr <bruck at usc.edu> writes:
> Now that 12-core Intel processors are available (for a mind-boggling
> combined 24 cores on the right mobo), let me ask whether there are any
> experimental versions of gmp which can use this many cores. In
> particular, many of my operations are carried out to thousands of digits
> (though seldom more than 32000), and it would seem that multiplication
> and division would greatly benefit from multithreading. (Even lowly
> addition!)
I assume you are talking about decimal digits. 32000 decimal digits is
about 1700 64-bit words. I very much doubt one could get much speedup
from multiplies of that size, where each operation takes around 3 ms on a
modern high-end CPU. One problem is that an intermediate result from one
CPU will live in its L1 cache. Accessing this from another CPU is very
expensive.
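As a quick check of the size estimate above, the conversion from decimal
digits to 64-bit words can be sketched as follows. The function name and
the limb_bits parameter are made up for this illustration and are not part
of GMP.

```python
import math

def limbs_for_decimal_digits(digits, limb_bits=64):
    """Estimate how many machine words hold a number with this many decimal digits."""
    # Each decimal digit carries log2(10) (about 3.32) bits of information.
    bits = digits * math.log2(10)
    # Round up to a whole number of words.
    return math.ceil(bits / limb_bits)

if __name__ == "__main__":
    print(limbs_for_decimal_digits(32000))
```

For 32000 digits this gives 1661 words, in line with the "about 1700"
figure.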
One could get speedup for multiplies of large enough operands, though.
Unfortunately that would be a lot of hard work, and the utility would
be limited as huge operands are not common.
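To make concrete what splitting one large multiply into independent pieces
could look like, here is a rough sketch using Python's built-in integers.
It is a plain schoolbook split into four half-size products, chosen only
for illustration. It is not how GMP multiplies, and the name split_product
is hypothetical. The sketch only shows that the pieces are independent and
does not actually run them on separate cores.

```python
def split_product(a, b, bits):
    """Multiply non-negative integers via four independent half-size products."""
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # These four products do not depend on each other, so each could in
    # principle be handed to a different core.
    partials = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
    # Recombination is only shifts and additions.
    return (partials[0]
            + ((partials[1] + partials[2]) << half)
            + (partials[3] << (2 * half)))

if __name__ == "__main__":
    x, y = 3 ** 5000 + 1, 7 ** 4000 + 5
    width = max(x.bit_length(), y.bit_length())
    print(split_product(x, y, width) == x * y)
```

The recombination step has to read every partial result, which is where
the cost of moving data between cores mentioned above would show up.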
> As it turns out, most of my use of multiprecision wouldn't benefit much
> from such parallelism. Most of my uses involve (hundreds of) thousands
> of repetitions of a single suite of programs, and it's fine to launch 20
> or so invocations at a time on single threads. Each individual program
> takes much longer to run than if gmp were multithreaded, but the time
> for the whole collection will be about the same (or even faster).
I believe that's the typical scenario for GMP number crunching
applications.
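The batch pattern described above, many independent single-threaded
invocations with a fixed number in flight at once, can be sketched roughly
as follows. JOB_SOURCE is a hypothetical stand-in for one of the real
programs, and run_job and run_batch are names invented for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one single-threaded number-crunching program:
# it squares a large integer and prints how many decimal digits the result has.
JOB_SOURCE = "import sys; n = int(sys.argv[1]); print(len(str((10**n + 7) ** 2)))"

def run_job(n):
    """Run one independent invocation as its own process and return its output."""
    done = subprocess.run(
        [sys.executable, "-c", JOB_SOURCE, str(n)],
        capture_output=True, text=True, check=True,
    )
    return int(done.stdout)

def run_batch(sizes, max_parallel=20):
    """Keep at most max_parallel invocations running at a time."""
    # The threads only wait on child processes; the arithmetic itself runs
    # in the separate single-threaded processes.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_job, sizes))

if __name__ == "__main__":
    print(run_batch([100, 200, 300], max_parallel=3))
```

Because the jobs share nothing, no intermediate result ever has to move
between cores, which is why this scales well without any threading inside
the arithmetic library.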
> But I can foresee future situations where single invocations would be
> useful. I once thought that GPUs could accelerate computations, but I
> quickly discovered that these are largely memory-bound. Someone from
> Bailey's (competing) multiprecision group wrote me, several years ago,
> that THEY found the same, and thought the future was in the increasing
> number of cores.
Using current GPUs does not help GMP much. I made a quite careful
feasibility study of porting GMP to Nvidia and AMD GPUs some years ago,
and determined that even a high-end GPU does not provide enough multiply
bandwidth to outperform a CPU by a very large margin. This might change,
but it would require some unlikely architectural changes to the GPUs.
Torbjörn
Please encrypt, key id 0xC8601622