marc.glisse at inria.fr
Sat Oct 29 10:03:41 CEST 2011
On Sat, 29 Oct 2011, Kentaro Yamamoto wrote:
> I am using GMP for computing modular exponentiation by repeated
> squaring (since the power are always a power of 2 and the modulus has
> such a good property that a special way of computing remainder can be
> used, I'm doing it by myself instead of using mpz_powm().) The modulus
> is bigger than MUL_FFT_THRESHOLD and the time needed by multiplication
> accounts for almost all of the running time of my application.
> I have an Intel Core 2 Quad processor, and would like my application to
> scale on it. Since I don't know how to introduce parallelism in the
> repeated squaring part (if you do, please let me know), I am planning
> to parallelize in the library level by OpenMP.
do you only compute the exponentiation of a single number? When you
parallelize, the higher the level, the greater the gains, and computing
the exponentiation of several numbers in parallel sounds like it would
> I don't know much about
> how the algorithms used are combined (it seems more complicated than the
> manual describes), but firstly I added `#pragma omp for' line at the
> beginning of the for loop in the base case of mpn_fft_mul_modF_K().
> The running time almost halved, but since there are four cores, I would
> like the application to run more faster.
> Do you know other good places to put OpenMP directives?
First, a gain of a factor 2 on 4 processors, when you parallelize at a low
level, is a great gain!
Some algorithms parallelize better than others, and the one currently in
gmp wasn't written for that purpose. If I understand correctly, the FFT
multiplication first computes the transform of each operand (you could try
doing those 2 in parallel), then multiplies the 2 transforms componentwise
(what you already parallelized) and eventually retransforms to get the
result. If you are close to the FFT threshold, you could also try
parallelizing the toom multiplications. Note that making things parallel
completely changes the thresholds and other tunings. I am not sure if
running make tune and recompiling is enough to fix that (and running times
with openmp are likely too random for the tuning to work).
See also this discussion (but beware that the code is slightly different):
More information about the gmp-discuss