GMP and CUMP
Torbjorn Granlund
tg at gmplib.org
Mon Mar 11 22:12:38 CET 2013
Emmanuel Thomé <emmanuel.thome at gmail.com> writes:
Hi,
On Mon, Mar 11, 2013 at 9:53 PM, Torbjorn Granlund <tg at gmplib.org> wrote:
> I have 'mpz_vec_t', 'mpf_vec_t' in mind, which have some number of mpz_t
> elements, each probably (padded to) the same size counted in limbs.
> Then mpz_vec_add(a,b,c), etc would operate on such vectors a, b, c, each
> having the same number of elements...
>
> I don't think this will give good performance. Only if one builds
> sequences of expression trees, hangs vectors on the leaves, then
> executes these, one could expect to come close to the GPU's peak
> performance.
Then how do you arrive at the estimate that ``2x speedup is about the
limit'' ? It's highly application-dependent.
Well, it is memory-bandwidth dependent, if you load and store operands
there for each mpz_vec_foo operation. The application shouldn't matter as
long as you have long enough vectors.
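
To make the memory-traffic argument concrete, here is a rough sketch in
Python (not GMP or CUDA code) that only counts bytes moved to and from
device memory. It assumes 64-bit limbs, operands and results all padded to
the same size, and a fused expression tree that can keep its intermediates
on chip. The function names are made up for illustration.

```python
LIMB_BYTES = 8  # assumption: 64-bit limbs


def bytes_per_vector(n_elems, limbs_per_elem):
    """Size in bytes of one vector of equally padded multi-limb numbers."""
    return n_elems * limbs_per_elem * LIMB_BYTES


def traffic_per_op(n_ops, n_elems, limbs_per_elem):
    """One call per binary vector operation.

    Every call loads two operand vectors and stores one result vector,
    so intermediates make a round trip through device memory.
    """
    return n_ops * 3 * bytes_per_vector(n_elems, limbs_per_elem)


def traffic_fused(n_leaves, n_elems, limbs_per_elem):
    """A whole expression tree evaluated in one pass.

    Each leaf vector is loaded once and only the final result is stored
    (assumption: intermediates stay in registers or on-chip memory).
    """
    return (n_leaves + 1) * bytes_per_vector(n_elems, limbs_per_elem)


if __name__ == "__main__":
    # r = a + b + c + d: three binary operations, four leaf vectors.
    n_elems, limbs = 1000, 4
    separate = traffic_per_op(3, n_elems, limbs)
    fused = traffic_fused(4, n_elems, limbs)
    print(separate, fused, separate / fused)
```

For r = a + b + c + d the per-operation scheme moves 9 vectors against 5
for the fused one. This counts traffic only; it says nothing about the
speed actually reached on a given GPU.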
I looked in great detail at CUDA and Nvidia hardware. It would take
too long to explain my reasoning here.
I haven't looked at AMD/ATI hardware. Perhaps it is more suitable for
what one would want to do.
--
Torbjörn
More information about the gmp-devel
mailing list