Torbjorn Granlund tg at
Mon Mar 11 22:12:38 CET 2013

Emmanuel Thomé <emmanuel.thome at> writes:

  On Mon, Mar 11, 2013 at 9:53 PM, Torbjorn Granlund <tg at> wrote:
  > I have 'mpz_vec_t', 'mpf_vec_t' in mind, which have some number of mpz_t
  > elements, each probably (padded to) the same size counted in limbs.
  > Then mpz_vec_add(a,b,c), etc would operate on such vectors a, b, c, each
  > having the same number of elements...
  > I don't think this will give good performance.  Only if one builds
  > sequences of expression trees, hangs vectors on the leaves, then
  > executes these, one could expect to come close to the GPU's peak
  > performance.
  Then how do you arrive at the estimate that ``2x speedup is about the
  limit'' ? It's highly application-dependent.
Well, it is memory bandwidth dependent, if you load and store operands
there for each mpz_vec_foo operation.  The application shouldn't matter
as long as you have long enough vectors.
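
To make the bandwidth point concrete, here is a minimal C sketch.  It is
not GMP code and not a GPU kernel: vec_t, vec_add, vec_add3, VEC_LIMBS
and the limb_traffic counter are hypothetical names made up for this
illustration, and elements are assumed to be padded to a fixed number of
64-bit limbs.  It computes r = a + b + c either as two separate vector
operations through a temporary, or as one fused pass, and counts the
limbs moved to and from vector storage in each case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not part of GMP: every element is padded to
   VEC_LIMBS 64-bit limbs, least significant limb first. */
enum { VEC_LIMBS = 4 };

typedef struct {
    size_t    n;      /* number of elements */
    uint64_t *limbs;  /* n * VEC_LIMBS limbs; element i starts at i * VEC_LIMBS */
} vec_t;

/* Limbs loaded from or stored to vector storage; a crude stand-in
   for memory traffic. */
size_t limb_traffic = 0;

/* r = a + b for each element, modulo 2^(64 * VEC_LIMBS). */
void vec_add(vec_t *r, const vec_t *a, const vec_t *b)
{
    assert(r->n == a->n && a->n == b->n);
    for (size_t i = 0; i < a->n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < VEC_LIMBS; j++) {
            size_t k = i * VEC_LIMBS + j;
            uint64_t x = a->limbs[k], y = b->limbs[k];
            uint64_t s = x + y;
            uint64_t c1 = s < x;        /* overflow of x + y */
            uint64_t t = s + carry;
            uint64_t c2 = t < s;        /* overflow of adding the carry */
            r->limbs[k] = t;
            carry = c1 | c2;            /* both cannot be set at once */
            limb_traffic += 3;          /* two loads, one store */
        }
    }
}

/* Fused r = a + b + c in one pass; the intermediate sum never
   touches vector storage. */
void vec_add3(vec_t *r, const vec_t *a, const vec_t *b, const vec_t *c)
{
    assert(r->n == a->n && a->n == b->n && b->n == c->n);
    for (size_t i = 0; i < a->n; i++) {
        uint64_t carry = 0;             /* can reach 2 with three addends */
        for (size_t j = 0; j < VEC_LIMBS; j++) {
            size_t k = i * VEC_LIMBS + j;
            uint64_t x = a->limbs[k], y = b->limbs[k], z = c->limbs[k];
            uint64_t s = x + y;
            uint64_t cy = s < x;
            uint64_t t = s + z;
            cy += t < s;
            uint64_t u = t + carry;
            cy += u < t;
            r->limbs[k] = u;
            carry = cy;
            limb_traffic += 4;          /* three loads, one store */
        }
    }
}
```

For this particular expression the two-call version moves 6 limbs per
limb position and the fused version moves 4.  The counts only illustrate
why per-operation loads and stores dominate; they say nothing about what
real GPU hardware achieves.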

I looked in great detail at CUDA and Nvidia hardware.  It would take a
long time to lay out my reasoning.

I haven't looked at AMD/ATI hardware.  Perhaps it is more suitable for
what one would want to do.


More information about the gmp-devel mailing list