Bulldozer and 64 bits integer multiplication

Fri Oct 14 16:43:14 CEST 2011

Vincent Diepeveen <diep at xs4all.nl> writes:

  I tend to remember that for FFT the limiting speed of GMP is the
  multiplication throughput that one can achieve in integers.

I don't think this is the case for GMP's FFT code.  It is more related
to cache size and memory bandwidth.

  Now Bulldozer is a weirdo design on paper of a quad core chip that
  splits into 8 minicores.  Yet each minicore as we can see on the
  drawing has its own multiplication unit.

Its own *integer* multiplication unit, yes.

  That would mean that doing a 64-bit integer transform is nearly double
  the speed of a quad core chippie.

That would have been great, but unfortunately that is not really the
case.  The [64-bit] Athlon and Opteron chips have an integer multiplier
that can sustain a new result every 2nd cycle, while the Bulldozer
multipliers can only sustain one new result every 4th cycle.

  What's a good way to test this with GMP on a bulldozer?

How about running some tests measuring their performance?
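
As a starting point for such a test, here is a minimal sketch in C of a
multiplier throughput measurement.  It is not GMP code.  The names
time_wide_muls and CHAINS are made up for illustration, and it assumes a
64-bit GCC or Clang target where unsigned __int128 is available.  Several
independent multiply chains are kept in flight so that the loop is more
likely to be limited by multiplier throughput than by latency.

```c
#include <stdint.h>
#include <time.h>

/* Number of independent dependency chains (an assumption; enough to
   cover a multiply latency of a handful of cycles). */
#define CHAINS 8

/* Runs iters * CHAINS widening 64x64 -> 128-bit multiplies and returns
   the elapsed CPU time in seconds.  The checksum is written out so the
   compiler cannot discard the work. */
double time_wide_muls(uint64_t iters, uint64_t *checksum)
{
    uint64_t x[CHAINS];
    const uint64_t m = UINT64_C(0x9E3779B97F4A7C15); /* arbitrary odd constant */

    for (int k = 0; k < CHAINS; k++)
        x[k] = 2 * (uint64_t)k + 1;

    clock_t start = clock();
    for (uint64_t i = 0; i < iters; i++) {
        for (int k = 0; k < CHAINS; k++) {
            /* Full product; both halves are used so a widening
               multiply is needed. */
            unsigned __int128 p = (unsigned __int128)x[k] * m;
            x[k] = (uint64_t)p + (uint64_t)(p >> 64);
        }
    }
    clock_t stop = clock();

    uint64_t sum = 0;
    for (int k = 0; k < CHAINS; k++)
        sum ^= x[k];
    *checksum = sum;

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by iters * CHAINS and multiplying by the clock
frequency gives a rough cycles-per-multiply figure.  Check the generated
assembly before trusting the number, since the compiler is free to
restructure the loop.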

  What options are there in GMP?  I tend to remember from the last time
  I checked its code that it uses 32 x 32 bits integers in SIMD, right?

Not quite.  It uses (scalar) word multiplies, where a 'word' is 64 bits
on a 64-bit machine.
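
To illustrate what a scalar word multiply means here, the following is a
rough sketch in C, assuming a 64-bit GCC or Clang target with
unsigned __int128.  The helpers mul_64x64 and mul_1_sketch are
hypothetical names, not GMP functions, and GMP's real inner loops on
x86-64 are largely hand-written assembly.  The sketch only shows the
shape of the operation: one 64 x 64 -> 128 bit multiply per word, with
the high half carried into the next word.

```c
#include <stddef.h>
#include <stdint.h>

/* Full 64x64 -> 128-bit product, split into high and low words. */
void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}

/* Multiplies the n-word number sp[0..n-1] (least significant word
   first) by the single word v, stores the low n words in rp, and
   returns the final carry-out word. */
uint64_t mul_1_sketch(uint64_t *rp, const uint64_t *sp, size_t n, uint64_t v)
{
    uint64_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t hi, lo;
        mul_64x64(sp[i], v, &hi, &lo);
        lo += carry;
        hi += (lo < carry);   /* cannot overflow: hi is at most 2^64 - 2 */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

Each loop iteration issues exactly one word multiply, which is why the
multiplier's sustained rate matters for this kind of code.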

-- 
Torbjörn