AMD bulldozer and GMP

Vincent Diepeveen diep at xs4all.nl
Sat Feb 18 11:04:30 CET 2012


On Feb 14, 2012, at 10:28 PM, Torbjorn Granlund wrote:

> I've had the opportunity to measure GMP's performance on AMD's new
> high-end processor line, AMD FX, a k a Bulldozer.
>
> One should keep in mind that AMD has been in the lead in integer number
> crunching since the original Athlon (K7) came out, thanks to the
> superior handling of integer multiplication and add-with-carry, and
> AMD's 3-way integer issue (compared to 2-3 for Intel, until Sandybridge,
> which is fully 3-way).
>
> The GMPbench results for Bulldozer are 36% worse than K10,
> clock-for-clock.  To match this poor clock-for-clock performance, one
> needs to look at very low-power processors like AMD bobcat and VIA nano,
> or the 15-year-old Alpha ev6.
>
> See: http://gmplib.org/gmpbench.html
>
> I explored some pipeline characteristics for an explanation.  (1) AMD
> now handles just 2 integer insns/cycle.  (2) Integer multiply is poorly
> pipelined, with a throughput of 1 every 4th cycle and a latency of 6-7
> cycles.  (K8/K10 had a throughput of 1 every 2nd cycle and a latency of
> 4-5 cycles.)
>
> See: http://gmplib.org/~tege/x86-timing.pdf
>
> (Only the first table has been updated for Bulldozer, its column is
> "BD1".)
>
> Timing numbers for GMP primitives are no better, they match bobcat
> more-or-less.
>
> See: http://gmplib.org/devel/asm.html
>
> The GMP numbers will improve with time, to about the level of an old
> Intel Core2 (i.e., two full tick-tock generations back).
>
> It is totally incomprehensible what AMD is doing.  The new processor
> runs hot, slowly, and hardly outperforms a 5W processor for integer
> number crunching.  OK, they do, thanks to a 2x clock and more cores.
> But clock-for-clock they are equal.
>
> They need to go back to the K10 line, replace its aging branch
> predictors, replace the 2-way associative L1 data cache, and soup up its
> prefetching logic.  This'll keep the powerful ALUs busy.  Then
> implement the whole thing in current silicon technology, and they'll
> have a great CPU again.  I suppose they'll need to implement the latest
> few hundred SSE and AVX instructions too, which will mess up the floor
> plan.
>
> If you consider purchasing a machine for integer number crunching, you
> need to get one with K10, i.e., AMD Phenom or Opteron 61xx, or Intel
> Sandybridge (socket 1155, socket 2011).  Alternatively, go for low power
> and good performance per Watt, and get a bunch of VIA nano or AMD
> bobcat.  The main drawback for the latter two is that they don't support
> ECC memory (but that's a problem with most Intel platforms too).
>

hi Torbjorn,

Thanks for providing the numbers. I totally agree with your conclusions.

As for the question: "it is totally incomprehensible what AMD is doing",
the only explanation I could find is that they moved their R&D to
India and China.

Now for mass producing cheap plastic cans that's great.
For simple software engineering that's great too.

As we see, for a technology product which in the first place has to
deliver high quality, that's not so great.

The one thing they seem to have done right is come up with a design they
can manufacture cheaply.

The thing they really worried about is producing a chip that does OK at
games with AVX. If we carefully analyze the CPU, however, we see clearly
that they had to throw in a lot more power to get close to the
performance of the cheapest Intel gamer CPUs (say the i7-2600k).

Now we aren't interested in gamer CPUs; we care more about integer
performance.

Fact is that so far Intel hasn't released a quad-core CPU with AVX
that's cheaper than Bulldozer.
Bulldozer really is a lot cheaper.

Furthermore, you describe AMD's six-core parts as genius designs. I tend
to agree there. Yet let's also mention that Intel's six-cores are a lot
faster still. The Gulftowns are unrivalled in performance, except by the
six-core Sandybridges (which have more memory channels but otherwise
seem to perform the same).

However those sixcore intels are in a pricerange far above 500 euro.

Bulldozer is in a far cheaper segment of the market, the third-world
segment, if I may say so.

I doubt many in a third-world nation have enough budget to pay for an
i7-2600k, let alone an i7-3930k.

So from that perspective Intel still has to move. If they do, they
can wipe AMD out of the CPU market.

Furthermore what i find very worrying is the increased activity from  
nations around Iran and AMD. If we look to the FFT implementation
in opencl, probably meant for a gpu, we see a bunch of researchers  
getting quoted implementing a cooley-tukey type algorithm,
which if i google for them the guys are from Pakistan.

AMD has gone the arab way.

Arrogantly throwing away superior sixcore designs and totally  
starting from scratch building something where only intel is good
at, namely a high clocked processor with some form of SMT, with AVX,  
that's typically belonging to Indian culture.

If we look at Bulldozer without magnifying glasses, then basically, in
an original manner, they recreated a quad-core i7 with AVX.
That is 3 years after the i7-965 was released.

The original design choices directly make it underperform; they use
huge slow caches, versus Intel's tiny, fast-clocked ones.
AMD has 8 multiplication units, versus Intel's 4.

Yet I bet that while tuning their design they discovered the multipliers
were part of the worst-case path and slowed them down, which is why
Bulldozer for GMP is a bulldroop.
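
To make the effect of a slower multiplier concrete, here is a rough
sketch in Python. It takes the issue width and multiply throughput quoted
above for K10 and Bulldozer and computes a simple lower bound on cycles
per limb for a multiply-accumulate style inner loop. The loop shape
(muls_per_limb, insns_per_limb) is a made-up assumption for illustration,
not a measurement of any actual GMP routine, and the bound ignores
latency, caches and everything else.

```python
from fractions import Fraction

# Figures quoted in this thread: integer instructions issued per cycle,
# and cycles between successive integer multiplies (reciprocal throughput).
CORES = {
    "K10": {"issue_width": 3, "mul_recip_throughput": 2},
    "BD1": {"issue_width": 2, "mul_recip_throughput": 4},
}


def cycles_per_limb_bound(core, muls_per_limb=1, insns_per_limb=4):
    """Lower bound on cycles per limb for a multiply-accumulate style loop.

    muls_per_limb and insns_per_limb are illustrative assumptions, not
    properties of a real GMP loop.
    """
    # The multiplier can start one multiply every mul_recip_throughput cycles.
    mul_bound = Fraction(muls_per_limb * core["mul_recip_throughput"])
    # The front end can issue at most issue_width integer insns per cycle.
    issue_bound = Fraction(insns_per_limb, core["issue_width"])
    # Whichever resource is scarcer sets the bound.
    return max(mul_bound, issue_bound)


bounds = {name: cycles_per_limb_bound(core) for name, core in CORES.items()}
for name, bound in bounds.items():
    print(f"{name}: at least {float(bound):.2f} cycles/limb")
print(f"BD1/K10 ratio: {float(bounds['BD1'] / bounds['K10']):.2f}")
```

With these assumed loop parameters the multiplier, not the issue width,
is the limiting resource on both cores. This is only a bound for a
multiply-bound loop; it is not a prediction of the overall GMPbench
score, which mixes many kinds of operations.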

> -- 
> Torbjörn


