AMD bulldozer and GMP
diep at xs4all.nl
Sat Feb 18 11:04:30 CET 2012
On Feb 14, 2012, at 10:28 PM, Torbjorn Granlund wrote:
> I've had the opportunity to measure GMP's performance on AMD's new
> high-end processor line, AMD FX, a k a Bulldozer.
> One should keep in mind that AMD has been in the lead in integer
> crunching since the original athlon (K7) came out, thanks to the
> superior handling of integer multiplication and add-with-carry, and
> AMD's 3-way integer issue (compared to 2-3 for Intel, until
> with is fully 3-way).
> The GMPbench results for Bulldozer are 36% worse than K10,
> clock-for-clock. To match this poor clock-for-clock performance, one
> needs to look at very low-power processors like AMD bobcat and VIA
> nano, or the 15 year old Alpha ev6.
> See: http://gmplib.org/gmpbench.html
> I explored some pipeline characteristics for an explanation. (1) AMD
> now handles just 2 integer insns/cycle. (2) Integer multiply is poorly
> pipelined, with a throughput of 1 every 4th cycle, and latency is 6-7
> cycles. (K8/K10 had a throughput of 1 every 2nd cycle and a latency
> of 4-5 cycles.)
> See: http://gmplib.org/~tege/x86-timing.pdf
> (Only the first table has been updated for Bulldozer, its column is
> Timing numbers for GMP primitives are no better, they match bobcat
> See: http://gmplib.org/devel/asm.html
> The GMP numbers will improve with time, to about the level of an old
> Intel Core2 (i.e., two full tick-tock generations back).
> It is totally incomprehensible what AMD is doing. The new processor
> runs hot, slowly, and hardly outperforms a 5W processor for integer
> number crunching. OK, they do, thanks to a 2x clock and more cores.
> But clock-for-clock they are equal.
> They need to go back to the K10 line, replace its aging branch
> predictors, replace the 2-way associative L1 data cache, and soup
> up its prefetching logic. This'll keep the powerful ALUs busy. Then
> implement the whole thing in current silicon technology, and they'll
> have a great CPU again. I suppose they'll need to implement the
> few hundred SSE and AVX instructions too, which will mess up the floor
> If you consider purchasing a machine for integer number crunching, you
> need to get one with K10, i.e., AMD Phenom or Opteron 61xx, or Intel
> Sandybridge (socket 1155, socket 2011). Alternatively, go for low
> power and good performance per Watt, and get a bunch of VIA nano or AMD
> bobcat. The main drawback for the latter two is that they don't
> support ECC memory (but that's a problem with most Intel platforms too).
Thanks for providing the numbers. I totally agree with your conclusions.
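To put the multiplier numbers quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a plain schoolbook n x n limb multiply that needs n*n hardware multiplies and is limited only by multiplier throughput. The function name and the limb counts are made up for illustration, and real GMP code is also bound by latency, add-with-carry and loads/stores, so treat it as a lower bound only.

```python
# Hypothetical model, not GMP code: count the cycles the multiplier alone
# would need for a schoolbook n x n limb product.

def mul_bound_cycles(n_limbs, mul_interval):
    """Lower bound in cycles for an n x n limb schoolbook multiply.

    n_limbs      -- number of limbs in each operand
    mul_interval -- cycles between successive multiply issues
    """
    # Schoolbook multiplication needs one limb multiply per pair of limbs.
    return n_limbs * n_limbs * mul_interval

# Multiply issue intervals taken from the quoted mail.
K10_MUL_INTERVAL = 2  # K8/K10: one multiply every 2nd cycle
BD_MUL_INTERVAL = 4   # Bulldozer: one multiply every 4th cycle

if __name__ == "__main__":
    for n in (4, 16, 64):  # illustrative operand sizes in limbs
        k10 = mul_bound_cycles(n, K10_MUL_INTERVAL)
        bd = mul_bound_cycles(n, BD_MUL_INTERVAL)
        print(n, "limbs: K10 >=", k10, "cycles, Bulldozer >=", bd, "cycles")
```

Under these assumptions the Bulldozer bound is twice the K10 bound at every size. GMPbench is a mix of operations, so the overall figure need not show that full factor.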
As for the question: "it is totally incomprehensible what AMD is doing",
the only explanation i could find is that they moved their R&D to
india and china.
Now for mass producing cheap plastic cans that's great.
For simple software engineering that's great too.
As we see for a technology product which in the first place has to
quality, that's not so great.
The one thing they did do OK, it seems, is produce a design they can
manufacture really cheaply.
The thing they really worried about is producing a chip that's doing
ok at games with AVX.
If we carefully analyze the CPU, however, we see clearly that they really
had to throw in a lot more power to get close to the performance of the
cheapest Intel gamer CPUs (say i7-2600k).
Now we aren't interested in gamerscpu's, we care more for integer
Fact is that so far intel didn't release a quad core cpu with AVX
that's cheaper than bulldozer.
Bulldozer really is a lot cheaper.
Furthermore, you suggest that AMD's six-core chips were genius
designs. I tend to agree there.
Yet let's also mention that Intel's six-core chips are a lot faster
still. The Gulftowns are unrivalled in performance, except by the
six-core Sandybridges (which have more memory channels but otherwise
seem to perform the same).
However those sixcore intels are in a pricerange far above 500 euro.
Bulldozer is in a far cheaper segment of the market. the 3d world
segment if i may say so.
i doubt many in a 3d world nation have enough budget to pay for an
i7-2600k, let alone an i7-3930k.
So from that perspective intel still has to move. If they do, they
can wipe out AMD from the cpu markets.
Furthermore what i find very worrying is the increased activity from
nations around Iran and AMD. If we look to the FFT implementation
in opencl, probably meant for a gpu, we see a bunch of researchers
getting quoted implementing a cooley-tukey type algorithm,
which if i google for them the guys are from Pakistan.
AMD has gone the arab way.
Arrogantly throwing away superior sixcore designs and totally
starting from scratch building something where only intel is good
at, namely a high clocked processor with some form of SMT, with AVX,
that's typically belonging to Indian culture.
If we look at Bulldozer without magnifying glasses, then basically,
in their own way, they recreated a quad-core i7 with AVX.
That is 3 years after the i7-965 was released.
The original design choices directly make it underperform; they use
huge slow caches, versus Intel's tiny, fast-clocked ones.
AMD has 8 multiplication units, versus Intel's 4.
Yet i bet during tuning their design they discovered it's part of the
worst case path
and they slowed them down - which is why bulldozer for GMP is a
> gmp-discuss mailing list
> gmp-discuss at gmplib.org