A question about GMP's performance on the Intel Core i3 & i5 processors

Sun Jul 25 03:45:41 CEST 2010

hi Will,

The state of CPUs is always a snapshot in time. How things stand tomorrow we never know.

The difference in multiplication performance between AMD and Intel has already been explained pretty well by GMP's godfather.

Right now, get a 6-core AMD. It costs 250 euro, runs at 3.2 GHz by default, has 6 cores, and overclocks easily to 3.8 GHz on air cooling.
We did some comparisons, and even on code like the Woltman library, historically Intel's domain, it dominates on price.

Additionally, you can expect even the Woltman library to get a pathetic speedup from Hyperthreading: less than 5%.
Yet the published benchmarks (yes, including my chess benchmark, which gets used) *somehow* manage to get 22% out of Hyperthreading.

A really cool chip is Intel's 980, thanks to its 6 cores splitting into 12 logical cores and the fact that all their test machines have Turbo Boost, provided you manage to cool it below fridge temperature and meet a bunch of other constraints, such as a BIOS where you force Turbo Boost to overclock by 600 MHz or so.

Oh, and besides that, the most expensive RAM, which you can't easily get at the store around the corner.

Yeah, either you're a tester or you're not, right?

They're not using default RAM, nor default timings for that RAM.

So, all that foolishness aside, there is currently no way to beat the 6-core AMDs on price in a single-socket setup, except with a test machine from Intel, which is several times more expensive; you can buy a bunch of 6-cores for that money.

Now obviously, for the past few years the big 'win' has also been Hyperthreading, but again, the multiplication unit (there is only one of it per core on these CPUs) is not going to profit from Hyperthreading. Basically, Hyperthreading merges the instruction streams of 2 logical cores into one stream feeding the existing 4 execution units, of which only 1 can multiply.

So whether you feed that multiplication unit from 1 thread or from 2 threads doesn't really matter much, as that multiplication unit is already pretty much working overtime, isn't it?

Again, we can already wipe out the Turbo Boost advantage by overclocking the AMD chip manually.

This Turbo Boost really is evil, as the test machines overclock a lot with all 12 logical cores busy, whereas the chip you buy in the store doesn't overclock nearly as much with Turbo Boost. You don't have that BIOS, do you?

I'm guessing AMD will soon follow Intel with these tricks as well; we already see IBM doing it to some extent.

It really is a big difference: we're talking about something like a 400 MHz overclock, which you do not have at home.

Also, for SPECint I showed a difference that shouldn't be there when the i7-965 was released.

It was simply too fast at 8 logical cores, faster than should have been possible, whereas a chip manually overclocked to 3.7 GHz didn't have all that advantage. So besides all of the above, Intel wins an additional 5% or so that shouldn't be there.

Now if you have a lot of cash, it's of course great to buy a machine like that. If throughput per dollar matters to you, I'd go for the 6-core machines from AMD right now.

The i5 and i3 are basically evil stripped-down versions of the i7 processor. Intel's three basic advantages in the i7 are completely gone, as the i3 and i5 have fewer memory channels: it used to be 3 for the i7 versus 2 for AMD, but that's ancient history.
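As a rough illustration of what the channel count means: peak theoretical bandwidth is the number of channels times the transfer rate times 8 bytes per 64-bit channel. The DDR3-1333 speed below is an assumption picked only for the arithmetic, not a measured figure for any of these systems.

```latex
\begin{aligned}
B_{\text{peak}} &= N_{\text{ch}} \cdot R \cdot 8\ \text{bytes}, \qquad R \approx 1333\ \text{MT/s (assumed DDR3-1333)}\\
N_{\text{ch}} = 3:\quad B_{\text{peak}} &\approx 3 \cdot 10.67\ \text{GB/s} \approx 32.0\ \text{GB/s}\\
N_{\text{ch}} = 2:\quad B_{\text{peak}} &\approx 2 \cdot 10.67\ \text{GB/s} \approx 21.3\ \text{GB/s}
\end{aligned}
```

These are upper bounds; sustained bandwidth in real code is lower.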

These CPUs also have fewer cores.

Realize that DDR3 RAM is a lot faster than DDR2 RAM, so you certainly want plenty of it.

The i7 and later 'high-end' CPUs from Intel have lower latency to RAM when doing random reads.

Then it's roughly:
    70 ns for Intel versus 100 ns for AMD.

That advantage turns into a disadvantage for the i5 and i3.

In bandwidth, the difference between the i7 and the Phenom II is a bit smaller, but it is still there in Intel's favor.

Yet to actually run into the limit of the total bandwidth with which the RAM can feed the processor, you would have to do something really stupid algorithmically.

Most calculations are something like O(2n log 2n), or in the case of a DWT O(n log n), so in theory it is possible to do "something in one cache or another" log n times. That means that with good algorithms you *should* feel the bandwidth difference only for the 'n' part, after which each item is fetched "log n" times from L1/L2/L3.

In practice that isn't the case yet, but I'd guess it is a matter of time.

Vincent

On Jun 27, 2010, at 6:42 PM, Will Galway wrote:

> I'm considering getting a new computer using, most likely, either  
> an Intel i3-530 processor (my preference) or an Intel i5-570  
> processor.  In terms of the performance of GMP, is there any  
> particular reason I should prefer the latter processor?  (The  
> i5-570 has 8MB of level-3 cache, vs the i3-530's 4MB, but I rarely  
> use GMP for "large" operands, rather I tend to work mostly with  
> 128..256-bit operands.)
>
> Thanks for any advice that readers might have.
>
> -- Regards, Will Galway
>
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss