A question about GMP's performance on the Intel Core i3 & i5 processors

Sun Jul 25 04:11:35 CEST 2010

hi Will,

On L3 caches:

What matters sometimes is the size of the L1 cache. The size of the  
L3 cache really is no big deal even for large working set sizes.

The reality is that you are either INSIDE the L3 cache or OUTSIDE
the L3 cache.

It is true that for some odd tests it seems to matter; however,
that's because of the limits imposed by SPECint.
For example SPECint2004, later called SPECint2006, had a limit of
256MB RAM.

So most programmers decided to stay far below this. For example
Sjeng has a hashtable of 150MB.
In that 150MB table of course some form of chaining happens.

You'll realize that the size of the L3 cache then suddenly matters
somewhat.

The same is true for the others; most have chosen a working set size
of a few dozen megabytes.
Amazingly, the program profiting the most there is the GCC
benchmark.

You can see that with the IBM POWER series; this chip really flies on
the GCC benchmark thanks to its big L3 caches.

However, all of that is totally unrealistic.

For a chess program you don't want to eat just 150MB, with a max of
256MB RAM, of course. In reality I eat away the whole
RAM, which is gigabytes. The size of the L3 doesn't matter then.

Other benchmarks eat just a few dozen megabytes, BECAUSE they had
to write a test that finished within X seconds.
"By accident" that means that bigger L3 caches help, whereas in
reality it just doesn't matter at all.

Realize also what an L3 cache is. It's the last of the 3 caches, so
the least important one.

The L1 is what matters, and all manufacturers have nowadays optimized
their L1 in such a manner that it really performs well.
So I'd argue that although the caches and memory subsystems already
differ totally between AMD and Intel, that really is of
no concern for whatever you do.

Realize that the L1 cache is far more important than the L3, and the
L1 cache of AMD used to be a factor of 2 bigger than that of the i7.

128KB versus 64KB (instruction and data caches combined).

Now it's a fact that Intel simply has chosen a different approach
there, a cheaper approach, allowing them to make more profit
per CPU. With that Intel is the exception, not AMD, I'd argue; we can
see that most manufacturers choose or chose a much bigger L1
cache than Intel historically did. The P4 really was the world's
cheapest PC processor to produce, having a tiny L1 and, for
instructions, in fact no L1 at all, just a trace cache.

So for the shareholders all this is good news; for performance it
isn't. Where Core 2 and i7 really win a lot of terrain is the improved
branch prediction and the fact that they can do 4 instructions a cycle
rather than 3.

Yet by now we're deep into discussing the architecture of the
execution units, not the caches at all.

I'd argue that the limiting bottleneck for nearly all mathematical
software is basically the multiplication unit, as each core has just 1
of them and it has a latency, in the case of integers, of 3.75 cycles
instead of 1.

That really sucks.

On Jun 27, 2010, at 6:42 PM, Will Galway wrote:

> I'm considering getting a new computer using, most likely, either  
> an Intel i3-530 processor (my preference) or an Intel i5-570  
> processor.  In terms of the performance of GMP, is there any  
> particular reason I should prefer the latter processor?  (The  
> i5-570 has 8MB of level-3 cache, vs the i3-530's 4MB, but I rarely  
> use GMP for "large" operands, rather I tend to work mostly with  
> 128..256-bit operands.)
>
> Thanks for any advice that readers might have.
>
> -- Regards, Will Galway
>
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss