
Benefits of L2 cache size, AMD vs Intel

by edemiere (Edited)
**NOTE**
This is a research/test write-up that I saved at some point in a .txt file. I do not remember who conducted it, but I know the information is factual. It wasn't my "creation"; I have merely put it here for its value in understanding the cache differences between the two CPU makers.

----------------------

What are the Benefits of a Larger Cache?
Bigger is better, right? So a 512KB L2 cache must be better than a 256KB one - after all, AMD wouldn't spend 17 million transistors for no gain. Although a larger cache is generally beneficial, the real question is how beneficial and in what situations. To answer that question, we need a quick lesson in caches and what makes them so useful.

Think of a cache as a bridge between two entities - a slower and a faster one. In this case, the cache we are talking about is part of a multilevel cache system and it helps to bridge the gap between the CPU and main memory.

It's no surprise that main memory runs significantly slower than today's CPUs. Not only does memory run at significantly slower clock speeds (e.g. 200MHz for DDR400) than today's CPUs, but main memory is physically located very far away from the processor. Our multi-gigahertz CPUs have to waste well over 100 clock cycles to retrieve data from main memory as their requests must cross over slow front-side buses, through an external memory controller, to the memory and back. Making this trip can wreak havoc on performance, especially for CPUs with very long pipelines, as these pipelines generally remain idle if the data necessary to populate them has to be fetched from main memory.

The idea behind a processor's caches is that you store important data in these high speed memories (now located on the processor's die itself), so that most of the time, your CPU doesn't have to make the long trip to main memory. Caches are split into multiple levels because the larger a cache is, the longer it takes to fetch data from it. Pairing a small, very low latency cache with a larger, somewhat higher latency (but still significantly quicker than main memory) cache therefore provides the best balance of performance in today's microprocessors. These two caches are the Level 1 (L1) and Level 2 (L2) caches you hear about all the time.
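This balance can be sketched with a back-of-envelope average memory access time (AMAT) calculation. All hit rates and cycle counts below are illustrative assumptions, not measured figures for any particular CPU:

```python
# Back-of-envelope average memory access time (AMAT) for a two-level
# cache hierarchy. All latencies and hit rates here are illustrative
# assumptions, not measurements of any real processor.

def amat(l1_hit_rate, l1_latency, l2_hit_rate, l2_latency, mem_latency):
    """Average cycles per access: each miss falls through to the next level."""
    return (l1_latency
            + (1 - l1_hit_rate) * (l2_latency
            + (1 - l2_hit_rate) * mem_latency))

# Hypothetical numbers: 3-cycle L1, 15-cycle L2, 150-cycle main memory.
small_l2 = amat(0.95, 3, 0.70, 15, 150)   # 256KB-class L2
large_l2 = amat(0.95, 3, 0.80, 15, 150)   # 512KB-class L2: better hit rate

print(f"smaller L2: {small_l2:.2f} cycles/access")
print(f"larger  L2: {large_l2:.2f} cycles/access")
```

Even a modest improvement in L2 hit rate noticeably cuts the average cost per access, because every avoided L2 miss saves the full round trip to main memory.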

Caches work based on two major principles - spatial and temporal locality. These two principles are simple: spatial locality states that if you are accessing data, the data around it will likely be accessed soon; temporal locality states that if you are accessing data, chances are that you'll access that same piece of data again. In practice, this means that frequently accessed data is kept in cache, along with the data physically around it. Since caches are relatively small (rightfully so - main memory-sized caches would be prohibitive in both cost and performance), the algorithms they use to make sure that the right information remains in the cache are even more critical to performance than the sheer size of the cache.
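Both principles can be demonstrated with a toy LRU cache simulation. The block size, capacity, and access patterns below are arbitrary assumptions chosen only to make the contrast visible:

```python
# Toy LRU cache simulation: the hit rate depends on the access pattern,
# not just the cache size. Block size (64 bytes) and capacity (256 blocks)
# are arbitrary illustrative choices.
from collections import OrderedDict

def hit_rate(addresses, capacity, block=64):
    cache = OrderedDict()            # block number -> present, in LRU order
    hits = 0
    for addr in addresses:
        blk = addr // block          # spatial locality: whole block is cached
        if blk in cache:
            hits += 1
            cache.move_to_end(blk)   # temporal locality: keep recent blocks hot
        else:
            cache[blk] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used block
    return hits / len(addresses)

# Office-style workload: the same small working set revisited repeatedly.
reuse = [a for _ in range(100) for a in range(0, 4096, 8)]
# Encoder-style workload: a long stream touched once and never revisited.
stream = list(range(0, 4096 * 100, 8))

print(hit_rate(reuse, capacity=256))    # very high: temporal + spatial locality
print(hit_rate(stream, capacity=256))   # lower: spatial locality only
```

The streaming pattern still scores some hits purely from spatial locality (neighboring addresses share a cached block), but only the reuse-heavy pattern lets a larger cache pay off.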

With Barton, AMD left their L1 the same as before, but increased their L2 cache size by 256KB. AMD didn't change any of the specifications of the cache (e.g. it is still a 16-way set associative L2 cache). Luckily, AMD increased the cache size without sacrificing access time, but where will the added L2 cache help?

Let's look at those two principles we mentioned before, spatial and temporal locality. If an application's usage pattern does not abide by either one of these principles, then it doesn't matter how much cache you add, the performance will not improve. So what are some examples of applications that are and are not cache-friendly?

For starters, let's talk about things that don't abide by the principle of temporal locality - mainly multimedia applications, more specifically encoding applications. If you think about how encoding works, the data is never reused; it is simply encoded on a bit-by-bit basis, and the original data is never touched again. At the other end of the spectrum, we have things like office applications that happily abide by the principle of temporal locality. In these sorts of applications, you are often re-using data, performing very similar operations on it over and over again, and thus making great use of larger caches.

The principle of spatial locality applies to a much wider range of applications, including multimedia encoding applications, because data is generally stored in contiguous form in main memory and is thus very cache-friendly. Spatial locality is why you will see some improvement from larger caches even in applications that don't exhibit much temporal locality.



AMD's Cache Benefits vs. Intel's Cache Benefits
All caches are not created equal, and thus you should not expect AMD to benefit as much as Intel did from going to a 512KB L2 cache. Intel follows a much more conventional L1/L2 cache architecture that uses what is known as the inclusive principle: the contents of the L1 cache are also included in the L2 cache. The obvious downside is that the L2 cache contains redundant data that the CPU will never fetch from it (if it needs that data, it will get it from the faster L1 cache). From the CPU's point of view, an inclusive cache simply means less room to store its much needed data, but from the standpoint of the rest of the system an inclusive cache does provide one advantage: if data is updated in main memory (e.g. through DMA), the memory controller only has to check the L2 cache to maintain coherency, and there is no need to check L1. This is a small but important benefit of an inclusive cache architecture.

The opposite, obviously, is a cache subsystem that follows the exclusive principle - such as the Athlon XP's cache. In this case, the contents of the L1 cache are not duplicated in the L2 cache, thus favoring cache size over the added latency of checking for two levels of cache coherency in DMA situations. The exclusive approach makes much more sense for AMD, considering the Athlon XP has an extremely large 128KB L1 cache that would be very costly to duplicate in L2 (compared to Intel's 8KB L1 Data cache that is easily duplicated in L2).
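The capacity trade-off between the two policies is easy to quantify. A minimal sketch using the cache sizes quoted above (counting unique capacity only is a simplification; real inclusive caches also differ in replacement behavior):

```python
# Effective unique cache capacity under inclusive vs exclusive policies.
# Sizes in KB, taken from the figures quoted in the text above.

def effective_capacity_kb(l1_kb, l2_kb, inclusive):
    # Inclusive: L1 contents are duplicated in L2, so unique cached data
    # is bounded by L2 alone. Exclusive: the two levels hold disjoint data.
    return l2_kb if inclusive else l1_kb + l2_kb

athlon_xp = effective_capacity_kb(128, 512, inclusive=False)  # exclusive
pentium4 = effective_capacity_kb(8, 512, inclusive=True)      # 8KB L1 data

print(f"exclusive (128KB L1 + 512KB L2): {athlon_xp}KB unique")
print(f"inclusive (8KB L1 data + 512KB L2): {pentium4}KB unique")
```

With a 128KB L1, exclusivity buys the Athlon XP a meaningful 25% more unique capacity, while Intel's tiny 8KB L1 data cache makes the duplication nearly free.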

Both architectures have their pros and cons, but each is best suited to the particular CPU it serves. Recognizing the differences, however, helps us understand why AMD will benefit differently than Intel from the 256KB to 512KB cache leap, but this still isn't the full story.



What about the 400MHz FSB?
After Comdex, the word on the street was that AMD would be moving Barton to a 400MHz FSB in the near future but that the CPU would debut with a 333MHz FSB. As you can tell by today's release, we are still dealing with 333MHz FSB CPUs, but what is there to be said about the potential impact of a 400MHz FSB?

A larger L2 cache means that Barton has to go to main memory much less often (assuming that our applications do abide by the principles of spatial and temporal locality), which means that it has to send requests and receive data across the FSB much less frequently compared to an identically clocked Thoroughbred.

Since Barton is being launched at speeds slower than the fastest Thoroughbred, the immediate need for a 400MHz FSB isn't apparent - remember, FSB traffic should be reduced by the larger L2 cache. However, as Barton ramps up in clock speed, the move to a 400MHz FSB may become more appetizing as higher clocked Athlon XPs will require data at a faster rate to keep their pipelines filled.

So today, Barton would benefit even less from a 400MHz FSB than the Thoroughbred core does, and that benefit isn't large to begin with. Remember that the main benefit of the 333MHz FSB was latency reduction - the FSB and memory bus were finally operating at the same clock speed once again - and not the increase in FSB bandwidth.
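For reference, peak FSB bandwidth scales directly with the effective transfer rate. A quick sketch for the Athlon XP's 64-bit double-pumped bus (the quoted "333MHz" and "400MHz" are effective DDR rates; the underlying bus clocks are 166MHz and 200MHz):

```python
# Peak bandwidth of a 64-bit, double-pumped front-side bus.
# The "effective MHz" figure already counts both edges of the clock.

def fsb_bandwidth_mb_s(effective_mhz, width_bits=64):
    bytes_per_transfer = width_bits // 8
    return effective_mhz * bytes_per_transfer  # 1 MT/s * 8 bytes = 8 MB/s

for mhz in (266, 333, 400):
    print(f"{mhz}MHz effective FSB: {fsb_bandwidth_mb_s(mhz)} MB/s peak")
```

A 400MHz FSB's 3200MB/s peak matches DDR400 memory, just as the 333MHz FSB matches DDR333 - which is exactly the bus-synchronization benefit described above.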
