Technology Stocks : Intel Corporation (INTC)


To: wanna_bmw who wrote (162786), 3/22/2002 11:52:00 AM
From: Dan3
 
Re: Finally, given realistic data that follows the principles of locality, we know two common truths.

You keep ignoring the process of memory allocation. The OS allocates memory in page-aligned blocks, so a block's starting address can only take on a limited set of least-significant bits.
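
A quick way to see why that matters for the caches being argued about here -- a minimal sketch, assuming 64-byte lines and 4KB pages (both figures are my assumptions, not from the thread):

#include <stdio.h>

/* Minimal sketch (my construction): which set does an address map to? */

#define LINE_SIZE 64
#define PAGE_SIZE 4096

static unsigned set_index(unsigned long addr, unsigned num_sets)
{
    return (unsigned)((addr / LINE_SIZE) % num_sets);
}

int main(void)
{
    /* 256KB 16-way: 4096 lines / 16 ways = 256 sets.
     * 512KB 8-way:  8192 lines / 8 ways = 1024 sets. */
    unsigned sets16 = 256 * 1024 / LINE_SIZE / 16;
    unsigned sets8  = 512 * 1024 / LINE_SIZE / 8;

    /* Page-aligned block starts have their low 12 address bits zeroed,
     * so they can only land on set indices that are multiples of
     * PAGE_SIZE/LINE_SIZE = 64 -- 1 of every 64 sets in either cache. */
    for (unsigned long addr = 0; addr < 4 * PAGE_SIZE; addr += PAGE_SIZE)
        printf("block at %#07lx -> set %4u of %u, or set %4u of %u\n",
               addr, set_index(addr, sets16), sets16,
               set_index(addr, sets8), sets8);
    return 0;
}

Run it and every block start lands on a set index that is a multiple of 64, which is why real set usage is skewed rather than uniform.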



To: wanna_bmw who wrote (162786), 3/22/2002 12:53:43 PM
From: dale_laroy
 
>Given the principles of locality, I still maintain that a 512KB 8-way set associative WB cache for any CPU would be higher performance than a 256KB 16-way set associative WB cache - and that would include both hit-rate and look-up time. I would love to be able to prove that this would be relevant to the Athlon as well, but I'm afraid you're just going to have to take my word for it. Chances are that the kind of data that can prove my point would not be publicly available.<

No argument here. Indeed, a six-way set associative 384KB cache would probably have a higher hit rate than a sixteen-way set associative 256KB cache. But I was talking about a sixteen-way set associative 512KB L2 cache with hardware prefetch versus an eight-way set associative 1MB L2 cache with the same number of entries, also with hardware prefetch. For most common applications in single-tasking use, going beyond a 384KB L2 cache yields sharply diminishing returns on the transistor budget. And during multitasking, increasing the number of ways yields better results than increasing the size of the cache.
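
To put numbers on that comparison -- a sketch under the assumption that the 16-way cache uses 64-byte lines and the 8-way cache uses 128-byte lines, which is the only way I can make "the same number of entries" come out (the line sizes are my assumption, not from the post):

#include <stdio.h>

/* Sketch of the two geometries meant above (line sizes assumed). */
static void describe(const char *name, unsigned kb, unsigned ways,
                     unsigned line)
{
    unsigned lines = kb * 1024 / line;
    printf("%-11s %4u KB, %2u-way, %3uB lines -> %u entries, %u sets\n",
           name, kb, ways, line, lines, lines / ways);
}

int main(void)
{
    describe("16-way L2:",  512, 16,  64);  /* 8192 entries,  512 sets */
    describe("8-way L2:",  1024,  8, 128);  /* 8192 entries, 1024 sets */
    return 0;
}

Same number of entries either way; the trade is between ways per set and the number of sets those entries are spread across.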

>Conceptually speaking, though, given those two cache configurations, in one of them, you have double the number of storage areas per set, and in the other, you have 4x the number of sets. Given the ideal situation of perfectly randomized data, it should be obvious that having 4x more sets is preferable to having 2x more ways per set. In this case, the 512KB 8-way set associative cache will have twice the hit rate of the 256KB 16-way set associative cache. Given the other ideal of perfectly set-aligned data, the latter cache configuration will have twice the hit rate of the former.<

What two cache configurations? In comparing Northwood to Athlon, both have the same number of cache lines, and that is what matters when data is perfectly randomized. If data were perfectly randomized, there would be little doubt that Athlon would have the advantage; but data is not perfectly randomized, so the advantage goes slightly to Northwood. The question is whether this will hold true for Clawhammer with a 512KB L2 cache versus Prescott with a 1MB L2 cache.
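
For reference, the line-count arithmetic behind "the same number of cache lines" (my figures: Northwood's L2 uses 128-byte lines, Athlon's uses 64-byte lines):

Northwood: 512 KB / 128 B lines = 4096 lines; 4096 / 8 ways  = 512 sets
Athlon:    256 KB /  64 B lines = 4096 lines; 4096 / 16 ways = 256 sets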

>Finally, given realistic data that follows the principles of locality, we know two common truths. The first is that the same data is likely to be called upon a second time in a close subsequent access, and the second is that data close to the first data is likely to be called upon a second time in a close subsequent access. Given these common truths, and given that set-aligned addresses are very far apart in memory, it's unlikely that many subsequent accesses will be set-aligned. Therefore, a 16-way set associative 256KB cache will not have an advantage over an 8-way set associative 512KB cache. In practice, this is exactly the case, so the larger cache - not the higher associative cache - would perform better.<

Realistically, structures are often aligned on power-of-two boundaries, and often processed sequentially and concurrently. Increasing the number of ways increases the number of such structures that can be processed concurrently without thrashing the cache. Other factors should also be taken into consideration, such as the probability that a cache miss will be a DRAM page hit rather than a page miss. Accepting more of the cheap misses that land on open DRAM pages in exchange for fewer of the expensive misses that must open a new page is a valid strategy for increasing performance. Increasing the number of ways improves the hit rate when multiple large structures are being processed concurrently and sequentially, which is also exactly the situation in which the active DRAM pages are most likely to be thrashing.
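
To make the thrashing scenario concrete, here is a contrived sketch (the stream count, alignment, and sizes are all mine): ten large structures share a power-of-two alignment and are walked in lockstep. With LRU replacement, ten streams overrun an 8-way set associative cache, so every access misses after warm-up, while a 16-way cache holds all ten. Treat it as an illustration of the mapping; on real hardware, physical page assignment can scatter the addresses.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSTREAMS 10          /* more streams than an 8-way cache has ways */
#define ALIGN (1UL << 20)    /* 1MB alignment: element i of every stream
                                maps to the same set in either L2 above  */
#define LEN (256 * 1024)     /* bytes walked per stream */

int main(void)
{
    char *base = aligned_alloc(ALIGN, NSTREAMS * ALIGN);  /* C11 */
    long sum = 0;

    if (!base)
        return 1;
    memset(base, 1, NSTREAMS * ALIGN);

    /* Walk all streams in lockstep, one cache line at a time.  The ten
     * lines touched in each round share a single set; with only 8 ways
     * and LRU, each round evicts lines the next round still needs. */
    for (size_t i = 0; i < LEN; i += 64)
        for (unsigned s = 0; s < NSTREAMS; s++)
            sum += base[s * ALIGN + i];

    printf("sum = %ld\n", sum);
    free(base);
    return 0;
}

Drop NSTREAMS to 8 or fewer and the 8-way cache stops thrashing; that delta is the extra-ways argument in a nutshell.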