Dale, Re: "I am not sure that 16-way is optimum for Athlon. All I argued was hit rate, although I did comment on scaleability."
Given the principles of locality, I still maintain that a 512KB 8-way set associative WB cache would perform better on any CPU than a 256KB 16-way set associative WB cache - and that includes both hit rate and look-up time. I would love to be able to prove that this is relevant to the Athlon as well, but I'm afraid you're just going to have to take my word for it. Chances are that the kind of data that could prove my point is not publicly available.
Conceptually speaking, though, compare the two configurations: the 16-way cache has twice as many ways (storage slots) per set, while the 8-way cache has 4x as many sets and, overall, twice the total capacity. Given the ideal case of perfectly randomized data, having 4x more sets is preferable to having 2x more ways per set, because what matters then is the total number of lines the cache can hold. In this case, the 512KB 8-way set associative cache will have roughly twice the hit rate of the 256KB 16-way set associative cache. Given the opposite ideal of perfectly set-aligned data, where every access maps to the same set, only the number of ways matters, and the 16-way cache will have twice the hit rate of the 8-way cache.
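To put numbers on the sets-versus-ways comparison, here is a minimal sketch of the arithmetic. The 64-byte line size and the helper name `cache_geometry` are my own assumptions for illustration; nothing in this thread pins them down.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Basic geometry of a set-associative cache.

    line_bytes=64 is an assumed line size, used only for illustration.
    """
    lines = size_bytes // line_bytes   # total lines the cache can hold
    sets = lines // ways               # lines are grouped into sets of `ways`
    # Addresses that differ by a multiple of this stride land in the same set.
    set_stride_bytes = sets * line_bytes
    return {"lines": lines, "sets": sets, "ways": ways,
            "set_stride_bytes": set_stride_bytes}

KB = 1024
small = cache_geometry(256 * KB, ways=16)   # 256KB 16-way
large = cache_geometry(512 * KB, ways=8)    # 512KB 8-way

for name, g in (("256KB 16-way", small), ("512KB 8-way", large)):
    print(name, g)
```

Under that assumption the 256KB 16-way cache has 256 sets and the 512KB 8-way cache has 1024 sets, so set-aligned addresses are 16KB apart in the first and 64KB apart in the second.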
Finally, given realistic data that follows the principles of locality, we know two common truths. The first is that data that has just been accessed is likely to be accessed again soon (temporal locality), and the second is that data located near it is likely to be accessed soon as well (spatial locality). Given these truths, and given that set-aligned addresses are very far apart in memory (separated by a multiple of the size of one way), it's unlikely that many closely spaced accesses will be set-aligned. Therefore, a 16-way set associative 256KB cache will not have an advantage over an 8-way set associative 512KB cache. In practice, this is exactly the case, so the larger cache - not the more highly associative cache - would perform better.
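For anyone who wants to poke at this, below is a toy LRU set-associative cache model run on two synthetic traces. Everything in it is an assumption made for illustration: the 64-byte line size, the LRU replacement policy, the function name `hit_rate`, and the traces themselves, which are hand-picked extremes. It is not Athlon data and proves nothing about real workloads; it only shows which kind of access pattern favors which configuration.

```python
from collections import OrderedDict

def hit_rate(addresses, size_bytes, ways, line_bytes=64):
    """Fraction of accesses that hit in a toy LRU set-associative cache.

    line_bytes=64 and LRU replacement are assumptions for illustration.
    """
    num_sets = size_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tag -> True, in LRU order
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes
        entry = sets[line % num_sets]   # the set this line maps to
        tag = line // num_sets
        total += 1
        if tag in entry:
            hits += 1
            entry.move_to_end(tag)      # mark as most recently used
        else:
            if len(entry) >= ways:
                entry.popitem(last=False)  # evict the least recently used tag
            entry[tag] = True
    return hits / total if total else 0.0

KB = 1024
# Trace A: two sequential passes over one contiguous 384KB region.
contiguous = [a for _ in range(2) for a in range(0, 384 * KB, 64)]
# Trace B: 12 addresses spaced 64KB apart (set-aligned in both caches),
# visited round-robin ten times.
aligned = [i * 64 * KB for _ in range(10) for i in range(12)]

results = {
    "contiguous_512K_8way": hit_rate(contiguous, 512 * KB, 8),
    "contiguous_256K_16way": hit_rate(contiguous, 256 * KB, 16),
    "aligned_512K_8way": hit_rate(aligned, 512 * KB, 8),
    "aligned_256K_16way": hit_rate(aligned, 256 * KB, 16),
}
for name, rate in results.items():
    print(f"{name}: {rate:.2f}")
```

In this model the contiguous trace fits in the 512KB cache and thrashes the 256KB one, while the set-aligned trace does the opposite. The argument above is that real programs look far more like the first trace than the second.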
wbmw