Technology Stocks : Intel Corporation (INTC)

To: Tony Viola who wrote (162707)   3/21/2002 2:37:11 PM
From: wanna_bmw
 
Tony, Re: "doubling the set associativity makes up for having half the L2 memory size? If so, why doesn't everyone go that way and save all that memory space? The logic for 16 way associativity (over 8 way) can't be anywhere near the size of another 512 KB of L2."

It's a common misperception, usually perpetuated by people like Dan3, that the associativity of the Athlon cache gives it a huge advantage over caches from Intel with lower levels of associativity. It's quite clear that clowns like Dan3 have no concept whatsoever of cache locality, so they don't realize in what ways it can affect hit rate. Therefore, they'll make guesses based on their preconceived notions, and after hearing it enough times, they'll believe it as gospel.

Check out some of these links.

"General Rule #1 - 8-way is generally as good as Full Assoc. for removing misses"

cs.uic.edu

That is the "general rule", but you can see data to back it up here.

cs.wisc.edu

--------------------------------------------------------------------------
|                    Block size: 64 bytes, Repl: LRU                     |
|------------------------------------------------------------------------|
|               Arithmetic Mean for Instruction References               |
|------------------------------------------------------------------------|
|       |                         Associativity                          |
| Size  |----------------------------------------------------------------|
|       |     1      |     2      |     4      |     8      |    full    |
|-------+------------+------------+------------+------------+------------|
| 1K    | 0.040115-- | 0.038059-- | 0.038609-- | 0.038631-- | 0.038770-- |
| 2K    | 0.028248-- | 0.026708-- | 0.026033-- | 0.026023-- | 0.026006-- |
| 4K    | 0.019655-- | 0.017775-- | 0.017586-- | 0.017514-- | 0.017421-- |
| 8K    | 0.013024-- | 0.011229-- | 0.010171-- | 0.010013-- | 0.009931-- |
| 16K   | 0.007394-- | 0.004766-- | 0.003666-- | 0.003405-- | 0.004296-- |
| 32K   | 0.003237-- | 0.001233-- | 0.000651-- | 0.000388-- | 0.000239-- |
| 64K   | 0.001060-- | 0.000360-- | 0.000127-- | 0.000049-- | 0.000016-- |
| 128K  | 0.000454-- | 0.000148-- | 0.000014-- | 0.000004-- | 0.000002-- |
| 256K  | 0.000090-- | 0.000011-- | 0.000002-- | 0.000001-- | 0.000001-- |
| 512K  | 0.000009-- | 0.000003-- | 0.000001-- | 0.000000-- | 0.000001-- |
| 1024K | 0.000000-- | 0.000000-- | 0.000000-- | 0.000000-- | 0.000000-- |
--------------------------------------------------------------------------
Compulsory: 0.0000000416--

--------------------------------------------------------------------------
|                    Block size: 64 bytes, Repl: LRU                     |
|------------------------------------------------------------------------|
|                  Arithmetic Mean for Data References                   |
|------------------------------------------------------------------------|
|       |                         Associativity                          |
| Size  |----------------------------------------------------------------|
|       |     1      |     2      |     4      |     8      |    full    |
|-------+------------+------------+------------+------------+------------|
| 1K    | 0.275311-- | 0.232072-- | 0.207868-- | 0.191097-- | 0.185660-- |
| 2K    | 0.191787-- | 0.155995-- | 0.137516-- | 0.123602-- | 0.115772-- |
| 4K    | 0.145548-- | 0.114026-- | 0.105337-- | 0.094777-- | 0.089413-- |
| 8K    | 0.106719-- | 0.085133-- | 0.078486-- | 0.074013-- | 0.069963-- |
| 16K   | 0.082798-- | 0.067679-- | 0.064007-- | 0.061553-- | 0.059314-- |
| 32K   | 0.069504-- | 0.056942-- | 0.055286-- | 0.053659-- | 0.052217-- |
| 64K   | 0.060102-- | 0.052060-- | 0.050989-- | 0.049836-- | 0.048541-- |
| 128K  | 0.051134-- | 0.048766-- | 0.048341-- | 0.046895-- | 0.045834-- |
| 256K  | 0.046695-- | 0.044774-- | 0.044566-- | 0.044497-- | 0.043546-- |
| 512K  | 0.041238-- | 0.040808-- | 0.041690-- | 0.041878-- | 0.040885-- |
| 1024K | 0.033697-- | 0.032618-- | 0.033644-- | 0.034391-- | 0.034436-- |
--------------------------------------------------------------------------
Compulsory: 0.0000293378--
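
To put Tony's question against the numbers, here is a rough sketch that compares a half-size fully associative cache with a full-size 8-way cache, using rows copied from the data-reference table. The table has no 16-way column, so "full" stands in as the best case for adding ways; the names DATA_MISS and compare are made up for illustration, and the values are assumed to be miss ratios.

```python
# Values copied from the data-reference table (64-byte blocks, LRU).
# Keys are cache sizes in KB; inner keys are the associativity column.
DATA_MISS = {
    128:  {"8": 0.046895, "full": 0.045834},
    256:  {"8": 0.044497, "full": 0.043546},
    512:  {"8": 0.041878, "full": 0.040885},
    1024: {"8": 0.034391, "full": 0.034436},
}

def compare(size_kb):
    """Return (half-size fully associative, full-size 8-way) miss ratios."""
    half_full = DATA_MISS[size_kb // 2]["full"]
    whole_8way = DATA_MISS[size_kb]["8"]
    return half_full, whole_8way

for size in (256, 512, 1024):
    half_full, whole_8way = compare(size)
    winner = "bigger 8-way" if whole_8way < half_full else "smaller full-assoc"
    print(f"{size // 2}K full = {half_full:.6f}  vs  "
          f"{size}K 8-way = {whole_8way:.6f}  -> {winner}")
```

For these three rows the larger 8-way cache has the lower number every time, even against a fully associative cache of half the size.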


"Full associativity" means that any block can go anywhere in the cache, so the "way-ness" equals the number of blocks the cache holds. Thus, an 8KB fully associative cache with 64-byte blocks will have 128 ways. As you can see from the miss ratios in the tables, dramatically increasing the number of ways past 8 does very little to improve the hit rate, and in some cases (the 16K instruction row and the 1024K data row, for example) the miss ratio actually gets *worse*. That's before you count the extra overhead of all those ways. The surest path to better performance, as you can see from the tables, is to increase the size of the cache - not the associativity.
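
For anyone who wants to play with size versus associativity themselves, here is a minimal sketch of an LRU set-associative cache model in Python. It is a toy, not the simulator behind the tables above; the function names num_sets and miss_ratio and the address trace are made up for illustration, and the 64-byte block size is taken from the table headers.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes, matching the tables above

def num_sets(cache_bytes, ways, block=BLOCK_SIZE):
    """Number of sets = total blocks / ways. One set means fully associative."""
    blocks = cache_bytes // block
    return blocks // ways

def miss_ratio(addresses, cache_bytes, ways, block=BLOCK_SIZE):
    """Run an address trace through an LRU cache; return misses / accesses."""
    sets = num_sets(cache_bytes, ways, block)
    cache = [OrderedDict() for _ in range(sets)]  # one LRU queue per set
    misses = 0
    for addr in addresses:
        blk = addr // block          # block number of this address
        s = cache[blk % sets]        # the set this block maps to
        if blk in s:
            s.move_to_end(blk)       # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # set is full: evict least recently used
            s[blk] = True
    return misses / len(addresses)

# Hypothetical trace: two addresses 8KB apart, touched alternately.
trace = [0, 8192] * 50
print(num_sets(8192, 1), num_sets(8192, 8), num_sets(8192, 128))
print(miss_ratio(trace, 8192, 1), miss_ratio(trace, 8192, 2))
```

With 64-byte blocks an 8KB cache holds 128 blocks, so 128 ways leaves a single set, which is the fully associative case. The toy trace thrashes a direct-mapped cache but fits in one 2-way set, which is the kind of conflict miss that the first few ways remove and that additional ways have little left to fix.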

I'm sure the Athlon designers aren't idiots, so I'll assume that their design decision is a result of incremental gains from adding some additional ways of associativity - and this is probably because manufacturing larger caches has never been a strong point for AMD, so they might as well increase associativity. For Intel, though, I suspect that their simulations tell them that 8-way is the "sweet spot", and it probably gives them the access times that they need, as well as the performance.

It always cracks me up when some AMDroid decides to promote themselves to the title of Armchair Expert (tm Tenchusatsu), while criticizing the design decisions of those who have infinitely more education and experience than they do. Designers are not idiots. The only idiots here are people who perpetuate rumors like "8-way caches are insufficient" or "P4 caches only run at half speed". Few people here who claim to understand this stuff really do, so I would take most comments, even those from Dale LaRoy, with a grain of salt.

wbmw