Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Saturn V who wrote (52744), 8/28/2001 1:38:00 PM
From: Tenchusatsu
 
Saturn, <The difference in the speed of L2 and L3 is not enough to warrant a new level of cache, and it makes sense to increase L2 only. And I do not know why Merced used a L3.>

What is the latency difference between L2 and L3 if both are on-die? Assume your standard 4:1 rule-of-thumb.

As for Merced, that processor only has a 32K L1 (16K data plus 16K code) and a 96K L2 on-die. The need for an off-chip L3 is obvious. (Of course, the obvious question would be why Merced didn't just combine all of its on-die cache into 128K of L1 like Athlon. I'm sure you'll understand if I dodge that one.)

Tenchusatsu



To: Saturn V who wrote (52744), 8/28/2001 7:19:56 PM
From: Dan3
 
Re: The miss rate of the cache is highly bench mark specific. However a "general rule of thumb "

More important, really, is the degree of set associativity (the "wayness" of the cache).

Just as AMD, for whatever reason, hasn't shipped any chips with on-die cache larger than 384K (L1+L2), Intel hasn't been able to make a cache that holds more than 8 addresses sharing the same low-order index bits (that is, more than 8 ways). They do have some Harvard architecture L1s that get to 4-way + 8-way for very small caches, but Intel has never managed to produce anything approaching AMD's 16-way L2.
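For anyone who wants the "wayness" arithmetic spelled out, here is a minimal sketch of how a set-associative cache picks a set from an address. The 64-byte line size is an assumption (nobody in the thread gives one), and the function names are hypothetical, made up for illustration only.

```python
def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets = total capacity / (line size * associativity)."""
    return cache_bytes // (line_bytes * ways)


def set_index(address, line_bytes, sets):
    """Drop the byte-offset bits, then take the low-order bits as the set index."""
    return (address // line_bytes) % sets


# Assumed geometry: 4 MB, 4-way, 64-byte lines (line size is a guess).
sets = num_sets(4 * 1024 * 1024, 64, 4)

# Two addresses exactly sets * line_bytes apart land in the same set,
# so they compete for the same handful of ways.
a = 0x1000
b = a + sets * 64
same_set = set_index(a, 64, sets) == set_index(b, 64, sets)
```

With more ways per set, more of these colliding addresses can stay resident at once, which is the point being argued above.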

This is one of the main reasons that Athlon so thoroughly trounces P4 in real programs that operate on real data (as opposed to the small, predictable sets of "ideal" data used in benchmarks).

Do you realize how lame Itanium's huge, die-eating 4-way L3 is? Itanium has a 4 MB L3 that can overflow after reading as few as 5 words from memory! 4 reads that map to the same set and that set is full to capacity! On the fifth such word, a line in that set has to be evicted and overwritten, no matter how empty the rest of the cache is!
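The five-word worst case can be reproduced with a toy model. This is only a sketch under stated assumptions: 64-byte lines and LRU replacement are guesses, not Itanium's documented behavior, and the `SetAssociativeCache` class is hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, cache_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.sets = cache_bytes // (line_bytes * ways)
        # One ordered dict of resident tags per set; oldest entry first.
        self.lines = [OrderedDict() for _ in range(self.sets)]
        self.evictions = 0

    def access(self, address):
        """Return True on a hit, False on a miss (filling the line)."""
        block = address // self.line_bytes
        index = block % self.sets
        tag = block // self.sets
        resident = self.lines[index]
        if tag in resident:
            resident.move_to_end(tag)  # mark as most recently used
            return True
        if len(resident) >= self.ways:
            resident.popitem(last=False)  # evict the least recently used line
            self.evictions += 1
        resident[tag] = True
        return False


# Assumed geometry: 4 MB, 4-way, 64-byte lines.
cache = SetAssociativeCache(4 * 1024 * 1024, 64, 4)
stride = cache.sets * cache.line_bytes  # addresses this far apart share a set

for i in range(5):
    cache.access(i * stride)

print(cache.evictions)  # prints 1
```

Five reads spaced one stride apart force an eviction while the other sets sit empty. Whether real programs hit that pattern often enough to matter is a separate question that this sketch does not settle.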