Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: TimF who wrote (7204)
8/31/2000 6:01:15 PM
From: Scumbria
 
Tim,

The L1 cache is a small part of the overall memory system in an MPU. In a chip like Wilma, the L1 serves primarily as a bandwidth-to-latency "transformer" to impedance-match the CPU core to the L2 cache and reduce the effective latency of L2 and memory accesses. The big performance loser is going off chip to main memory, and the 256 KB L2 in Wilma is what is relevant to that, not the 8 KB dcache. The area of an 8 KB cache is insignificant compared to the scale of the Wilma die, and the cache could easily be larger. I think the reason it isn't larger is that Intel wanted to hit a 2-cycle load-use penalty, and at the clock rate Wilma targets a larger cache would be a speed path. An 8 KB dcache has a hit rate of around 92% and a 32 KB cache around 96%. A two-cycle 8 KB dcache beats a much larger three-cycle dcache for the vast majority of applications, given the rest of the Wilma memory system design.
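To put rough numbers on the quoted trade-off, here is a minimal sketch using the textbook average-access-time formula (hit time + miss rate x miss penalty). The hit rates and cycle counts are the ones quoted above; the L1 miss penalty is not given anywhere in this thread, so it is left as a parameter, and the function names are purely illustrative. The model also ignores clock-rate effects and any latency hiding by an out-of-order core.

```python
def avg_access_cycles(hit_cycles, hit_rate, miss_penalty_cycles):
    """Average load latency in cycles: hit time plus miss rate times miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


def break_even_penalty(small, large):
    """Miss penalty (cycles) at which the two caches have equal average latency.

    Each argument is a (hit_cycles, hit_rate) pair. Below the returned
    penalty the smaller, faster cache wins; above it the larger one wins.
    """
    (hit_s, rate_s), (hit_l, rate_l) = small, large
    extra_hit_cycles = hit_l - hit_s
    miss_rate_saved = (1.0 - rate_s) - (1.0 - rate_l)
    return extra_hit_cycles / miss_rate_saved


# Figures quoted in the post: 8 KB at 2 cycles / 92%, 32 KB at 3 cycles / 96%.
small_cache = (2, 0.92)
large_cache = (3, 0.96)

print(break_even_penalty(small_cache, large_cache))

# The penalties below (10 and 40 cycles) are hypothetical, for illustration only.
for penalty in (10, 40):
    print(penalty,
          avg_access_cycles(*small_cache, penalty),
          avg_access_cycles(*large_cache, penalty))
```

Under these assumptions the break-even L1 miss penalty comes out at about 25 cycles: if a miss that hits in L2 costs less than that, the 2-cycle 8 KB cache has the lower average latency, and if it costs more, the 3-cycle 32 KB cache does.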

P4 has an 8% L1 miss rate. Athlon has a 1-2% miss rate. P4 misses 4-8X as often as Athlon. This is a major performance hit.

P4 has a 256K L2. Mustang has a 1MB L2. P4 will have to go to DRAM 2-4X as often as Mustang.

DEC calculated the difference between 2-cycle and 3-cycle latency, and the IPC impact was only 3%. Two-cycle latency was a big mistake for P4 because of its impact on clock speed.
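A quick sketch of what that 3% means, assuming performance scales as IPC times clock frequency (a simplification; real scaling also depends on memory latency). The 3% IPC figure is the DEC number above; the 10% clock gain in the example is hypothetical, and the function names are made up for illustration.

```python
def break_even_clock_gain(ipc_loss):
    """Fractional clock increase needed to cancel a fractional IPC loss,
    assuming performance is proportional to IPC * clock."""
    return 1.0 / (1.0 - ipc_loss) - 1.0


def relative_performance(ipc_loss, clock_gain):
    """Performance relative to the baseline design (1.0 means no change)."""
    return (1.0 - ipc_loss) * (1.0 + clock_gain)


# DEC's figure from the post: a 3-cycle cache costs about 3% IPC.
print(break_even_clock_gain(0.03))

# Hypothetical: the 3-cycle design allows a 10% higher clock.
print(relative_performance(0.03, 0.10))
```

On this model, a 3-cycle cache pays for itself if it allows the clock to run a little over 3% faster than the 2-cycle design.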

Scumbria