Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: EricRR who wrote (7182)
8/31/2000 3:53:05 PM
From: Scumbria
 
Ratbert,

You say that the small cache is bad because it will cause misses, while the small 2 clock latency will limit future clock speed.

Paul claims that the small cache, versus a larger one, won't limit clock speed because small caches have shorter speedpaths. If I understand correctly, you claim that a 3 clock L1 latency isn't so bad because the requests can be pipelined. I assume then that only the L1 can be pipelined, because every memory access is assumed to be there. Is this right? Also, is pipelining the cache a difficult thing to do? Can the request for data be made 3 clocks before the data is needed in a register, or does that require compiler support, like prefetching?


Your summary is pretty accurate. Larger caches tend to have more delay, which is why Dirk added the third cycle to the L1 access. Because the accesses are pipelined, there is almost no penalty to adding the third cycle. (DEC once calculated a 3% penalty.) The advantage of adding the third cycle is that it gives you a lot of additional headroom for MHz.

The L2 can be pipelined as well, but there are still bubbles introduced, because the ALU needs the data after 2 cycles.
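To make the pipelining point concrete, here is a rough sketch of why the third cycle costs so little. It assumes a stream of independent loads, one issued per clock, with nothing stalling on the result. The function names are made up for illustration, and this is a toy model, not a description of any real core.

```python
def pipelined_cycles(n_loads: int, latency: int) -> int:
    """Cycles to finish n independent loads when a new load can start every clock.

    The first load takes `latency` cycles; each later load completes one
    cycle after the previous one, so latency is paid only once.
    """
    if n_loads <= 0:
        return 0
    return latency + (n_loads - 1)


def unpipelined_cycles(n_loads: int, latency: int) -> int:
    """Cycles when each load must finish before the next one starts."""
    if n_loads <= 0:
        return 0
    return n_loads * latency


if __name__ == "__main__":
    n = 100  # hypothetical number of back-to-back independent loads
    for latency in (2, 3):
        print(latency, pipelined_cycles(n, latency), unpipelined_cycles(n, latency))
```

In this model the extra cycle shows up once per stream when pipelined, but once per load when not. Real code has loads whose results are needed right away, which is where the small measured penalty comes from.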

Demone's argument that "the big performance loser is going off chip to main memory, and the 256 KB L2 in Wilma is what is relevant to that, not the 8 KB dcache" is rather bogus. The L1 cache on Athlon has a miss rate of about 1%, compared to 8% on Willy.

That is 8X as many L1 cache misses on Willy, and each one carries a significant penalty. It is silly to underestimate the significance of this. He is trying to justify an unsupportable position.
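A quick back-of-the-envelope sketch of the miss arithmetic. The 1% and 8% miss rates are the figures quoted above; the per-miss penalty and the load count are placeholders I am assuming purely for illustration, not measured numbers for either chip.

```python
def miss_stall_cycles(loads: int, miss_rate: float, penalty_cycles: float) -> float:
    """Total stall cycles from L1 misses: loads * miss rate * cycles lost per miss."""
    return loads * miss_rate * penalty_cycles


if __name__ == "__main__":
    LOADS = 1000          # hypothetical number of loads
    PENALTY_CYCLES = 10   # assumed L1 miss penalty, NOT a real figure for either chip

    athlon = miss_stall_cycles(LOADS, 0.01, PENALTY_CYCLES)  # ~1% L1 miss rate
    willy = miss_stall_cycles(LOADS, 0.08, PENALTY_CYCLES)   # ~8% L1 miss rate

    print("Athlon stall cycles:", athlon)
    print("Willy stall cycles:", willy)
    print("Ratio:", willy / athlon)
```

If both chips paid the same penalty per miss, the ratio would stay at 8 no matter what penalty is plugged in. The penalty only scales how much total time that factor of 8 costs.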

Scumbria