To: EricRR who wrote (7182)
8/31/2000 3:53:05 PM
From: Scumbria

Ratbert,

You say that the small cache is bad because it will cause misses, while the small 2-clock latency will limit future clock speed. Paul claims that the small cache, versus a larger one, won't limit clock speed because small caches have shorter speedpaths. If I understand correctly, you claim that a 3-clock L1 latency isn't so bad because the requests can be pipelined. I assume then that only the L1 can be pipelined, because every mem access is assumed to be there. Is this right? Also, is the pipelining of the cache a difficult thing to do? Can the request for data be made 3 clocks before the data is needed in a register? Or does that require compiler support, like prefetching?

Your summary is pretty accurate. Larger caches tend to have more delay, which is why Dirk added the third cycle to the L1 access. Because the accesses are pipelined, there is almost no penalty to adding the third cycle. (DEC once calculated a 3% penalty.) The advantage of adding the third cycle is that it gives you a lot of additional headroom for MHz.

The L2 can be pipelined as well, but there are still bubbles introduced, because the ALU needs the data after 2 cycles.

Demone's argument about "The big performance loser is going off chip to main memory and the 256 KB L2 in Wilma is what is relevant to that, not the 8 KB dcache." is rather bogus. The L1 cache on Athlon has about 1% misses, compared to 8% on Willy. That is 8X as many L1 cache misses on Willy, and each one has a significant penalty associated with it. It is silly to underestimate the significance of this. He is trying to justify an unsupportable position.

Scumbria
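
For anyone who wants to put rough numbers on the miss-rate argument, here is a minimal sketch. The 1% and 8% miss rates are the figures quoted above. The 7-cycle miss penalty is purely an assumed, illustrative value (the post gives no number), and applying the same penalty to both chips is a simplification. The function name and the additive stall model are likewise hypothetical, and the model ignores any overlap of misses with other work.

```python
# Miss rates are the figures quoted in the post. The penalty is an assumed,
# illustrative value, not a measured one.
ATHLON_L1_MISS_RATE = 0.01
WILLAMETTE_L1_MISS_RATE = 0.08
ASSUMED_MISS_PENALTY_CYCLES = 7  # hypothetical cost of one L1 miss


def miss_stall_cycles(loads, miss_rate, miss_penalty_cycles):
    """Cycles lost to L1 misses over `loads` accesses (simple additive model)."""
    return loads * miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    loads = 1000
    athlon = miss_stall_cycles(loads, ATHLON_L1_MISS_RATE, ASSUMED_MISS_PENALTY_CYCLES)
    willy = miss_stall_cycles(loads, WILLAMETTE_L1_MISS_RATE, ASSUMED_MISS_PENALTY_CYCLES)
    print(f"Athlon stall cycles per {loads} loads: {athlon:.0f}")
    print(f"Willy stall cycles per {loads} loads: {willy:.0f}")
    print(f"Ratio: {willy / athlon:.1f}x")
```

Whatever penalty you plug in, the ratio of stall cycles stays at the ratio of the miss rates as long as both chips are given the same penalty. The assumed penalty only changes the absolute size of the gap.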