To: Scumbria who wrote (81908) 6/2/1999 2:47:00 PM From: Saturn V
Ref- <PowerPC has a cache touch instruction, which doesn't get used much because the penalty for incorrectly speculatively prefetching load misses is too great.>

Sorry for responding late; I was out on a mini-vacation attending a wedding and family reunions.

A cache touch instruction aimed at L1 would indeed suffer from the problem you have described. However, for the P6 family a miss on L1 exacts only a small penalty of 2-5 cycles, while a miss on L2 exacts a penalty of 30-70 cycles. So if, on every instruction in which there is no cache miss, the L2 were automatically loaded with the contiguous pages, the incidence of L2 cache misses would be minimal. Since the L2 can be pretty large, the problem of cache thrashing which you alluded to is minimal.

This strategy would minimize the penalty of L2 misses and could be implemented without requiring recompilation of existing x86 code. It would offer a significant performance enhancement, since most data sets are addressed sequentially.

Alas, Intel SSE does not do this as yet! It requires explicit instructions to preload L2, which requires recompilation.

Without L2 preloading, CPU performance is determined less by the number of integer and floating point units and the MHz, and more by the time needed to load L2 from main memory. Without preloading, performance is significantly affected by memory latency; with preloading it is mostly affected by main memory bandwidth.

I notice that the thread has been consumed by the RAMBUS and Camino issues, which is appropriate, since the main memory architecture will have a greater impact on system performance.
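
To put rough numbers on the latency versus bandwidth point, here is a toy model in Python. It is only a sketch under stated assumptions: the 50-cycle miss penalty is simply a value picked from the 30-70 cycle range quoted above, the 8 cycles per line of transfer time is a made-up figure, and the function names are hypothetical. It also assumes perfectly sequential access and a preloader that always stays ahead of the CPU, which real hardware would not guarantee.

```python
# Toy model: cycles spent on memory while streaming through N cache lines.
# All constants are illustrative assumptions, not measurements of any CPU.

L2_MISS_PENALTY = 50  # cycles per L2 miss; assumed, inside the 30-70 range
TRANSFER_CYCLES = 8   # cycles to move one line at full bus bandwidth; assumed


def memory_cycles_without_preload(n_lines: int) -> int:
    """Every line misses L2, so every line pays the full miss latency."""
    return n_lines * L2_MISS_PENALTY


def memory_cycles_with_preload(n_lines: int) -> int:
    """Only the first line pays the latency. Later lines are requested
    ahead of use and arrive back to back, limited by bus bandwidth."""
    if n_lines == 0:
        return 0
    return L2_MISS_PENALTY + (n_lines - 1) * TRANSFER_CYCLES


if __name__ == "__main__":
    for n in (1, 16, 1024):
        print(n, memory_cycles_without_preload(n), memory_cycles_with_preload(n))
```

With these assumed numbers, 1024 sequential lines cost 51200 cycles without preloading and 8234 cycles with it. In the second case the total is set almost entirely by the per-line transfer time, i.e. by bandwidth rather than latency.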