Technology Stocks : Intel Corporation (INTC)


To: Scumbria who wrote (81908) 6/2/1999 2:47:00 PM
From: Saturn V
 
Ref- <PowerPC has a cache touch instruction, which doesn't get used much because the penalty for incorrectly speculatively prefetching load misses is too great.>

Sorry for responding late; I was away on a mini-vacation attending a wedding and family reunions.

A cache touch instruction targeting L1 would indeed suffer from the problem you have described. For the P6 family, however, a miss on L1 exacts only a small penalty of 2-5 cycles, while a miss on L2 exacts a penalty of 30-70 cycles. So if, on every instruction that does not incur a cache miss, the L2 were automatically loaded with the contiguous pages, the incidence of L2 cache misses would be minimal. Since the L2 can be pretty large, the cache thrashing problem you alluded to is minimal. This strategy would minimize the penalty of L2 misses and could be implemented without requiring recompilation of existing x86 code. It would offer a significant performance enhancement, since most data sets are addressed sequentially.
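To make the idea concrete, here is a rough toy model in Python, not a description of any real P6 mechanism. It treats the L2 as a fully associative LRU cache and prefetches the next line on every access, which is a simplification of the scheme above. The 32-byte line size, the 4096-line capacity, and the function name are all assumptions for illustration, and the model assumes each prefetch completes before the next demand access.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed cache line size, for illustration only


def count_l2_misses(addresses, capacity_lines=4096, prefetch_next=False):
    """Count demand misses in a toy fully associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0

    def touch(line):
        """Bring a line into the cache; return True if it was already there."""
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
            return True
        cache[line] = True
        if len(cache) > capacity_lines:
            cache.popitem(last=False)  # evict the least recently used line
        return False

    for addr in addresses:
        line = addr // LINE_BYTES
        if not touch(line):
            misses += 1  # only demand accesses count as misses
        if prefetch_next:
            touch(line + 1)  # automatic next-line preload, no code change needed
    return misses


# Walk 64 KB sequentially, 4 bytes at a time.
sequential = range(0, 64 * 1024, 4)
plain = count_l2_misses(sequential)
prefetched = count_l2_misses(sequential, prefetch_next=True)
print(plain, prefetched)
```

On this purely sequential walk the model reports 2048 demand misses without preloading and 1 with it. A real workload with scattered accesses would benefit far less.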

Alas, Intel SSE does not do this yet! It requires explicit instructions to preload L2, which in turn requires recompilation.

Without L2 preloading, CPU performance is determined less by the number of integer and floating-point units and the MHz, and more by the time needed to load L2 from main memory. Without preloading, performance is significantly affected by memory latency; with preloading, it is mostly affected by main memory bandwidth.
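A back-of-the-envelope sketch of that latency versus bandwidth distinction, in Python. The numbers are illustrative assumptions, not measurements: a 50-cycle miss penalty (inside the 30-70 cycle range mentioned earlier) and a hypothetical 8 cycles to transfer one line once the memory pipeline is full. Compute time is ignored, and the function name is made up for this example.

```python
def cycles_for_stream(n_lines, miss_penalty_cycles, transfer_cycles_per_line, preload):
    """Estimate memory stall cycles for streaming n_lines of sequential data."""
    if n_lines == 0:
        return 0
    if preload:
        # Only the first line pays the full latency; later lines arrive
        # back to back, so the cost is set by transfer rate (bandwidth).
        return miss_penalty_cycles + (n_lines - 1) * transfer_cycles_per_line
    # Every line pays the full miss penalty, so the cost is set by latency.
    return n_lines * miss_penalty_cycles


latency_bound = cycles_for_stream(1000, 50, 8, preload=False)
bandwidth_bound = cycles_for_stream(1000, 50, 8, preload=True)
print(latency_bound, bandwidth_bound)
```

With these assumed figures, 1000 lines cost 50000 stall cycles without preloading and 8042 with it, so the per-line cost moves from the miss latency to the transfer time.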

I notice that the thread has been consumed by the RAMBUS and Camino issues, which is appropriate, since the main memory architecture will have a greater impact on system performance.