To: wanna_bmw who wrote (162713)  3/21/2002 3:27:15 PM
From: Dan3

From your post:

The miss ratios were calculated from data collected by functional, user-mode simulations of optimized benchmarks. As a result, the cache miss ratios reported above may not be representative of a real platform. A few sources of error are discussed below.

First, only primary misses were counted by the simulator. Once a reference missed in the cache, the data was loaded and all subsequent accesses to the line hit. A modern processor may also experience secondary misses, that is, references to data that has yet to be loaded from a prior cache miss. There is a nonzero miss latency, and a real processor may execute other instructions while waiting for the data. The sequential model used in functional simulation is optimistic in this respect.

Second, a modern processor will have optimizations that affect cache performance. Hardware prefetching of instructions and data can have the positive effect of reducing the number of cache misses, but prefetching can also cause cache pollution. Further, speculative execution can result in increased memory traffic for speculatively issued loads, and in I-cache pollution from incorrect branch predictions. This also makes the results optimistic.

Totally invalidating this analysis is the fact that all other processes, including all the operating system processes, were ignored!

Third, the operating system was ignored. System calls cause additional cache misses to bring in OS code and data, and in doing so they replace cache lines from the user program. This increases the number of conflict and capacity misses for the user program on a real system. Since the additional misses from OS intervention were not modeled, our results are optimistic. One possibility is to flush the caches on system calls; however, that is the other extreme, and it would have made it impossible to measure the compulsory miss rates.

Fourth, all prefetch instructions (loads to R31) were treated as normal references. All were executed, and references from prefetch instructions were included in the overall statistics. Although prefetch instructions may prevent (or reduce the impact of) cache misses from instructions in the original code, the misses still occur, just sooner. However, prefetch instructions increase the overall hit ratio, because the subsequent loads and stores that hit in the cache add to the overall hit count. One possibility is to ignore prefetch instructions altogether (the Alpha ISA allows this). Another possibility is to count the misses from the prefetches, but not count the prefetches as instructions.

Fifth, the benchmarks were optimized for an Alpha 21264 processor. The binaries may have been tuned to perform well with the 21264 cache hierarchy (64K 2-way L1 caches). Ideally, the binary should not favor a particular cache configuration. Further, the binary contains no-ops for alignment and for steering dependent operations in the clustered microarchitecture of the 21264. These no-ops inflate the overall instruction count for the functional simulation.
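
To make the "primary misses only" point concrete, here is a minimal sketch of the kind of functional cache model being described. The 64KB, 2-way geometry matches the 21264 L1 mentioned above; the 64-byte block size, LRU replacement, and the one-hex-address-per-line trace format are my own assumptions, not details from the report.

#include <stdio.h>

/* Hypothetical parameters: the 64KB, 2-way figure comes from the 21264 L1
   mentioned above; the block size and trace format are assumptions. */
#define CACHE_BYTES (64 * 1024)
#define ASSOC       2
#define BLOCK_BYTES 64
#define NUM_SETS    (CACHE_BYTES / (ASSOC * BLOCK_BYTES))

struct set {
    unsigned long tag[ASSOC];
    int           valid[ASSOC];
    long          last_use[ASSOC];   /* timestamp for LRU replacement */
};

static struct set cache[NUM_SETS];
static long hits, misses, now;

/* One reference in a functional, in-order, zero-latency simulation.
   The first access to a line is counted as a (primary) miss and the line
   is installed immediately, so every later access to it hits.  Secondary
   misses -- accesses that would arrive while the fill is still
   outstanding on real hardware -- cannot occur in this model. */
static void access_cache(unsigned long addr)
{
    unsigned long block = addr / BLOCK_BYTES;
    unsigned long set   = block % NUM_SETS;
    unsigned long tag   = block / NUM_SETS;
    struct set *s = &cache[set];
    int way, victim = 0;

    now++;
    for (way = 0; way < ASSOC; way++) {
        if (s->valid[way] && s->tag[way] == tag) {
            hits++;
            s->last_use[way] = now;
            return;
        }
    }

    /* Primary miss: pick an empty way if one exists, else the LRU way. */
    for (way = 1; way < ASSOC; way++) {
        if (!s->valid[way]) { victim = way; break; }
        if (s->valid[victim] && s->last_use[way] < s->last_use[victim])
            victim = way;
    }
    misses++;
    s->valid[victim]    = 1;
    s->tag[victim]      = tag;
    s->last_use[victim] = now;
}

int main(void)
{
    unsigned long addr;
    while (scanf("%lx", &addr) == 1)   /* one hex address per line */
        access_cache(addr);
    printf("hits=%ld  misses=%ld  miss ratio=%.4f\n",
           hits, misses,
           hits + misses ? (double)misses / (hits + misses) : 0.0);
    return 0;
}

Because the line is installed the instant it misses, a model like this literally cannot see the secondary misses or the overlapped miss latency that the report's first caveat is about.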
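
On the fourth point, the three ways of handling prefetches (loads to R31) could look roughly like this. The trace-record layout and policy names are made up for illustration, and access_cache() stands for whatever cache model is in use (for example, the sketch above).

void access_cache(unsigned long addr);   /* the cache model sketched above */

/* Hypothetical trace record; in the Alpha ISA a load whose destination
   register is R31 is a software prefetch. */
struct ref { int is_load; int dest_reg; unsigned long addr; };

enum pf_policy { PF_AS_NORMAL, PF_IGNORE, PF_MISS_ONLY };

static long counted_refs;   /* references counted toward per-instruction stats */

void handle_ref(const struct ref *r, enum pf_policy policy)
{
    int is_prefetch = r->is_load && r->dest_reg == 31;

    if (is_prefetch && policy == PF_IGNORE)
        return;                    /* drop R31 loads entirely */

    if (is_prefetch && policy == PF_MISS_ONLY) {
        access_cache(r->addr);     /* its miss (or hit) is still recorded */
        return;                    /* but it is not counted as an instruction */
    }

    counted_refs++;                /* PF_AS_NORMAL treats it like any load */
    access_cache(r->addr);
}

PF_IGNORE removes the prefetch's effect on the cache entirely, while PF_MISS_ONLY still lets the prefetch pull the line in (so later loads hit) but keeps it out of the instruction count, which is what distinguishes the two alternatives the report mentions.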