M.,
Most of the major processor manufacturers I know of have a group (or more) doing performance simulations. However, it would not surprise me if some folks didn't and designed by gut; they would not last very long in the performance wars.
Everybody uses performance simulators, but the typical performance simulator uses a trace-driven methodology, which is incapable of measuring the real-time effects of data interaction: instruction hazards between execution units, data hazards in the caches, etc. It has been shown repeatedly (by some of the largest computer companies in the world) that this methodology is not very accurate. To do performance analysis properly you need a cycle-accurate simulator.
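To make the distinction concrete, here is a minimal sketch (the structure and numbers are my own illustration, not any vendor's simulator). The trace-driven loop assumes an ideal CPI of 1, so contention between execution units is invisible; the cycle-accurate loop advances the clock one issue slot at a time and stalls whenever the needed unit is busy.

#include <stdio.h>

typedef struct { int unit; int latency; } Insn;

/* Trace-driven: replays the instruction stream against a fixed cost
   model (ideal CPI of 1 here), so hazards between instructions are
   invisible. */
long trace_driven(const Insn *t, int n) {
    (void)t;
    return n;
}

/* Cycle-accurate: the clock advances one issue slot at a time, and an
   instruction stalls until the execution unit it needs has drained. */
long cycle_accurate(const Insn *t, int n) {
    long clock = 0, done = 0;
    long busy_until[2] = { 0, 0 };
    for (int i = 0; i < n; i++) {
        if (busy_until[t[i].unit] > clock)
            clock = busy_until[t[i].unit];    /* structural hazard stall */
        busy_until[t[i].unit] = clock + t[i].latency;
        if (busy_until[t[i].unit] > done)
            done = busy_until[t[i].unit];
        clock++;                              /* one issue per cycle */
    }
    return done;
}

int main(void) {
    /* Repeated uses of unit 0 (latency 3) with one insn on unit 1
       in between. */
    Insn trace[] = { {0,3}, {0,3}, {1,1}, {0,3} };
    int n = sizeof trace / sizeof trace[0];
    printf("trace-driven:   %ld cycles\n", trace_driven(trace, n));   /* 4 */
    printf("cycle-accurate: %ld cycles\n", cycle_accurate(trace, n)); /* 9 */
    return 0;
}

The fixed-cost model reports 4 cycles; the cycle-by-cycle model reports 9, because the second and fourth instructions wait for unit 0 to free up. That gap is exactly the kind of interaction a trace-driven model never sees.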
Performance monitoring features (like those on the PII) are a tremendous help to CPU architects. Unfortunately, the data is only valid for that particular architecture, and is of limited value when moving to a new architecture.
If the hit rate to the cache increases, more bytes per second are transferred from the cache, because the cache is faster memory than the levels of the memory hierarchy further down.
That is true, but there is no point in building execution units which tax the cache beyond its theoretical maximum bandwidth. Most L1 caches operate pretty close to their peak theoretical bandwidth anyway (i.e., hit rates > 95%).
Increasing the cache size can have a significant impact on latency, but little impact on bandwidth. Delivered bandwidth is roughly proportional to hit rate, and going from 95% hits to 97% hits is a negligible change.
In contrast, latency is a function of miss rate, and reducing the miss rate from 5% to 3% cuts the number of misses (each of which pays the full penalty of the next level down) by 40%, which is a large reduction in average latency. That is why Intel saw a big jump in performance between the Pentium and the Pentium MMX, which doubled the L1 caches, and very little per-cycle improvement since then with the PII and PIII.
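The standard way to see this is the average memory access time formula, AMAT = hit_time + miss_rate * miss_penalty. A back-of-the-envelope calculation (the 1-cycle hit and 50-cycle miss penalty below are illustrative numbers of mine, not PII figures):

#include <stdio.h>

int main(void) {
    /* AMAT = hit_time + miss_rate * miss_penalty.  The 1-cycle hit
       and 50-cycle penalty are illustrative, not PII figures. */
    double hit_time = 1.0;               /* cycles on an L1 hit      */
    double penalty  = 50.0;              /* cycles to the next level */
    double rates[]  = { 0.05, 0.03 };
    for (int i = 0; i < 2; i++) {
        double amat = hit_time + rates[i] * penalty;
        printf("miss rate %.0f%%: AMAT = %.1f cycles\n",
               rates[i] * 100.0, amat);
    }
    return 0;
}

That works out to 3.5 cycles at a 5% miss rate versus 2.5 cycles at 3%: nearly a 30% drop in average latency, from the same change that moves delivered bandwidth only from 95% to 97% of peak.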
Scumbria