To: combjelly who wrote (253019)
From: wbmw
6/8/2008 4:23:42 PM

Re: The point that mas is trying to make is that CPU architecture is a game of trade offs. Not to belabor the obvious, but Penryn isn't Nehalem. The Penryn was designed for fewer max. cores than Nehalem. Not to mention HT. As a result, Penryn was pushed towards maximizing performance for a relatively small number of physical cores. Nehalem was optimized for a larger number of virtual, and for that matter, physical cores. The tradeoffs are almost certainly not the same. Given that Intel likely did not screw up the Penryn caches, it is almost certain that Penryn will beat Nehalem on cache bound, single threaded, or at least small number of threaded, code at the same frequency.

There's no reason the architecture of Nehalem couldn't have been designed to improve both multi-threaded *and* single threaded performance. I recall seeing in the IDF slides a number of features targeted at single threaded performance. Real World Tech covers them here: realworldtech.com

The multithreaded enhancements just happen to have a greater effect on overall performance. If enhancing the multithreaded performance requires tradeoffs in single threaded performance - such as longer latencies or smaller capacities on the low level caches - that performance can still be made up elsewhere.

Focusing on the caches by themselves ignores the fact that modern processors are complex systems with strings of dependencies and tradeoffs among the various components. For example, improving the characteristics of Unit A may require that you relax the timing of Unit B. Then, if someone were to argue that Unit B causes a performance loss, they would be ignoring that the changes in Unit B came as a result of improving Unit A, which could yield a net improvement across the relevant workloads.
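To put rough numbers on the Unit A / Unit B argument above, here is a minimal sketch of a toy cycles-per-instruction model. Every figure in it is made up for illustration (the base CPI, latencies, and miss rates are not measurements of Penryn or Nehalem), and the function name is my own invention. It only shows how a slower cache can still come out ahead on a typical mix while losing on a heavily cache-bound one.

```python
def cycles_per_instruction(base_cpi, mem_refs_per_instr, l1_miss_rate, l2_latency):
    """Toy model: core work plus stall cycles from L1 misses served by L2."""
    return base_cpi + mem_refs_per_instr * l1_miss_rate * l2_latency

# Hypothetical "old" design: faster L2 (Unit B), weaker core (Unit A).
# Hypothetical "new" design: L2 timing relaxed so the core could be improved.
OLD = dict(base_cpi=1.00, l2_latency=15)
NEW = dict(base_cpi=0.90, l2_latency=18)

def compare(l1_miss_rate, mem_refs_per_instr=0.3):
    """Return (old_cpi, new_cpi) for a workload with the given L1 miss rate."""
    old = cycles_per_instruction(OLD["base_cpi"], mem_refs_per_instr,
                                 l1_miss_rate, OLD["l2_latency"])
    new = cycles_per_instruction(NEW["base_cpi"], mem_refs_per_instr,
                                 l1_miss_rate, NEW["l2_latency"])
    return old, new

# Assumed "mainstream" workload: few L1 misses, so the core gain dominates.
typical_old, typical_new = compare(l1_miss_rate=0.05)

# Assumed cache-bound workload: many L1 misses, so the slower L2 dominates.
bound_old, bound_new = compare(l1_miss_rate=0.50)

print("typical:     old %.3f  new %.3f" % (typical_old, typical_new))
print("cache-bound: old %.3f  new %.3f" % (bound_old, bound_new))
```

Lower is better here. With these assumed inputs the new design wins the typical case and loses the cache-bound one, so which side of the argument looks right depends on which workloads you weight.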
As for what "relevant workloads" means, you may have a point that it depends on what applications you care about, but in general it's safe to say that mainstream computing has moved away from applications that are strictly cache-bound, or for that matter strictly favorable to any single kind of micro-architectural feature. I would disagree that it's a world of corner cases, because I think it's more accurate to say that the world is becoming more homogenized, with real performance coming from strengths in all areas of the chip. Optimizing for "just bandwidth", or "just multithreading", or "just caches" will result in a machine that does poorly in most workloads. You actually need to make the best tradeoffs across all units, such that the micro-architecture is as balanced as possible, and power efficient besides.

I think Nehalem is an example of a finely tuned architecture that started with a rather high performance core, and improved upon it by addressing its weaknesses and further polishing its strengths. Even on Anand's untuned motherboard with poor memory performance, he still showed a part capable of outperforming the previous generation by very healthy margins across a decent spectrum of workloads. And even in the one single threaded benchmark (Cinebench with one thread enabled), it still managed to outperform Penryn by about 3%. So clearly, your initial statement (that Penryn will almost certainly be faster in these kinds of workloads) has been proven incorrect. Moreover, the performance is likely to improve with production worthy systems.

I think early Nehalem benchmarks demonstrate a clear improvement over the previous generation, but the final numbers are still unclear. If Anand is right about memory performance improving with production worthy boards, we may see significant upside on what are already some very impressive results.
To argue against this is premature, and there really isn't a lot of data for you, or mas, or anyone else to proclaim anything as obvious or certain. For most end-users, I don't think these arguments are going to mean much. All they care about is overall performance, and if most applications scale with the kind of improvements that Nehalem has, then Intel scored a home run on this micro-architecture. That, IMO, is the bottom line.