To: dougSF30 who wrote (253085) 6/9/2008 8:10:37 PM
From: pgerassi

Doug:

It's BS when all mas said was that for single thread code, Nehalem will be slower than Penryn. And while you can always find corner cases going either way, the bulk will go against Nehalem versus Penryn, assuming no other major changes. But major changes are not advisable when you are changing the infrastructure so much. Piling change after change makes it far more likely to screw it all up. And while you are fixing things, it will slip and slip some more.

No, it's far more likely they had to make a change, since sharing a cache between four cores leaves fewer usable ways per core than sharing it between two. And to offset that change they had to go to a 4 cycle L1 and shrink the L2 to lower its latency, to make up for some of the loss. L3 had to be much changed because it's shared between four cores and not two. Hyperthreading forces changes of its own as well. It is the likely reason for the increase in L1 latency.

The thing about hyperthreading is that it works best when the primary thread waits a lot and/or concentrates on just one kind of work. The trouble is that when you push one thread and make it work hard, there aren't many of the needed resources left over for a second thread. The typical bottleneck is decode power. This was proven in the P4 generation. Even when you get past that by whatever means, the next bottleneck is L1 cache pollution. Typically programs are composed of separately linked functions. The problem is that the functions tend to start on nice binary boundaries, and as such the meat (the most used code) of each function lands in the same associative sets of the cache. Thus one thread stomps on the other.

After you deal with those two things and their associated issues, you have spent a major amount of extra transistors per core. The extra performance gained may be more or less than what extra cores could deliver using those transistors that way. So we get two views.
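On the L1 pollution point above, here is a rough sketch of why aligned functions from two threads can land in the same associative set. The cache geometry (32 KB, 8-way, 64-byte lines) and the function addresses are assumptions picked for illustration, not measured values from either chip, and the 4 KB alignment is deliberately large to make the collision obvious.

```python
from collections import Counter

# Assumed L1 geometry for illustration only.
LINE_SIZE = 64                                   # bytes per cache line
NUM_WAYS = 8                                     # associativity
CACHE_SIZE = 32 * 1024                           # total bytes
NUM_SETS = CACHE_SIZE // (LINE_SIZE * NUM_WAYS)  # 64 sets


def cache_set_index(addr):
    """Return the set an address maps to in a simple set-associative cache."""
    return (addr // LINE_SIZE) % NUM_SETS


def set_pressure(entry_points):
    """Count how many function entry lines compete for each set."""
    return Counter(cache_set_index(addr) for addr in entry_points)


# Hypothetical hot functions: five per thread, each starting on a 4 KB boundary.
thread_a = [0x400000 + i * 0x1000 for i in range(5)]
thread_b = [0x800000 + i * 0x1000 for i in range(5)]

pressure = set_pressure(thread_a + thread_b)
hot_set, lines = pressure.most_common(1)[0]

print(f"set {hot_set} holds {lines} hot lines, but only {NUM_WAYS} ways exist")
```

With these assumed numbers every entry point maps to the same set, so ten hot lines fight over eight ways and the two threads evict each other. Either thread alone would fit with room to spare.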
Intel's way of thinking is that making the core fatter is better for overall performance, and AMD's way is that more normal cores is the better path to higher performance, although that has recently been modified to adding specialized cores for the intended use.

Even you said virtually everything. Well, what if the benchmark fits completely within L1, which happened a lot during the P4 era? There Nehalem will be slower than Penryn. Or how about code that consists mostly of pointer chasing in a medium-sized area, say 4.5MB?

As for your silly assertion: by wrongly paraphrasing what mas said, you blow your own arguments out of the water. As for your silly claim that Nehalem will outperform Penryn on virtually everything, all one has to do is find one case where Penryn is faster than Nehalem clock for clock. Heck, even if Nehalem were ten times faster than Penryn clock for clock overall, I could write code in which Penryn beats Nehalem badly.

And if Intel did say single thread performance has improved over Penryn, I haven't found that quote. This link looked at Nehalem versus Penryn and found Anandtech's review to be flawed: scientiasblog.blogspot.com

Given the above, your assertion fails, as there appears to be some problem in the benchmarking. Penryn scored higher in older references and was 9% faster than Nehalem's score in the review. So the only single thread benchmark shows Nehalem slower clock for clock than Penryn. Your silly assertion is thus proven FALSE!

The cache differences could easily explain Nehalem being 9% slower than Penryn. And any "IPC" improvements will likely be strictly due to the ODMC and higher BW, on code that makes use of them.

Pete
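For the pointer chasing case mentioned above, this is a minimal sketch of what such a test looks like. It builds one random cycle through a 4.5MB working set and then follows it, so every load depends on the one before it and prefetching can't help. The 64-byte node size, the function names, and the seed are assumptions for illustration. Python only shows the access pattern here; a real latency measurement would need C or assembly with nodes padded to a cache line.

```python
import random

NODE_SIZE = 64                               # assumed bytes per node (one cache line)
WORKING_SET = int(4.5 * 1024 * 1024)         # the 4.5MB area from the post
NUM_NODES = WORKING_SET // NODE_SIZE         # 73728 nodes


def build_chase_cycle(n, seed=0):
    """Build a random permutation that forms a single cycle of length n
    (Sattolo's algorithm), so a chase touches every node before repeating."""
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)                 # 0 <= j < i, never i itself
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps, start=0):
    """Follow the chain for the given number of steps and return the final node.
    Each lookup needs the result of the previous one (a dependent load chain)."""
    idx = start
    for _ in range(steps):
        idx = next_index[idx]
    return idx


chain = build_chase_cycle(NUM_NODES)
end = chase(chain, NUM_NODES)                # one full lap around the cycle
print(f"{NUM_NODES} nodes chased, ended at node {end}")
```

The single cycle matters: a plain random shuffle can break into short loops that fit in a small cache and hide the latency you are trying to see.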