Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: graphicsguru who wrote (253017), 6/7/2008 11:35:32 PM
From: combjelly
 
"I'm inclined to believe that Intel had a whole lot of data to base their choice on. "

They do.

"Do you think they just ignored Penryn's design and experience when they designed the Nehalem memory system?"

The point that mas is trying to make is that CPU architecture is a game of trade-offs. Not to belabor the obvious, but Penryn isn't Nehalem. Penryn was designed for a smaller maximum core count than Nehalem, and without HyperThreading. As a result, Penryn was pushed toward maximizing performance for a relatively small number of physical cores, while Nehalem was optimized for a larger number of virtual, and for that matter physical, cores. The trade-offs are almost certainly not the same. Given that Intel likely did not screw up the Penryn caches, it is almost certain that Penryn will beat Nehalem on cache-bound code that is single-threaded, or at least lightly threaded, at the same frequency.

This is a given. If there were a clear win to be had in the cache hierarchy, it would have been found a long time ago, because exploring cache designs is cheap compared to things like out-of-order execution machinery, much less other techniques. In that respect, it has been decades since anything fundamentally new has been discovered. Just get one of the simulators, and there are many, and try to find a killer solution. If you do,
you won't have to work for the rest of your life...
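
To make the trade-off concrete, here is a minimal sketch of that kind of design-space search in Python, under toy assumptions (a uniform-access model with made-up latencies, not a real cache simulator): a bigger L2 covers more of the working set but costs more cycles per hit, so which design "wins" flips with the workload's working-set size.

KB, MB = 1024, 1024 * 1024
MEM_LATENCY = 200  # cycles to memory, assumed

def avg_latency(l2_bytes, l2_cycles, working_set):
    # Uniform-access toy model: a cache of capacity C serves min(1, C/W)
    # of the accesses; a 32KB, 3-cycle L1 is fixed for every design.
    l1_hit = min(1.0, 32 * KB / working_set)
    l2_hit = min(1.0, l2_bytes / working_set) - l1_hit
    miss = 1.0 - l1_hit - l2_hit
    return 3 * l1_hit + l2_cycles * l2_hit + MEM_LATENCY * miss

# Hypothetical designs: hit latency grows with capacity, as in real caches.
designs = [(256 * KB, 11), (1 * MB, 12), (3 * MB, 14), (6 * MB, 15)]

for w in (128 * KB, 512 * KB, 2 * MB, 8 * MB):
    best = min(designs, key=lambda d: avg_latency(d[0], d[1], w))
    print(f"working set {w // KB:>5} KB -> best L2: {best[0] // KB} KB "
          f"at {best[1]} cycles")

Run it and the small, fast L2 wins the small working sets while the big, slow one wins the large ones; no single point dominates, which is the whole point about trade-offs.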

mas is arguing from the viewpoint of the science of computer architecture. Sorry to say, but those arguing against him are arguing more from the enthusiast viewpoint. The enthusiasts are claiming, in essence, that perpetual motion exists.

Now, one can argue that across the applications that matter, Nehalem is going to beat Penryn per core at the same frequency. But that is a philosophical argument bordering on epistemology, because it depends on the user and the applications they care about. Things have gotten to the point where we are more or less arguing over corner cases, and the set of applications where it makes little practical difference grows every day.

But that is a different issue.



To: graphicsguru who wrote (253017), 6/8/2008 8:35:50 AM
From: mas_
 
You only have to look at corner cases. Anything that fits entirely in L1, like some hand-written code/function, will definitely be worse, since Nehalem's L1 is a cycle slower. Anything that fits entirely in Penryn's 6MB L2 (15 cycles) will also be worse, since 5.75MB of the equivalent Nehalem capacity sits out in the 39-cycle L3 (only 256KB of it is covered by Nehalem's L2). In between, you would have to get the formulas out to find the break-even points: from 32KB to 256KB for the L1 case, and from 256KB to 6MB for the L2 case.
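
A rough way to replay that break-even arithmetic (a sketch, not mas's actual formulas): the 15-cycle Penryn L2 and 39-cycle Nehalem L3 figures are from the post above, while the 3- and 4-cycle L1 latencies and the 11-cycle 256KB Nehalem L2 are assumptions filled in for illustration, using the same uniform-access toy model.

KB, MB = 1024, 1024 * 1024

def amat(levels, working_set, mem_latency=200):
    # levels: [(capacity_bytes, latency_cycles), ...] innermost first,
    # assumed inclusive; whatever no level covers goes to memory.
    covered, cycles = 0.0, 0.0
    for capacity, latency in levels:
        frac = min(1.0, capacity / working_set) - covered
        cycles += frac * latency
        covered += frac
    return cycles + (1.0 - covered) * mem_latency

penryn  = [(32 * KB, 3), (6 * MB, 15)]                  # 3 assumed, 15 from post
nehalem = [(32 * KB, 4), (256 * KB, 11), (8 * MB, 39)]  # 4 and 11 assumed, 39 from post

# Sweep the two ranges named above: 32KB-256KB and 256KB-6MB.
for w in (32 * KB, 64 * KB, 128 * KB, 256 * KB, 1 * MB, 4 * MB, 6 * MB):
    p, n = amat(penryn, w), amat(nehalem, w)
    print(f"W={w // KB:>5} KB   Penryn {p:6.1f}   Nehalem {n:6.1f}   "
          f"-> {'Penryn' if p < n else 'Nehalem'}")

With these assumed numbers the model puts Nehalem ahead in the 64KB-256KB band (its fast L2 beats Penryn's 15-cycle L2) and Penryn ahead once the working set pushes past 256KB into the 39-cycle L3, which is exactly the corner-case structure described above.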

"Do you think they just ignored Penryn's design and experience when they designed the Nehalem memory system?"

No, they just optimised it for its most expensive SKUs, the server parts, the same way AMD did with Barcelona. Barcelona beats Penryn in some server apps despite having only a third of the cache, and slower cache at that, but it has definitely scaled better from dual-core Opteron than Core 2 did, and that's mainly due to its 3-level cache structure.