Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: graphicsguru who wrote (239560) 8/30/2007 8:29:53 AM
From: combjelly
 
"But here's the type of crazy edge case that has to work correctly."

This is what I meant by "corner case". Yes, such a situation needs to be handled correctly, but it is a low-probability situation. CSI is supposed to scale to thousands of sockets in a hierarchical topology. An access between two arbitrary sockets can take a large number of hops. In other words, large CSI systems are NUMA.
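As a rough illustration of why hop counts grow with system size, here is a toy model in Python. It assumes the sockets are leaves of a balanced tree with a fixed fan-out at every level. The tree layout, the `hop_count` name, and the fan-out of 4 are all assumptions made up for this sketch, not a description of how CSI actually routes traffic.

```python
def hop_count(a: int, b: int, fanout: int = 4) -> int:
    """Link hops between sockets a and b in a toy balanced-tree topology.

    Sockets are numbered leaves; each switch level groups `fanout` children.
    This is a hypothetical model for illustration, not CSI's real routing.
    """
    count = 0
    # Climb one level from both sides until the two paths meet
    # at a common ancestor switch.
    while a != b:
        a //= fanout
        b //= fanout
        count += 2  # one hop up on each side of the path
    return count


if __name__ == "__main__":
    # Compare a neighbor in the same local cluster with a far-away socket.
    for src, dst in [(0, 1), (0, 4), (0, 1023)]:
        print(f"socket {src} -> socket {dst}: {hop_count(src, dst)} hops")
```

In this model two sockets in the same local cluster are 2 hops apart, while sockets at opposite ends of a 1024-socket system are 10 hops apart, which is the sense in which a big system ends up being NUMA.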

So optimizing for sharing code and/or data between any two cores doesn't make a huge amount of sense, because the way a NUMA system works is that sharing code and/or data between cores is discouraged outside of the local cluster. So it doesn't make sense to go to heroic lengths to unwind the state of an arbitrarily large number of cores. It does make sense to detect the situation and then stall while draining the pipeline, flagging and invalidating pending writes, etc.

Because this situation should be very rare. If performance sucks, well, performance is going to suck anyway. It may be that throwing a huge number of transistors at the problem is going to make it suck with somewhat less vigor, but it is likely that those transistors are better spent on something else to optimize local access.

Like the perennial favorite, more cache.