Silicon Investor (SI) -- The First Internet Community

Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: fastpathguru who wrote (239510)	8/29/2007 1:01:51 AM
From: Petz

fpg, thanks, I missed AMD's latency reduction feature. /Petz

To: fastpathguru who wrote (239510)	8/29/2007 2:36:24 AM
From: graphicsguru

No, Barcelona doesn't have the same feature. It has a different feature. A Barcelona core (A) with a cache miss still communicates first with the core (B) whose memory controller owns the memory in question. That core then speaks to the others (C, D, E . . .).

The (small) optimization is that if C, D, E . . . get back to A with their results before B does, and the cache line is in the M or O state, then core A in Barcelona realizes that it already knows what B is going to say, so it doesn't bother to wait. K8 would have waited.

The reason it's a small optimization is that it takes an unusual situation to make B the slow one in getting back to A. After all, B knows what it wants to say before C, D, E do, so usually it will be the first one to answer. It will be last only when there's a particularly heavy load on the B->A HyperTransport link, or when it is more hops away than the others. The fact that it's a small optimization is probably why it was not included in K8. Or maybe it gets to be more significant with more cores, and that's the reason they didn't bother with it on K8.

The Intel CSI scheme is very different and much more complicated. In the AMD scheme, serialization always happens at the node that owns the memory. In the Intel scheme, that's not true, so the Intel scheme can occasionally require unwinding transactions to guarantee proper ordering. The Intel scheme is much more complex, but can potentially yield significant performance advantages.
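To make the Barcelona vs. K8 difference concrete, here is a toy timing model of the behavior described above. It is only a sketch of this post's description, not AMD's actual protocol: the `Response` class, the `completion_time` function, the `early_completion` flag, and the arrival times are all made up for illustration.

```python
from dataclasses import dataclass

# MOESI states in which a probed cache holds the up-to-date copy of the line.
DIRTY_STATES = {"M", "O"}


@dataclass
class Response:
    """One reply arriving at the requesting core A (hypothetical model)."""
    sender: str       # e.g. "B" (home node) or "C", "D", "E" (probed cores)
    arrival: int      # arrival time at A, in arbitrary units
    state: str = "I"  # MOESI state reported by a probed core; unused for home


def completion_time(home, responses, early_completion):
    """Return the time at which A can complete its cache miss.

    early_completion=False models the K8 behavior described in the post:
    A always waits for the home node's reply.
    early_completion=True models the Barcelona behavior described in the post:
    if a probed core reports the line in M or O, A does not wait for home.
    """
    home_time = next(r.arrival for r in responses if r.sender == home)
    probes = [r for r in responses if r.sender != home]
    probes_done = max((r.arrival for r in probes), default=0)
    dirty_elsewhere = any(r.state in DIRTY_STATES for r in probes)

    if early_completion and dirty_elsewhere:
        # The data comes from the owning cache, so home's reply is redundant.
        return probes_done
    return max(home_time, probes_done)


# Unusual case: home node B is the slowest responder and C holds the line in M.
slow_home = [Response("B", 10), Response("C", 4, "M"), Response("D", 6, "I")]
k8_slow = completion_time("B", slow_home, early_completion=False)
barcelona_slow = completion_time("B", slow_home, early_completion=True)

# Usual case: B answers first, so skipping the wait for B gains nothing.
fast_home = [Response("B", 3), Response("C", 4, "M"), Response("D", 6, "I")]
k8_fast = completion_time("B", fast_home, early_completion=False)
barcelona_fast = completion_time("B", fast_home, early_completion=True)
```

In this model the two variants differ only when B's reply arrives after all the probe responses and some other cache holds the line in M or O, which is the point about it being a small optimization.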