Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: fastpathguru who wrote (239510) 8/29/2007 1:01:51 AM
From: Petz
 
fpg, thanks, I missed AMD's latency reduction feature. /Petz



To: fastpathguru who wrote (239510) 8/29/2007 2:36:24 AM
From: graphicsguru
 
No, Barcelona doesn't have the same feature. It has a different feature.

A Barcelona core (A) with a cache miss still communicates first
with the core (B) whose memory controller owns the memory in
question. That core then speaks to the others (C, D, E . . .).

The (small) optimization is that if C, D, E . . . get back to A with
their results before B, and the cache line is in the M or O state,
then core A in Barcelona realizes that it knows what B is going to
say, so it doesn't bother to wait. K8 would have waited.
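
To make the difference concrete, here is a rough sketch of the rule as I described it above. It is a toy timing model, not AMD's actual logic: the names (ProbeResponse, k8_ready_time, barcelona_ready_time) and the nanosecond figures are made up for illustration, and it assumes A must still hear from every other core before it can proceed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResponse:
    node: str          # responding core, e.g. "C"
    arrival_ns: float  # hypothetical time the response reaches core A
    state: str         # MOESI state of the line in that core's cache


def k8_ready_time(home_arrival_ns: float, probes: List[ProbeResponse]) -> float:
    """K8 (as described in the post): A always waits for home node B
    as well as for every probe response."""
    return max([home_arrival_ns] + [p.arrival_ns for p in probes])


def barcelona_ready_time(home_arrival_ns: float, probes: List[ProbeResponse]) -> float:
    """Barcelona (as described in the post): if some other core holds the
    line in M or O, A already knows what B will say and does not wait for B."""
    last_probe = max((p.arrival_ns for p in probes), default=0.0)
    dirty_hit = any(p.state in ("M", "O") for p in probes)
    if dirty_hit:
        return last_probe
    return max(home_arrival_ns, last_probe)


# B is the slow one here (say, a loaded link), and C holds the line modified.
probes = [ProbeResponse("C", 60.0, "M"), ProbeResponse("D", 70.0, "I")]
k8_time = k8_ready_time(110.0, probes)
barcelona_time = barcelona_ready_time(110.0, probes)
```

With these invented numbers K8 proceeds at 110 and Barcelona at 70. If B answers first, or if no core holds the line in M or O, the two functions return the same time, so the saving only shows up when B is the last to answer.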

The reason it's a small optimization is that it is an unusual
situation that would make B be the slow one in getting back to A.
After all, B knows what it wants to say before C, D, E do. So usually
it will be the first one to answer. It will be last, only when there's a
particularly heavy load on the B->A HyperTransport link, or when it
is more hops away than the others.

The fact that it's a small optimization is probably why it was not included
in K8. Or maybe it gets to be more significant with more cores, and
that's the reason they didn't bother with it on K8.

The Intel CSI scheme is very different and much more complicated.
In the AMD scheme, serialization always happens at the node that owns
the memory. In the Intel scheme, that's not true. So the Intel scheme
can require unwinding transactions occasionally to guarantee proper
ordering.

The Intel scheme is much more complex, but can potentially yield
significant performance advantages.