Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: graphicsguru who wrote (239551) 8/30/2007 12:42:40 AM
From: fastpathguru
 
I think you're underestimating the complexity. There may
be relevant writes in flight, and potentially affected
reads in flight on other processors.


I think you're overestimating Intel's implementation. :)

There are going to be many ways to skin the cat, some coarser than others. Results dependent upon speculative loads will need to be fenced in at some perimeter, a perimeter determined by the rollback strategy. Wider perimeters will demand more complex rollback strategies. No doubt, the gains fall off rapidly and the implementation complexity increases exponentially the deeper down the rabbit hole you go.

Cache coherency is tricky business to begin with.
Adding speculative transactions and making sure that
read/write ordering is *always* correct in all circumstances
with partial and re-ordered transactions in flight,
is anything but simple.


That's why I think that Intel's solution will be relatively simple. It could be an improvement in that cores will block later and less frequently than under AMD's current protocol, but they will still block relatively coarsely to prevent results that depend upon speculative data from propagating very far.

All I've done is propose a rollback mechanism consisting of

A) a way to checkpoint a core's state, and
B) a way to block on or buffer writes,

such that upon discovering that the cache data processed after the checkpoint was stale or otherwise crappy, a core could flush and restart from the checkpoint. (Remember, the go/no-go signal for committing the speculative results will arrive only a few ns after the actual data; the core simply can't get very far before knowing whether its efforts are indeed valid or it needs to flush/rollback.)
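
To make A) and B) concrete, here is a toy software sketch in Python. It is only an illustration of the idea as I've stated it, not a model of any real Intel or AMD hardware; the class name ToyCore, its methods, and the dict standing in for memory are all made up for this example.

```python
class ToyCore:
    """Toy model of one core with a checkpoint and a store buffer.

    All names here are hypothetical; this is not any vendor's design.
    """

    def __init__(self, memory):
        self.memory = memory        # shared dict: address -> value
        self.regs = {}              # register state of this core
        self._checkpoint = None     # saved copy of regs, None when not speculating
        self._store_buffer = []     # writes held back while speculating

    def checkpoint(self):
        # (A) snapshot core state before consuming possibly stale data
        self._checkpoint = dict(self.regs)

    def store(self, addr, value):
        # (B) while speculating, buffer the write instead of exposing it
        if self._checkpoint is not None:
            self._store_buffer.append((addr, value))
        else:
            self.memory[addr] = value

    def resolve(self, data_was_valid):
        # the go/no-go signal: commit buffered writes, or flush and roll back
        if self._checkpoint is None:
            return
        if data_was_valid:
            for addr, value in self._store_buffer:
                self.memory[addr] = value
        else:
            self.regs = self._checkpoint
        self._store_buffer = []
        self._checkpoint = None


memory = {"x": 1}
core = ToyCore(memory)
core.regs["r0"] = 0

# Speculate on a load of "x", then learn the data was stale: roll back.
core.checkpoint()
core.regs["r0"] = memory["x"] + 41
core.store("y", core.regs["r0"])
core.resolve(data_was_valid=False)
regs_after_rollback = dict(core.regs)
memory_after_rollback = dict(memory)

# Same work again, but this time the data turns out to be valid: commit.
core.checkpoint()
core.regs["r0"] = memory["x"] + 41
core.store("y", core.regs["r0"])
core.resolve(data_was_valid=True)
```

The point of the sketch is that nothing computed after the checkpoint becomes visible outside the core until resolve() says go, which is the coarse "perimeter" I'm talking about.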

(If you don't think the above would work, be specific; don't patronize me with "it's more complex than you think.")

I'm not saying that's the only way it can be done, but I'm betting Intel's rollback implementation is not much more complex than that, because, as you alluded, the rabbit hole gets VERY steep and slippery beyond that.

Now, this rollback mechanism is just a basic core-level tool; the next layer up is how many instances of rollback can be live in an MP system, and how they're managed...

fpg