Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: fastpathguru who wrote (239547)
8/29/2007 5:32:16 PM
From: graphicsguru
 
esp. when the processor is sitting around doing nothing but retiring in-flight instructions and waiting for the read to fill

I think you're underestimating the complexity. There may
be relevant writes in flight, and potentially affected
reads in flight on other processors.

Cache coherency is tricky business to begin with.
Adding speculative transactions and making sure that
read/write ordering is *always* correct in all circumstances
with partial and re-ordered transactions in flight,
is anything but simple.



To: fastpathguru who wrote (239547)
8/29/2007 5:39:49 PM
From: pgerassi
 
Dear Fpg:

Sorry, the internal processor state isn't the only thing that needs saving. What if it wrote to memory during the speculation time? What if it passed the bad data it speculated on to another processor? The other processor has to unwind as well. And the memory has to be put back and everything that was wrong has to be righted. The first problem is to get everyone to stop. Then everyone has to rewind to the state before the speculation started. Then and only then can you go forward again.
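To make the stop / rewind / go-forward sequence concrete, here is a minimal toy sketch in Python. It is not how any real coherency protocol is implemented. The `Core` and `System` classes, the undo log, and the `tainted` flag are all hypothetical names made up for illustration, and "stopping everyone" is only implicit because nothing runs concurrently in the model.

```python
_MISSING = object()  # marks an address that had no value before speculation


class Core:
    """Hypothetical core: a register file plus a pre-speculation checkpoint."""

    def __init__(self, name):
        self.name = name
        self.regs = {}           # internal processor state
        self.checkpoint = None   # copy of regs taken when speculation starts
        self.tainted = False     # True once the core touched speculated data

    def begin_speculation(self):
        self.checkpoint = dict(self.regs)

    def rollback(self):
        if self.checkpoint is not None:
            self.regs = self.checkpoint
        self.checkpoint = None
        self.tainted = False


class System:
    """Hypothetical system: several cores sharing one memory."""

    def __init__(self, core_names):
        self.cores = {n: Core(n) for n in core_names}
        self.memory = {}
        self.undo_log = []  # (address, previous value) per speculative write

    def begin_speculation(self):
        for core in self.cores.values():
            core.begin_speculation()
        self.undo_log.clear()

    def speculative_write(self, core_name, addr, value):
        # Remember the old value so memory can be put back later.
        self.undo_log.append((addr, self.memory.get(addr, _MISSING)))
        self.memory[addr] = value
        self.cores[core_name].tainted = True

    def read(self, core_name, addr, reg):
        core = self.cores[core_name]
        core.regs[reg] = self.memory.get(addr)
        # Reading a speculatively written address spreads the bad data.
        if any(a == addr for a, _ in self.undo_log):
            core.tainted = True

    def unwind(self):
        # Step 1 (stop everyone) is implicit: nothing else runs in this toy.
        # Step 2: put memory back, newest write first.
        for addr, old in reversed(self.undo_log):
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.undo_log.clear()
        # Step 3: rewind every core to its checkpoint.
        affected = [c.name for c in self.cores.values() if c.tainted]
        for core in self.cores.values():
            core.rollback()
        return affected


system = System(["cpu0", "cpu1"])
system.memory[0x10] = 1
system.begin_speculation()
system.speculative_write("cpu0", 0x10, 99)  # cpu0 writes on a bad guess
system.read("cpu1", 0x10, "r1")             # cpu1 consumes the bad data
affected = system.unwind()                  # both cores had touched it
```

Even in this toy, a single bad speculative write on cpu0 drags cpu1 into the unwind as soon as cpu1 reads the address, which is the point about the other processor having to unwind as well.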

That takes not 100 cycles, but 1000 cycles or more, and on every core and device in the system. So on a 16 core system, that would mean 16K cycles' worth of processing, gone. That is because during the unwind, absolutely no speculation would be allowed; everything must be undone perfectly.

And the more cycles it saves during times when the speculation is correct, the exponentially worse the unwind gets. So if it saves 160 cycles like you say, a million cycles may be required to reverse it. With HT, that last "hop" costs only 5 to 10ns, worst case, in an 8 way box. 10ns is only 30-40 cycles, and a wrong guess there would still mean about 16K cycles gone.
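The figures above can be checked with a few lines of Python. This is only a sketch: the 3 to 4 GHz clock is an assumption implied by "10ns is only 30-40 cycles", and the break-even formula is a simple expected-value model added here for illustration, not something stated in the post. The function names are hypothetical.

```python
def ns_to_cycles(ns, clock_ghz):
    """Cycles elapsed in `ns` nanoseconds at `clock_ghz` GHz (assumed clock)."""
    return ns * clock_ghz


def unwind_cost(cycles_per_core, cores):
    """Total cycles lost when every core has to unwind."""
    return cycles_per_core * cores


def break_even_failure_rate(saved_cycles, unwind_cycles):
    """Failure rate p at which speculation stops paying off.

    Assumed model: a correct guess gains `saved_cycles`, a wrong one loses
    `unwind_cycles`, so break-even is (1 - p) * saved == p * unwind.
    """
    return saved_cycles / (saved_cycles + unwind_cycles)


saved_low = ns_to_cycles(10, 3.0)    # 30 cycles at an assumed 3 GHz
saved_high = ns_to_cycles(10, 4.0)   # 40 cycles at an assumed 4 GHz
lost = unwind_cost(1000, 16)         # 16000 cycles on a 16 core system
p = break_even_failure_rate(saved_high, lost)
print(f"saved: {saved_low:.0f}-{saved_high:.0f} cycles, lost: {lost} cycles")
print(f"break-even failure rate: {p:.4%}")
```

Under these assumptions the speculation has to be wrong less than roughly one time in 400 just to break even, before counting any of the correctness risk.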

As for nastiness, think what would happen if it got just one case wrong in 3 months of operation and that was detected. The whole line would be unusable for any mission critical server. And many other uses that could handle a rare wrong output would also be moved off it, simply because it could happen. And that includes single socket desktops and mobiles. In essence, one known wrong output from this means FDIV in spades. Even after Intel removed any speculation from the cache coherency, it would take years before enough trust was built back up for those risk-averse server people to consider Intel again.

That is the reason why they will let people bang on the much simpler single and dual socket versions for a whole year before going beyond that. Any hint that two identical computers running the same software get a different result, and it will need to be proven that the cache coherency protocol wasn't at fault. Else the deployment will be delayed. If it gets into the field and just one person can prove that the protocol is at fault, it's game over. Intel would have to replace 90-99% of the cores with it, gratis. It would actually be easier if it failed consistently rather than rarely and intermittently. The former could be tested for, and good CPUs could thus be verified. In the latter case, every CPU is suspect and they all have to be replaced. It's the only way to be sure.

Pete