To: fastpathguru who wrote (239547) 8/29/2007 5:39:49 PM From: pgerassi

Dear Fpg:

Sorry, the internal processor state isn't the only thing that needs saving. What if it wrote to memory during the speculation time? What if it passed the bad data it speculated on to another processor? The other processor has to unwind as well. And the memory has to be put back, and everything that was wrong has to be righted. The first problem is to get everyone to stop. Then everyone has to rewind to the state before the speculation started. Then and only then can you go forward again.

That does not take 100 cycles, but 1000 cycles or more, and on every core and device in the system. So on a 16 core system, that would mean 16K cycles worth of processing, gone. Because during the unwind, absolutely no speculation would be allowed; everything must be undone perfectly. And the more cycles it saves during times when the speculation is correct, the exponentially worse the unwind gets. So if it saves 160 cycles like you say, a million cycles may be required to reverse it. With HT, that last "hop" costs only 5 to 10ns, worst case, in an 8 way box. 10ns is only 30-40 cycles saved, against about 16K cycles gone when the speculation is wrong.

As for nastiness, think what happens if it got just one case wrong in 3 months of operation and that was detected. The whole line would be unusable for any mission critical server. And many other uses that could handle a rare wrong output would also be moved off, simply because it could happen. And that includes single socket desktops and mobiles. In essence, one known wrong output from this means FDIV in spades. Even after Intel removed any speculation from the cache coherency, it would take years before enough trust would be built back up for those risk-averse server people to consider Intel again. That is the reason why they will let people bang on the much simpler single and dual socket versions for a whole year before going beyond that.
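For anyone who wants to check the arithmetic above, here is a rough sketch in Python. The 3 to 4 GHz clock is only my assumption, implied by 10ns being 30-40 cycles, and the function names are made up for illustration. The break-even figure is a simple expected-value estimate, not anything from Intel's design: it treats every correct guess as saving the hop and every wrong guess as costing the full system-wide unwind.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core cycles at the given clock."""
    return latency_ns * clock_ghz


def system_rollback_cycles(cycles_per_core, cores):
    """Total cycles lost if every core must unwind (assumed linear in core count)."""
    return cycles_per_core * cores


def break_even_miss_rate(saved_cycles, rollback_cycles):
    """Wrong-guess rate at which speculation stops paying off.

    Solves (1 - p) * saved_cycles = p * rollback_cycles for p.
    """
    return saved_cycles / (saved_cycles + rollback_cycles)


# Figures from the post: a 10ns hop, 1000 cycles of unwind per core, 16 cores.
hop_saved_low = ns_to_cycles(10, 3.0)    # assumed 3 GHz clock
hop_saved_high = ns_to_cycles(10, 4.0)   # assumed 4 GHz clock
lost = system_rollback_cycles(1000, 16)
rate = break_even_miss_rate(hop_saved_high, lost)

print(f"cycles saved per correct guess: {hop_saved_low:.0f} to {hop_saved_high:.0f}")
print(f"cycles lost per wrong guess:    {lost}")
print(f"break-even wrong-guess rate:    {rate:.4%}")
```

Under those assumptions the speculation has to be right all but roughly one time in four hundred just to break even on cycles, and that is before counting the cost of a wrong answer that is never caught.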
Any hint that two identical computers running the same software got a different result, and it will need to be proven that the cache coherency protocol wasn't at fault. Else the deployment will be delayed. If it gets into the field and just one person can prove that the protocol is at fault, it's game over. Intel would have to replace 90-99% of the cores with it, gratis. It would actually be easier if it failed consistently than if it failed rarely and intermittently. In the former case, the failure could be tested for and good CPUs could thus be verified. In the latter case, every CPU is suspect and they all have to be replaced. It's the only way to be sure.

Pete