Technology Stocks : Intel Corporation (INTC)
To: Amy J who wrote (83668)
From: Tony Viola
Date: 6/17/1999 10:44:00 AM
 
What is the net of a soft error, i.e. what are the symptoms from the standpoint of an
end-user?


Amy, it depends on a lot of things. If the error occurs in main memory, ECC (error checking and correction), if you have it, should catch it and correct it. The caveat is that if two or more bits go bad in one word, or whatever boundary the ECC is designed to cover, the system will report an error but not be able to correct it, and it will stop. I don't know if Intel has ECC on caches, TLBs, or microcode. System/390s do.

Finally, the last area to worry about on chips is the logic itself: if an alpha or other particle hits a critical logic latch, it's probably sayonara. It's so hard to predict which combinations of latches to cover with ECC that computer designs don't try, to my knowledge. Well, sayonara meaning the system probably goes down the wrong path and ultimately stops, or gets wrong answers. However, no soft error should ever cause a machine to fail "hard" and have to be repaired. That's because soft errors usually don't repeat, since they're caused by a random hit from a particle. However, if you could determine that a component (CPU, memory chip) was constantly getting soft errors because of a weakness, it could be replaced.
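
To make the ECC behavior described above concrete (one flipped bit gets corrected, two flipped bits get reported but not fixed), here is a minimal sketch in Python of a SECDED Hamming code on a 4-bit word. It is purely illustrative: real memory ECC is done in hardware and typically covers wider words, for example 64 data bits with 8 check bits. The names encode and decode are made up for this example, and nothing here is meant to represent any vendor's actual implementation.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (a list of bits).

    Hypothetical helper for illustration. Index 0 holds an overall parity
    bit; indexes 1..7 hold a Hamming(7,4) code with check bits at 1, 2, 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # work on a copy so the caller's word is untouched
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    odd_parity = sum(code) % 2     # 1 means an odd number of bits flipped

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Assume a single flip: the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
print(decode(word))      # clean word

word[5] ^= 1             # one "particle hit"
print(decode(word))      # single-bit error is corrected

word[6] ^= 1             # a second hit in the same word
print(decode(word))      # double-bit error is detected, not corrected
```

Note that this sketch only guarantees detection for exactly two flipped bits. With three or more flips in the same word, a code like this can be fooled into "correcting" to the wrong value, which is one reason designers size the ECC to the error rates they expect.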

So, like Elmer says, chips have been going through this susceptibility to alpha and other particles for years, ever since a latch got small enough for an alpha strike to impart enough energy to flip it. It used to be that the main culprit was particles from the atmosphere, or space. Then chip manufacturers put in as much shielding as they could afford, based on materials cost, size, etc., to stop the particles from crashing through to the silicon. Then, once outside sources were adequately shielded out, the packages themselves became the main culprit, since the package materials contain trace radioactive elements that are breaking down and emitting alphas all the time.

Bottom line: Intel does need to keep on top of this, but it's really not new. Smaller and smaller geometries continue to uncover new sources of the problem. You can throw enough ECC at the problem to keep it under control, but the best fix is to eliminate the root cause. The latest demon to be corralled, where I have hands-on knowledge, is the neutron.

Tony