Another example of the "Impact and Cascading Effect" of failure.
eetimes.com

Experts mull potential domino effect of system failures
By Stan Runyon and Craig Matsumoto
EE Times (07/09/99, 4:06 p.m. EDT)

NEW YORK - Are our systems reliable? Given the pervasive dependence on electronic systems packed with devices too complex to test down to each transistor, it's a reasonable, if provocative, question.

Consider the case of the chip that could have brought down the Internet. It happened at a New York Internet point-of-presence, a room stuffed with dozens of network routers. One chip burned out on one board; an engineer put the fire out without incident. But the smoke blown from cooling fans in the routers began drifting into the room and curling up toward the smoke alarms. Because automatic fire-suppression systems cannot use halogen chemicals, the room was equipped with sprinkler systems.

Had the smoke been sufficient to set off the alarms and trigger the sprinklers, "it would have taken out every box in the building. It would have taken down the entire U.S. Internet," said engineer Hugh Duffy at Failure Analysis Associates Inc., which investigated the mishap.

The intertwining of systems of all sorts calls for consideration of the ripple effect of any given change or failure, Duffy warned. "It used to be that if a board failed, O.K., so your TV didn't work anymore," he said. But increasingly, "you have to walk your way through all the consequences of [your] decisions."

Some experts, including Duffy himself, cite credible evidence that systems are becoming more reliable relative to their complexity. While acknowledging that systems-on-chip represent a quantum leap in design intricacy, they note that fewer blocks are being connected to the outside, and it is in the interconnections, they argue, that physical problems most often surface.

Failures decline

"The 'terrible truth' is that failure rates are going down, not up," Duffy said. "People got more experienced at making chips, so they are more reliable."
But the world population's increasing reliance on systems, and the systems' increasing reliance on one another, breeds vulnerability.

"With the rising complexity of global systems such as the Internet and power grids, the threat and impact of failures is increasing," warned Donald A. Norman, a consultant and author of numerous books on design. "We are getting to the point where we will see complex systems problems the likes of which we have never seen before, and we lack the scientific background to understand them."

Indeed, experts say it is becoming increasingly difficult to gauge the reliability of large-scale systems. The Web, for example, defies analysis because it is a hybrid of the traditional circuit-switched telephone network and today's emerging data, optical and cable nets: a complex system of interrelated systems.

The Asian flu erased all doubt that global economies are interlocked. But beyond economic institutions, technology itself has intertwined the nations of the world in an interdependent web of critical technologies. So just how fragile is that web? What would it take to "take down" the planet or a particular portion of its critical enterprises?

"Failure is a normal part of any human-made system, a part of life," said Norman. "The human is part of the system. That's not a novel concept, but it's still novel in many product-development cycles.

"I hear it from many EEs: They are working on something that they say is at such a low level that it doesn't impact anyone. As long as [their subsystem] works perfectly, their assumption is OK," Norman said. "But what happens when it fails?"

Norman, a former head of Apple's Advanced Technology Group, sits on the U.S. Government's Computer Science and Telecommunications Board, which reports on safety and reliability. The board's object is to address growing concerns over national security, especially the exposure of electronic systems to failure by accident or tampering.

"We can put out new computers faster than we can develop security for them,"

"snip"

J.L.T.