FAULT TOLERANCE: The KEY to Survival

(A final note to Ron: until you have a valid, logical basis for ascertaining the fault tolerance level of our infrastructure and society, neither you nor anyone else has ANY intelligent, logical basis for rejecting, much less condemning, any of what you label extremist/doomer articles/discussions. I suggest you in particular carefully read and fully understand the following article: you will learn far, far more about the bottom-line essence of the REAL Y2K problem than, from your writings, you have learned to date. Understanding these concepts will be of infinitely greater value to you than discussing individual software scenarios and reports. May you gain in wisdom by doing so! EOD with you.)

<<Here is a very intelligent essay. It's more than intelligent; it's on target. This is the heart of the Y2K matter: fault tolerance. This is the heart of Y2K's social implications. Can the interdependent worldwide economy survive a simultaneous failure? For localized crises, such as a famine (rare in the free world) or an earthquake, interdependence helps: victims get help from outside the crisis zone. But what happens when the same crisis hits everyone? That's why Y2K is like the bubonic plague of 1348-50.
Pareto's 80-20 law is against us. This time, 80% isn't good enough. If GM gets only 80% of its needed supplies, it goes out of business. Same with your local power utility.
We don't know what the fault tolerance is. We also don't know what the rate of failure will be in key industries. I think it will be over 1%. Others disagree. But this is where the debate should begin. This is where it rarely even ends, let alone begins.
* * * * * * * * * *
THE REAL Y2K QUESTION
[essay from Mike Goodin]
I've been pondering the root Y2K problem for many years, searching for a concise way to describe the true nature of the potential threat. This week, aided by the phraseology of a scientist, I've constructed this question:
"What is the fault tolerance of our globally-distributed specialization network?"
This is the relevant Y2K question. Remember, it's not the compliance of home appliances that matters ( and why polls keep asking people about home appliances is an unfortunate mystery... ), and failures somewhere on the planet are all but certain. Failures are going to occur, without a doubt.
The question concerns the ability of our globally-distributed specialization network to survive faults. If the global system is highly fault tolerant, it will survive intact, with few disruptions. If the global system has low fault tolerance, we're in for a very rough ride. Perhaps even a multi-year shutdown of civilization as we know it.
FAULT TOLERANCE HAS NEVER BEEN TESTED
Recognize that the fault tolerance of our "new" global community has never been tested. In the days of World War II, America was relatively isolated. We could build our own planes, trains and automobiles ( tanks, too ) . We had factories; we had relatively short, U.S.-based assembly lines staffed by skilled U.S.-based workers. The network of specialization was much smaller, and therefore more fault tolerant. Everybody knows the fewer pieces you have in an engine, the less likely it is to fail. Simplicity leads to reliability. Complexity results in low fault tolerance.
Today, the manufacturing base of America is nearly extinct, and the supply lines for building products stretch across oceans, involving a half-dozen countries for parts. This is the "globally-distributed" specialization network to which I refer, and it is a relatively young system.
It's been driven by economics, by specialization, by efficient ocean-going transports and air deliveries. It's enabled by international telecommunications: e-mails, faxes, phone calls, even video conferencing. International banks allow the moving of funds from buyer to seller, through trusted international clearinghouse networks. This is, indeed, a "network" of a thousand parts, and each part of the machine must work at near-perfect efficiency for the whole system to operate correctly.
WE ALREADY KNOW THE SYSTEM CAN HANDLE A 1% FAILURE RATE
So what is the fault tolerance of this system, anyway? That's the debate; that's the big question. Clearly, the people who say that systems fail all the time -- with no big deal -- are missing the point. Yes, power plants fail on a daily basis. Phone lines go down somewhere on the planet on a daily basis. Banks mess up transactions with frightening regularity. We understand that this global network has a fault tolerance of at least 1%. But that's not the right question. Y2K isn't a local hurricane. It isn't a local power outage or a local bank error. It's a simultaneous, global slam-dunk event. It may raise the failure rate of this network to 10%. And *that* is the big question: is our globally-distributed specialization network able to withstand a simultaneous failure of 10% of its parts?

See, isolated failures always rely on the non-failing services -- and an excess of available resources -- to complete repairs. When a power plant fails, all the power experts get called on the phone lines, and they rush to the scene to fix this lone failing power plant. They use credit cards to buy plane tickets, gas, food, you name it. And when they're done, they go home and wait for the next power emergency. This demonstrates the 1% fault tolerance of our current system. But what if ten power plants go down? Suddenly you've got 1/10th of the available resources for each power plant. Then what if the telecomm is down? You can't reach the people qualified to repair the power. If the telecomm is down, they can't use their credit cards to get there. Then what if the airlines aren't flying? You've got delays; people have to drive. So they depend on oil -- but what if the oil tanker shipments are delayed?
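The "1/10th of the available resources" point above is simple arithmetic, and a tiny sketch makes it concrete. All the numbers here (crew counts, repair times, the even-split assumption) are illustrative assumptions, not figures from the essay:

```python
# Toy model of repair-resource dilution: a fixed pool of repair crews
# gets split evenly across simultaneous failures. Every number here is
# an illustrative assumption.

def repair_time(failures, crews=10, crew_days_per_repair=5):
    """Days until the last failure is fixed, assuming the crew pool is
    divided evenly and each failure needs a fixed number of crew-days."""
    if failures == 0:
        return 0
    crews_per_failure = crews / failures
    return crew_days_per_repair / crews_per_failure

print(repair_time(1))   # one failing plant gets all ten crews
print(repair_time(10))  # ten failing plants get one crew each
```

Even in this best case, ten simultaneous failures take ten times as long to clear as one -- and that's before the essay's compounding effects (telecomm down, flights grounded) lengthen each repair further.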
AT WHAT POINT IS THE FAILURE UNIVERSAL?
See, at some point, somewhere between 1% and 100%, you get a total failure of the network. The real Y2K question, when you boil it down, concerns this number. What percentage of simultaneous failure can the network withstand without collapsing?
Clearly, it's something lower than 80%, something higher than 1%. Perhaps the network could withstand a 5% failure; that's debatable. Imagine if 5% of all financial transactions were bad. That would clobber the financial institutions: busy signals forever. Imagine Wall Street with a 5% transaction failure. The whole system would shut down due to the 5% failures. A 10% failure would seemingly bring most networks down. Imagine if 10% of the parts in a power plant didn't work correctly. That's an off-line plant in short order. Imagine if 10% of the parts didn't show up at the Chrysler plant. That's a sure-thing shutdown. Imagine if 10% of the water treatment plants in the country failed. It would be a Red Cross nightmare, just attempting to supply water to 10% of the population.
In my opinion, the world probably can't withstand a 10% failure rate without severe and long-term consequences. A 20% failure rate would be, I think, a fatal economic event. It would thrust the world into a depression with all the resulting costs in dollars and lives. At a 20% failure rate, the efficiencies break down: the food production and deliveries, the oil, power, banking, telecommunications, and so on.
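The tipping-point argument above can be explored with a toy cascade model. Everything about its structure is an illustrative assumption, not from the essay: each node draws a handful of random "suppliers" and fails once enough of its suppliers have failed, and we then propagate an initial shock until nothing new fails:

```python
import random

# Toy cascade model of the "somewhere between 1% and 100% there's a
# collapse point" argument. Hypothetical parameters: each of n nodes
# depends on k random suppliers and fails once at least `threshold`
# of them have failed.

def final_failure_fraction(initial_rate, n=2000, k=5, threshold=2, seed=42):
    rng = random.Random(seed)
    suppliers = [rng.sample(range(n), k) for _ in range(n)]
    failed = set(rng.sample(range(n), int(n * initial_rate)))
    while True:
        newly = {i for i in range(n)
                 if i not in failed
                 and sum(s in failed for s in suppliers[i]) >= threshold}
        if not newly:
            break
        failed |= newly
    return len(failed) / n

for rate in (0.01, 0.05, 0.10):
    print(f"initial {rate:.0%} failed -> "
          f"final {final_failure_fraction(rate):.0%} failed")
```

With these particular (assumed) redundancy parameters, a 1% shock stays contained while a 10% shock runs away toward total failure. Different choices of k and threshold move the tipping point up or down -- which is exactly the unknown the essay says the debate should be about.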
80% ISN'T GOOD ENOUGH
This is why, when people tell you that 80% of the systems are going to be ready, that's not nearly good enough. If you believe my analysis, 80% of the systems working is still a disaster: the 20% that fail could break the global network's back. In fact, a 95% "working" ratio isn't good enough, either; even a 5% failure rate could have long-term, painful consequences. In order to avoid the worst effects of the Millennium Bug, systems need to operate at 99% or better -- less than one failure per one hundred systems. At that rate, I'm confident the network's fault tolerance is sufficient. >>
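The "80% isn't good enough" point follows from basic series-reliability arithmetic. Assuming (hypothetically -- the essay gives no such figure) that a finished product needs 20 independent links in its supply chain to all work, the chance the whole chain works is the per-link reliability raised to the 20th power:

```python
# Series-reliability arithmetic behind "80% isn't good enough".
# The 20-link chain length is an illustrative assumption.

def chain_works(per_part_reliability, n_parts):
    """Probability that n independent parts in series all work."""
    return per_part_reliability ** n_parts

for p in (0.80, 0.95, 0.99):
    print(f"{p:.0%} reliable parts, 20-part chain: "
          f"{chain_works(p, 20):.1%} chance the chain works")
# -> 80% reliable parts, 20-part chain: 1.2% chance the chain works
# -> 95% reliable parts, 20-part chain: 35.8% chance the chain works
# -> 99% reliable parts, 20-part chain: 81.8% chance the chain works
```

Under this assumption, 80%-reliable links give an almost certainly broken chain, and even 99% per link still leaves nearly one chain in five broken -- the longer the chain of specialization, the higher the per-link reliability must be.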