MCI shut down
the article does not mention ESPI but it is of concern to any telecom.
MCI WorldCom network woes cast pall
Complex high-speed networks more prone to trouble
By Jeffry Bartash, CBS MarketWatch
Last Update: 4:02 PM ET Aug 18, 1999
WASHINGTON (CBS.MW) -- MCI WorldCom's network breakdown over the past two weeks was by far the industry's largest yet, but it certainly won't be the last.
In the past few years, Bell Atlantic (BEL), AT&T (T) and now MCI WorldCom have all encountered major disruptions in their high-speed data networks. While such troubles have generally been infrequent, they've caused much hardship for corporate clients, who've increasingly come to rely on them.
Though network failures often involve complex software and hardware issues, the root causes are actually quite simple. Carriers are trying to make large upgrades to intricate data networks to meet skyrocketing growth, especially for Internet access. At the same time, they are trying to keep networks up and running nonstop to meet the omnipresent needs of corporations.
"It's like changing jet engines in mid-flight," said John Ryan, chief analyst and founder of RHK Inc., a telecommunications consulting and market research firm.
Complex issues
The difficulty is compounded by the complex nature of the systems involved. Many high-speed networks are based on "packet-switching," a form of data transfer in which a message is broken into small parts and sent via the best available route to the recipient, where the message is reassembled.
Packet-switching networks are like the hare in the famous fable. They allow more data to be sent at greater speeds than older, traditional circuit-switched networks. Yet while circuit-switched systems, like the tortoise, are slow, they are also steady. Network failure is almost non-existent on the older networks.
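The packet-switching idea described above can be illustrated with a minimal Python sketch. It is a toy model only: the names Packet, packetize, send and reassemble are hypothetical, and shuffling the packets stands in for pieces taking different routes. It is not a description of how any carrier's frame-relay equipment works.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int      # position of this piece in the original message
    total: int    # how many pieces the message was split into
    payload: str  # the piece of the message itself


def packetize(message: str, size: int) -> List[Packet]:
    """Break a message into pieces of at most `size` characters."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def send(packets: List[Packet], seed: int = 0) -> List[Packet]:
    """Assumption: model different routes by delivering packets out of order."""
    delivered = list(packets)
    random.Random(seed).shuffle(delivered)
    return delivered


def reassemble(packets: List[Packet]) -> str:
    """Put the pieces back in order; fail if any piece is missing."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("missing packets")
    return "".join(p.payload for p in ordered)
```

For example, reassemble(send(packetize("changing jet engines in mid-flight", 8))) returns the original string, while dropping any one packet before reassembly raises an error. The sequence numbers are what allow the recipient to rebuild the message no matter what order the pieces arrive in.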
As high-speed networks expand, moreover, their increasingly complex nature requires increasingly complex software to run them. "Clearly the networks are becoming much more automated and dependent on software," Ryan said. "It's very difficult to produce software that works for the very first time."
In MCI WorldCom's case, the carrier was upgrading its so-called frame-relay network with Lucent Technologies software so it could handle more traffic in the future. Though it was removed to end the crisis, new software will eventually have to be installed to allow for expansion.
That's because demand for frame relay is growing 60 percent a year, figures Lisa Pierce, director of global telecommunication services at Giga Information Group.
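To put the 60 percent figure in perspective, here is a small Python calculation. It assumes the growth compounds at a constant annual rate, which the article does not state; the function names are illustrative.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for demand to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def demand_multiple(annual_growth: float, years: float) -> float:
    """How many times larger demand is after `years` of compound growth."""
    return (1 + annual_growth) ** years
```

Under that assumption, doubling_time_years(0.60) is roughly 1.5 years, and demand_multiple(0.60, 3) is about 4.1, meaning a network would need to carry about four times today's traffic within three years.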
Trail of trouble
MCI WorldCom (WCOM) said its network woes began Aug. 5. But there had been several significant disruptions in the weeks leading up to that date. "Customers were reporting problems before that," Pierce said.
On July 22, MCI WorldCom's Boston switch experienced trouble and disrupted service throughout the morning, according to a letter written by Kerry Casey, executive director of service support at MCI WorldCom, a copy of which was obtained by CBS.MarketWatch. The Philadelphia switch then had a hardware problem, causing delays on Aug. 2. Casey also mentioned that MCI WorldCom was working to fix a "software error that had intermittently affected our customers."
"It is important to note that although the software error retains the potential to affect service until this (upgrade) work is completed, the risk of service impact is proportionately reduced as the upgrade moves to completion," Casey wrote in the Aug. 4 letter.
Linda Laughlin, a spokeswoman at MCI WorldCom, said she is not aware that those problems had anything to do with the network breakdown that occurred Aug. 5.
After the initial congestion, it took 10 days for MCI WorldCom to shut down the network to remove the problematic Lucent (LU) software. The company has come under intense fire for how it handled the situation. Many customers were without service for large portions of that time and they had difficulty extracting information from the carrier.
By contrast, when AT&T's frame-relay network went down in April 1998, the carrier immediately went public with the news and focused entirely on getting most customers reconnected within a day or two. It then immediately offered a full month of free service, far more than the 20 days offered by MCI WorldCom to clients who had been disconnected for much of 10 days.
"We went very public with the situation because of the thousands of customers it was affecting," said Darrell Sagehorn, director of data marketing at AT&T. "Most customers were up within 24 hours."
Said Pierce of Giga: "You've noticed how both companies treated it differently."
The aftermath
Some industry observers wonder if MCI WorldCom was trying to do online diagnosis of the software -- a charge the company denies. Instead, these critics argue, MCI WorldCom simply should have removed the software and focused on getting customers back up.
MCI, which says its problems were different from those experienced by AT&T, has defended its actions. Spokeswoman Laughlin said technicians couldn't identify the problem right away. In addition, the carrier appeared to be making progress stabilizing the network in the first week, before another blowup occurred.
It was only then, eight days after the problem started, that the carrier decided to pull the plug on the entire frame-relay network, one of four it operates. "That's a critical decision -- anytime you affect all customers," Laughlin said.
The aftershocks of MCI WorldCom's network breakdown are likely to be felt for a while. Already, rivals are aggressively targeting MCI WorldCom customers, and fielding inquiries from irate WorldCom clients.
"We've gotten more calls in the past few days in light of the troubles," spokesman Tyler Gronbach of Qwest Communications acknowledged.
Still, the one thing you won't hear Qwest (QWST) or any other data carrier tell potential clients is that networks are foolproof. "Qwest is not saying it's not going to happen to us," Ryan said, "because they think it could happen to them."
Gronbach concedes the point, but notes that Qwest is newer than its bigger rivals, uses the latest technology and runs a more uniform network.
"We're all vulnerable, but companies using new technologies and fewer platforms ... could be less susceptible. When you do it (upgrades) in a larger environment, there are more variables," he said.
Yet the simple fact is, frame-relay and related packet-switching networks are not 100 percent reliable. "They aren't as good as circuit-oriented networks. There is a tradeoff," Pierce said. "It's called a 'virtual' network because it's not real."
Backup plans
As a result, carriers are likely to take further steps to avoid costly blowouts in the future. AT&T created a 13-point blueprint for dealing with such situations after the April 1998 failure. Qwest runs a site in Arlington, Va. that extensively monitors the health of its network. MCI WorldCom is almost certainly going to strengthen its procedures for handling network failure.
"We all are striving for the reliability of a circuit-switched network," Sagehorn of AT&T said. "Across the industry this is another wakeup call that we have to be better attuned to outages. We did a year ago. Our competitors are learning this week."
Countered Pierce: "When it happened to AT&T a year ago, it should have been a wakeup call for the entire industry then."
Despite precautions being taken by carriers, analysts expect occasional network failures to occur in the future, given the nature of the technology and the intensifying demand. "We've seen explosive growth in the frame relay sector," Gronbach of Qwest said.
With that in mind, most businesses ought to protect themselves by using two carriers, at least for critical operations, or in rarer cases by setting up expensive but more reliable private networks. While the cost may be high, the cost of not doing so may be higher, analysts warn.
"You better have an excellent contingency plan," Pierce summed up.
Jeffry Bartash is an online reporter for CBS MarketWatch.
this can't be doing us any good
pete