This is old laundry, but it was slipped under our door this morning. It has been widely disseminated at the CBOT (our neighbor is a trader). We had them up for wine a few nights ago, and he mentioned the agony at the CBOT during the MCI network outage. I think he said that the CBOT had moved on to another comms supplier. I will slip a copy of your newsalert under his door today.
-- How Not to Handle a Network Outage
August 23, 1999
Web posted at: 2:18 p.m. EDT (1818 GMT)
by Jason K. Krause
(IDG) -- One of the nation's largest telecommunications companies and its network-gear supplier suffered an embarrassing slipup when a frame-relay network crashed due to a software bug. Sounds a lot like MCI WorldCom's crash earlier this month, but in fact, AT&T went through it a year and a half ago. The way the two companies handled their respective outages made all the difference.
For a backbone provider like MCI WorldCom, bandwidth is a commodity. To undersell the other bandwidth providers in the industry, companies like MCI have taken to farming out engineering and customer service operations. "The debate going on internally in the telecom industry is 'Do we have strong networking groups inside, or do we simply outsource to Cisco, Lucent or Nortel?'" says Ford Cavallari, telecom consultant with Renaissance Worldwide. "MCI has chosen to outsource, which is fine when things work, but when a catastrophe hits, you're in trouble."
By outsourcing, companies can cut costs, but the move undermines their ability to launch a coherent and swift engineering response to an outage. That may explain why it took MCI 10 days to remedy the situation. Its failure to address the problem quickly, and its ultimate placement of blame on partner Lucent, hardly inspire confidence that the company will be able to ensure that such a disaster doesn't happen again. The problems run deep. "MCI WorldCom is made up of so many large companies in their own right that they all have a bad habit of blaming a different unit," says Tim Chase, director of network operations for Alpha.net, a Milwaukee-based ISP. "When I have a problem and call, for example, the MFS unit, they tell me it's UUNet's fault. When I call UUNet, they send me back to MFS. I'm not surprised that they were so disingenuous with this problem."
During the crisis, company executives were conspicuously unavailable. "The longer the problem went on, fewer VPs seemed to be around," says Chase. "Only the occasional technician would accidentally pick up the phone if you were lucky." A quick comparison with AT&T's handling of its outage last year underlines everything MCI did wrong when its own network crashed earlier this month.
What Happened
In April of 1998, AT&T and Cisco suffered one of the biggest crashes in Internet history. While AT&T engineers were upgrading some software in a network switch, a computer bug brought down the entire AT&T frame-relay network, cutting service for millions of people.
Starting Aug. 5, a glitch in some Lucent software intermittently interrupted service on MCI WorldCom's backbone. The telco anticipated an outage of 24 hours, but the problem wound up affecting some 3,000 business customers over 10 days.
How They Handled It
While there was plenty of blame to go around (AT&T could have simply blamed Cisco for giving it faulty software, and AT&T could have been criticized for not stopping the crash more quickly), no fingers were pointed. Instead, Frank Ianna, president of network services for AT&T, gave both the press and customers updates on the crash every couple of hours, detailing AT&T and Cisco's joint effort to fix the problem.
MCI WorldCom issued an alert to its sales force, which was given the option to deliver a notice to customers by e-mail, hand delivery or telephone, or not at all. After a deafening silence from company executives on the 10-day network outage, MCI WorldCom CEO Bernie Ebbers finally took the podium to discuss the situation. How did he explain the failure, and reassure customers that the network would not suffer such a failure in the future? He didn't. Instead, he blamed Lucent.
The Result
For AT&T customers, the network was out of commission for anywhere from six to 26 hours. AT&T decided to waive all charges for service until it completed an analysis of the root cause. That didn't happen until April 29, more than two weeks after the outage. The cause was identified as some faulty Cisco software, but rather than let Cisco take the fall, AT&T and Cisco engineers pledged to throw their full effort into safeguarding against such a crash in the future. By handling the situation aggressively, and publicly, AT&T actually enhanced its image as a robust networking company.
Customers affected by the MCI WorldCom failure have been offered two days of free service for each of the 10 days service was interrupted, which is not particularly generous, say some customers. A few, including the Chicago Board of Trade, have already threatened to take their business elsewhere.