AT&T crash exposed Achilles' heel
USA Today - 04/15/98 - Updated 08:54 AM ET
NEW YORK - Sherry Nash, senior vice president for data networking at Wells Fargo Bank, was in a meeting when her pager sounded the alarm. It was the big one.
"MAJOR NETWORK OUTAGE," blared the message from network control. "AT&T FRAME RELAY FAILURE. 1,023 SITES ARE DOWN."
Nash has encountered nearly one network failure a week during the past five years, but nothing like this. The San Francisco bank's backup systems generally serve just 45 or 50 sites.
"We never conceived that we would lose 1,000 locations at a shot," she said.
Nash knew it would take a major effort to restore the service, which connects the bank's branches, administrative buildings and ATMs, allowing people to withdraw money and use credit cards. Roseville Telephone, MCI and equipment maker 3Com worked all Monday night to expand the backup network.
Similar scenarios were played out at businesses nationwide as AT&T rushed to fix a major data network that serves thousands of customers. CEO Michael Armstrong grappled with the first major crisis of his nearly 6-month tenure.
The system, known as a frame relay network, transmits information much like the Internet or phone system and is primarily used for intracompany communication. The network crashed Monday afternoon when two Cisco Systems switches, which direct traffic on the data network, failed. The problem spread like a virus - or, perhaps, a nuclear reaction - to 143 other switches, disrupting business for airlines, credit card companies, retailers and others.
Though most of the problem had been fixed by late Tuesday, the tales that surfaced underscore just how dependent the country has become on such high-speed networks and the vastness of the increasingly complex and connected digital realm.
"We are so reliant on these networks, it's scary," says analyst Jeff Kagan, president of Kagan Telecom Associates in Atlanta. "The complexity and connectivity breeds vulnerability."
The evidence was everywhere.
MasterCard International says its system was disrupted by the AT&T outage but that backup networks kicked in immediately. "We were able to process all the transactions that hit our network, but it might have taken longer," says spokesman Edward Dixon.
One victim: Sprint spokeswoman Sydney Shaw. Her credit card was rejected late Monday as she tried to make a purchase over the telephone.
"They thought I had a bum credit card," Shaw says. Only later did the retailer learn that the problem was with Sprint's rival, AT&T.
British Airways spokesman John Lampl reported "massive delays" at ticket offices in Los Angeles, San Francisco, New York, Chicago, Pittsburgh, Atlanta, Philadelphia, Miami, Phoenix and Boston. Though toll-free reservation numbers were working, there were delays in the system that prints tickets.
Wal-Mart says more than half its 2,359 stores were affected. Electronic inventory and credit card verification systems depend on the networks. "We had a team in place in our information systems area working around the clock," spokesman Les Copeland says.
Toyota Motor Manufacturing North America was unable to communicate with the home office in Japan or between assembly plants in the USA. The company said it had no backup plan for such a system failure. Spokeswoman Barbara McDaniel said people had to resort to using phones because they couldn't send and receive e-mail.
"Maybe it was a good thing. . . . People are talking to each other," McDaniel says. "It gives all of us pause to realize how dependent we have become on technology."
Big players have backup
Other AT&T customers with more intensive communications needs had backup plans in place.
FedEx spokeswoman Sally Davenport says Federal Express was never affected by the AT&T outage. Backup systems kicked in when the AT&T problems occurred Monday afternoon. Merrill Lynch and Citibank said their systems worked fine, too.
But operations at the American Red Cross, which collects and distributes nearly half the nation's blood supply, were "substantially slower," says spokeswoman Josephine Martin. Although the outage did not cut off blood to those who needed it, "It makes us inclined to increase vigilance about the efficacy of our backup system." The Red Cross used a backup telecommunications system until service was restored midday Tuesday.
American Airlines says its credit-card verification system was affected from 2 p.m. till midnight Monday. American reservations centers continued to take bookings for airline tickets without verifying travelers' credit card information. American says it was able to process the backlog of credit card verifications Tuesday morning when the system came back up. "It wasn't anything customers noticed," says Tim Smith, spokesman for American. Still, American stopped taking bookings at midnight.
Frame relay is a type of data service that breaks information into little pieces known as packets. Each packet carries an address in its header. That allows information from various data transmissions to share the same transmission lines. That's more efficient than a regular phone call, which occupies an entire circuit for the duration of a call, even when someone pauses between words or puts down the phone to answer the doorbell.
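For readers who want to see the idea concretely, here is a minimal Python sketch of addressed packets sharing one line. It is an illustration only, not how AT&T's or Cisco's equipment works: the names `Packet`, `packetize`, `share_line` and `reassemble`, the four-character packet size and the sample addresses are all made up for the example.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Packet:
    address: str  # header field: says which transmission this piece belongs to
    payload: str  # a small piece of the original message


def packetize(address, message, size=4):
    """Break a message into small pieces, each tagged with the same address."""
    return [Packet(address, message[i:i + size])
            for i in range(0, len(message), size)]


def share_line(*streams):
    """Interleave packets from several senders onto one shared line."""
    line = []
    for group in zip_longest(*streams):
        # A sender with nothing left to send simply takes no room on the line.
        line.extend(p for p in group if p is not None)
    return line


def reassemble(line):
    """Use each packet's address to rebuild the original messages."""
    messages = {}
    for p in line:
        messages[p.address] = messages.get(p.address, "") + p.payload
    return messages


# Hypothetical traffic from a cash machine and a bank branch.
atm = packetize("atm-17", "WITHDRAW 40")
branch = packetize("branch-3", "BALANCE?")
line = share_line(atm, branch)
print(reassemble(line))
```

Because every piece carries its own address, the two messages can be mixed on one line and still sorted out at the far end. A phone-style circuit, by contrast, would reserve a whole line for each sender even while it sat idle.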
Emergency mode
The network failure thrust AT&T, which likes to say that it has the most reliable network in the world, into emergency mode.
Armstrong was in a meeting at headquarters in Basking Ridge, N.J., when the problem struck at 3 p.m. Monday. Quality control officer Frank Ianna, who is responsible for the network, just happened to be on the same floor.
"I was here until sometime this morning, more listening and learning and sharing," Armstrong told reporters in a conference call at noon Tuesday.
He wrote a letter of apology to the CEOs of thousands of AT&T customers - and had the letters hand-delivered. AT&T told customers they would not be billed for their frame relay service until AT&T figured out exactly what caused the failure. The $1-billion-a-year frame relay service, which is growing about 35% a year, is crucial to AT&T's future in a highly competitive data market. AT&T is already investing billions of dollars to upgrade its network.
That billing delay goes beyond the terms of the standard contract, which only requires AT&T to issue rebates for the period the network was out of commission, analysts say. It was unclear how much the crisis would cost AT&T or whether the company would take a charge against earnings.
The company all but ruled out sabotage, but still couldn't definitively identify the root cause of the problem late Tuesday.
Kudos for chief
TeleChoice analyst Christine Heckart says Armstrong handled the crisis well. "Customers expect network problems," she said. "What they are most critical of is how providers perform during the crisis."
AT&T shares rose 8/16 to $64 7/8 Tuesday. Cisco shares rose 1 3/16 to $67 11/16. Cisco, the primary supplier for AT&T's frame relay service, which goes by the name of Interspan, said it began working closely with AT&T as soon as the problem was discovered. It dispatched workers to assist AT&T clients.
"We view this interruption as unacceptable and apologize to our joint customers," Cisco Senior Vice President Don Listwin said in a prepared statement. "Our joint teams will provide root cause analysis, remedies and process improvements intended to assure that an outage of this nature does not reoccur."
Frame relay networks fail all the time, although this was believed to be the largest problem yet. That's not surprising, because AT&T controls about 40% of the frame relay market. WorldCom, which also uses Cisco switches, said its frame relay network suffered a regional breakdown about two years ago.
MCI, Sprint and WorldCom were reluctant to take swipes at their rival.
"This could happen to anybody," MCI spokesman Mark Pettit says.