Technology Stocks : The *NEW* Frank Coluccio Technology Forum

To: stephen wall who wrote (6110)10/4/2002 4:53:01 PM
From: Frank A. Coluccio   of 46821
 
Lycos Article: How and Why the Internet Broke

[FAC: During the late eighties and early nineties, there were a number of PSTN meltdowns caused by buggy SS7 software, the glue that tells the telephone network how to set up and tear down switched connections. That was an intelligent network problem. Yesterday's UUNet fiasco also stemmed from problems with software that is supposed to direct "calls", only in this case we're talking about routed packets. This is a dumb network problem. The dumb network is supposed to route around problems. Only this one couldn't because, in part, the carrier at the root of the problem was too big, i.e., in its failed state it was too much a part of what would have been the path to recovery. Hmm ...]

news.lycos.com

by Michelle Delio

Friday, October 04, 2002 12:35 p.m. EDT


The Internet was very confused on Thursday.

But cyberspace hasn't gone senile. Those massive e-mail delays, slow Internet connections and downed e-businesses were all caused by a software upgrade that went horribly wrong at WorldCom's UUNet division, a large provider of network communications.

The problem affected roughly 20 percent of UUNet's U.S. customers -- which translates to millions of users across the United States and around the world -- for most of Thursday, according to WorldCom spokeswoman Jennifer Baker.

The problem began around 8 a.m. EDT. Baker said in a statement that the company had fully restored service by 5:15 p.m. Thursday. Preliminary investigation by UUNet indicates the problems were caused by "a route table issue."

Sounds simple, but imagine an airport that's having an air traffic controller issue, and you'll have an idea of what happened at UUNet.

Route tables direct data from one major network to another or from one area of a network to another area.
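
To make the idea of a route table concrete, here is a minimal sketch in Python of the kind of lookup a router performs: find the most specific prefix that contains the destination address and forward to its next hop. The table contents, the hop names and the next_hop function are all invented for illustration and do not describe UUNet's actual configuration.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop (all values invented for illustration).
ROUTES = {
    "0.0.0.0/0": "default-gateway",
    "10.0.0.0/8": "core-router-a",
    "10.1.0.0/16": "edge-router-b",
}


def next_hop(routes, destination):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best = None  # (network, hop) of the longest matching prefix seen so far
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

Note that the lookup always returns whatever the table says. If an entry points at the wrong next hop, the router forwards traffic the wrong way without raising any error.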

UUNet is a vast, high-speed network. About half of the world's Internet traffic -- including about 70 percent of all e-mails sent within the United States and half of all e-mails sent in the world -- passes through UUNet. The backbone of the Internet is built from these large networks.

The Internet was designed to be fault tolerant, to route information around downed or clogged networks. But when the router tables that direct the data aren't accurate, "bedlam reigns on the network," according to Mike Sweeney, owner of the network consulting firm Packetattack.com.

According to networking experts, a "soft error" -- like a badly configured routing table -- is far worse than physical damage to equipment. Things appear to be working fine, at least for a while.

Luckily, in many cases a soft error is relatively easy to fix, since normally only one or two routers are upgraded.

"But in the case of UUNet, they changed the software on a lot of routers all at once, so any fault tolerance they had fell by the wayside as each router broke due to the bad software load or incorrect configuration," Sweeney said.

As the affected routers dropped offline yesterday, UUNet's response time got slower and slower to the point of failure.

"Other UUNet routers might have tried to pick up the load, but they would have quickly been overwhelmed by the volume of data, and they too would have slowed down," Sweeney said.

"It would be like a 10-lane freeway being blocked in both directions and yet all the traffic still trying to get from here to there using the side streets. It works for a short period, and then you end up with gridlock and nobody getting through."
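
The gridlock Sweeney describes can be sketched as a toy simulation. The assumptions here are deliberately crude and are not a model of the real outage: total traffic is fixed, it is split evenly across whichever routers are still up, and any router asked to carry more than its capacity drops out. The function name, router names and numbers are hypothetical.

```python
def surviving_routers(total_load, capacities, initially_failed):
    """Split total_load evenly over live routers, drop overloaded ones, repeat.

    Returns the sorted names of routers still up once the load is stable,
    or an empty list if every router ends up overloaded.
    """
    live = {name: cap for name, cap in capacities.items()
            if name not in initially_failed}
    while live:
        share = total_load / len(live)  # even split across survivors
        overloaded = [name for name, cap in live.items() if share > cap]
        if not overloaded:
            return sorted(live)
        for name in overloaded:
            del live[name]  # an overloaded router drops out, raising the share for the rest
    return []


# Four routers of capacity 40 carrying a total load of 100 (invented numbers).
CAPACITIES = {"r1": 40, "r2": 40, "r3": 40, "r4": 40}
one_down = surviving_routers(100, CAPACITIES, {"r1"})
two_down = surviving_routers(100, CAPACITIES, {"r1", "r2"})
```

With one router down the remaining three absorb the load. With two down, each survivor is asked to carry 50 against a capacity of 40, so both fail and nothing gets through.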

Network experts were troubled at UUNet's choice to deploy a wide-scale upgrade without testing and retesting the configurations first.

"You have to test, test, test before you change configs," Mark Denham, a Toronto networking consultant, said. "And you really don't want to upgrade an entire huge system like UUNet all at once, if you can avoid doing so. It's insanely difficult to track down an error that could be hiding anywhere on a gigantic system."

"And you should always have an escape route handy in case everything goes to hell," Sweeney added. "A spare device, saved configurations, anything to get the network back and working quickly if the upgrade goes badly."
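
The advice from Denham and Sweeney (upgrade in small steps, check each step, keep saved configurations to fall back on) can be sketched roughly as follows. This is a minimal illustration, not how any real router is managed: configurations are plain strings in a dictionary, and staged_upgrade and the health_check callback are hypothetical names.

```python
def staged_upgrade(configs, new_config, health_check, batch_size=1):
    """Apply new_config a batch at a time; restore saved configs on a failed check.

    configs maps router name -> current config and is modified in place.
    health_check(name, config) returns True if the router looks healthy.
    Returns True if every batch passed, False if the upgrade was rolled back.
    """
    saved = dict(configs)  # the "escape route": a copy of every known-good config
    names = sorted(configs)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            configs[name] = new_config
        if not all(health_check(name, configs[name]) for name in batch):
            # Roll back every router touched so far, including this batch.
            for name in names[:start + batch_size]:
                configs[name] = saved[name]
            return False
    return True


# Hypothetical fleet where the new config breaks router "r2".
fleet = {"r1": "v1", "r2": "v1", "r3": "v1"}
ok = staged_upgrade(fleet, "v2", lambda name, config: name != "r2")
```

Because the check runs after each batch, the failure on "r2" is caught before "r3" is touched, and the saved copies put the fleet back on "v1".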

Some UUNet users said that their problems on Thursday went far deeper than slow e-mail and sluggish Internet connections.

Any Internet-based business hosted by WorldCom's service was hit hard. Not only were users unable to access the Internet, but at times their customers would have been unable to purchase goods, book travel and rental car reservations, or carry out other normal business activities.

WorldCom, which filed for Chapter 11 bankruptcy protection after a major corporate accounting scandal, claims that 60 percent of Fortune 1000 businesses use its UUNet network services.