victor, *****OT*****
FYI March 01, 1999, Issue: 754 Section: News & Analysis
Schwab Outage: IT Wake-Up Call
By Jeffrey Schwartz
A brownout at Charles Schwab & Co.'s online brokerage last week delivered potent lessons to managers of transaction-intensive Web sites.
A configuration error with a new mainframe, added to increase capacity, brought down Schwab's trading system for about one hour as the stock market opened on Wednesday.
In so doing, it dramatized the need for e-commerce sites of all sizes to rigorously test new systems and components, institute strong failover and backup procedures, and implement those procedures quickly when things go wrong.
Until recently, Schwab seemed relatively safe because of its reliance on that ultrareliable computing behemoth, the mainframe.
"There's almost no excuse for a mainframe environment to be offline," said Giga Information Group analyst Rob Enderle. "These are extremely robust environments. Failover is anything but new. Performance could have degraded but, if done right, they should have been able to fall back on the previous configuration pretty quickly."
Ironically, last week's outage occurred just moments after Schwab president and co-CEO David Pottruck extolled the virtues of the Schwab Web site in a keynote address at IT for Wall Street '99, a financial trade show. In that address, he also made clear that Schwab and others have not yet mastered the reliability issue.
"We spend a lot of time thinking about reliability," Pottruck said. "We are all in a learning mode. As we learn, we are all going to get better."
The new IBM mainframe, the sixth supporting the online brokerage giant's business, was turned on Tuesday night. A configuration error related to how the new mainframe connects to other systems and applications made the site inaccessible Wednesday morning, officials said, although they declined to be more specific about the error.
Schwab officials insist that they tested the system offline, but say real-world conditions are difficult to mimic.
Difficult To Prepare
"I think the market opening is the true test of how things are going because you have 1,000 people signing on in the same one-minute interval. You can simulate a lot of stuff, but we obviously didn't simulate it perfectly," said Jan Hier-King, Schwab's senior vice president for e-brokerage technology.
At least one observer agreed that the unpredictability of loads in businesses such as online stock brokerages can make simulations a crapshoot.
"In order to design computer systems that will be reliable, we have to be able to predict what we are asking the computers to do," said Amy Wohl, president of consultancy Wohl & Associates.
The one-hour outage was intermittent, Schwab officials said. In retrospect, however, Schwab's technicians should have switched processing over to a failover system to minimize the outage's impact, Hier-King said. Therein lies a lesson for IT managers.
"We were confident that this was a very short-term problem," Hier-King said. "We were so confident it was so short term, we didn't revert to our backup systems."
While it's difficult to draw direct comparisons between Schwab and other e-commerce sites because of Schwab's enormous transaction volumes, another major e-commerce site recently averted disaster through redundancy measures.
When both its primary and mirrored Unix ordering systems failed last month at its primary location hosted by Exodus Communications, National Semiconductor Corp.'s site blacked out for about two minutes before a second mirror site hosted by Digital Island in Hawaii kicked in, according to Phil Gibson, the company's director of interactive marketing.
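The redundancy pattern described above, in which traffic moves to the next available site when the preferred ones fail, can be illustrated with a minimal sketch. This is not National Semiconductor's actual implementation. The `Site` type, the `pick_active_site` function, and the site names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str      # label used only for illustration
    healthy: bool  # result of a hypothetical health check


def pick_active_site(sites: List[Site]) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site
    return None


# Priority order loosely modeled on the account above:
# primary system, its mirror, then a second mirror at another host.
sites = [
    Site("primary", healthy=False),
    Site("local-mirror", healthy=False),
    Site("remote-mirror", healthy=True),
]
active = pick_active_site(sites)
print(active.name if active else "no site available")  # remote-mirror
```

In practice, how quickly the health checks detect a failure would set how long the site stays dark. The sketch only shows the ordering logic.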
Gibson acknowledged that his upgrade strategy can be slower and more methodical than Schwab's.
"We can track over time to see when our servers begin to get a fever, and then act," Gibson said. A mainframe upgrade requires careful planning and, in Schwab's case, has to be carried out in a more compressed timeframe, he said.
Schwab will try the upgrade again, perhaps as early as this week.
It was the second outage in as many months for Schwab. In January, Schwab experienced a brief outage as batch processes extended into the new trading day.
Last week's incident was the latest in a series of outages that have hit Ameritrade, Datek, E*Trade and Waterhouse Securities, among others. But Schwab is the largest online brokerage, with more than 2 million user accounts, and is known to have a robust and highly fault-tolerant infrastructure.
While many online brokerages have much work cut out for them to provide adequate capacity, Schwab, E*Trade and others have been diligent about upgrading their systems to support their unprecedented growth.
Schwab's system allows 100,000 simultaneous account logons, up from 10,000 just a year ago. Once completed, the upgrade attempted last week will boost that to 250,000.
In January alone, Schwab received 3.7 million trade orders, up from 2.8 million in December, and it signed on 1 million new customers in the past year.
David Joachim contributed to this story.