eBay Improvements Letter pages.ebay.com
July 19, 1999
To our valued community members,
We want to thank you again for the patience and support you have shown us in the past month while we have been improving eBay's system. Now that we have learned more about the June 10-11 downtime, we want to describe many of the steps we have taken to address the issues, as well as to update you on the current status of the site.
The downtime on June 10 was caused by a routine change that we made to our database software in April, which left the system vulnerable to a bug in the database server's operating system. In June, the bug was triggered; it corrupted our data and prevented us from restarting the server. We were able to restore all data and rebuild the system. Read here for more details.
Following the June 10-11 downtime, we immediately began a comprehensive site stability effort to address the issues associated with the downtime. We are working on four key areas:
1. Strengthening our engineering staff
2. Further enhancing our testing processes and decreasing the number of changes made to the system
3. Installing recovery mechanisms
4. Building future infrastructure for stability, performance and scalability
1. Engineering Staff
To meet the performance needs of our system, we grew our engineering staff by 75% in Q2. We will continue to rapidly grow our engineering staff this quarter.
We are working closely with our key vendors to improve site stability and performance. Sun, Oracle and Veritas dedicated their best people to work with eBay full time, and they identified a list of actions for enhancing site stability and performance. We have already taken action on the low-risk, high-benefit recommendations, and we are currently working on implementing many more.
Bob Quinn joined us as our Chief Information Officer, and he is focused on managing the site to ensure site stability and performance. Bob spent eleven years at Sun Microsystems, most recently as Vice President and CIO of Computer Systems. He has over twenty years of experience in managing mission-critical systems, from mainframes to client-server systems. We are confident that he has the management experience and knowledge to take eBay to the next level of stability and reliability.
In addition, Mark Ryan will join us as our Chief Technology Officer. He will ensure that our technology strategy is aligned with the tremendous growth we will continue to experience. He joins us from IBM's elite team for Complex Architectural Solutions and is an expert in data center design, Internet back-office systems and crisis recovery. As a recognized expert in complex architectural solutions, Mark has helped some of the world's best companies stabilize their system and application designs and build new scaling capabilities for the future.
2. Testing Processes and Change Management
In addition to strengthening our engineering staff, we are reinforcing our testing processes by implementing more rigorous and disciplined stress testing. We are now making changes to eBay's site at a slower rate, focusing on changes that enhance stability, improve scalability, and improve the usability of the site. Such changes include upgrading the database software on July 16 to enhance stability, modifying functions (such as the seller search list and My eBay) to improve their performance, and improving the user interface.
3. Recovery Mechanisms
We have installed the hardware and software necessary to create a "warm backup" of the eBay system, and it is now being tested. The "warm backup" is a duplicate set of hardware and software that "mirrors" the main eBay database server and is updated constantly.
Should an extended downtime occur, we would bring the warm backup up-to-date (so that no items, bids, or other updates would be lost) and then bring the service up on the backup machine. Since the warm backup machine is an identical copy of the main server, there would be no differences between the two.
This process is not instantaneous, and it would be used primarily to avoid extended downtimes. In some circumstances (such as a single CPU failure), it may be faster to have the main system recover itself, rather than moving to the backup.
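For readers who want a concrete picture of the process described above, here is a minimal sketch in Python. It is an illustration only, not a description of eBay's actual software: the `WarmBackup` class, the `choose_recovery` function, the item names and the time estimates are all hypothetical. It shows two ideas from the text: the backup is brought up to date before service moves to it, and the switch is made only when it is expected to be faster than letting the main system recover itself.

```python
from dataclasses import dataclass, field


@dataclass
class WarmBackup:
    """Hypothetical stand-in for the backup machine: a copy of the data
    plus a queue of updates received but not yet applied."""
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def receive(self, key, value):
        # Updates shipped from the main server wait here until applied.
        self.pending.append((key, value))

    def catch_up(self):
        # Apply every pending update in order so that no item or bid is lost.
        for key, value in self.pending:
            self.data[key] = value
        applied = len(self.pending)
        self.pending.clear()
        return applied


def choose_recovery(est_main_recovery_min, est_switch_min):
    """Return 'main' if self-recovery is expected to be at least as fast
    as switching, otherwise 'backup'. Both estimates are assumed inputs."""
    return "main" if est_main_recovery_min <= est_switch_min else "backup"


# Example with made-up numbers: an extended outage favors the backup.
backup = WarmBackup(data={"item-1": "bid 10"})
backup.receive("item-1", "bid 12")
backup.receive("item-2", "bid 5")
if choose_recovery(est_main_recovery_min=240, est_switch_min=60) == "backup":
    backup.catch_up()  # bring the backup up to date before serving from it
```

With a short outage (for example, an estimated 5 minutes for the main system to recover against 60 minutes to switch), the same function returns "main", which matches the single CPU failure case mentioned above.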
During one of our regularly scheduled maintenance downtimes in July, we plan to switch to the warm backup machine, just as we would during an extended downtime. This will verify that the backup machine and our procedures for switching to it are reliable.
Furthermore, over the next few months, we will be working on new technologies that will make the warm backup "hot" and will allow rapid transition if we have system downtime. eBay is working closely with its hardware and software vendors to ensure that we implement the most effective and reliable backup system. We are committed to communicating our progress to you as we have more to report.
4. Future Infrastructure
During Q2, in order to scale the system and make it more reliable, we invested over $10 million in hardware to support our customers' growing needs. This was more than three times our hardware investment in Q1. We added eight 350 MHz processors to our database server. We also added another Sun E10000 Starfire to process search requests. On the day we installed this new capacity, we processed half a million more searches than we did the previous day.
Michael Wilson, Senior Vice President and Chief Scientist, is focused on creating the next generation system architecture, which will provide state-of-the-art site stability, performance and functionality. It will also allow the site to scale for future growth, without loss of performance or reliability.
We are dedicated to our goal of providing dependable service, 24 hours a day, 7 days a week. Your success is crucial to us, and we are committed to providing you with the highest level of service and reliability.
Regards,
Meg and Pierre