> Don't tell me, ... tell EBAY about fault tolerant Sun configs.
> Seems they take a 3-5 million dollar ding every year because of a
> crash so maybe it's about time Sun helped them out?
First, Sun builds only one true Fault Tolerant system, i.e. one with lock-step processing, and it is sold exclusively into the telco arena. It is not the E10k. The E10k is Fault Resilient; its configurations can support High Availability with failover. The problems eBay has been experiencing stem from rapid change and growth, and Policies and Procedures are mainly to blame. The following analogy applies not only to eBay but to every IT department that expects the vendor to dictate its Policies and Procedures.
If you buy a car and crash it into a tree, do you go back to the dealership and complain that they didn't properly prepare you for driving? Of course not. Hardware vendors, however, do offer training classes to prepare system admins and managers to deal with the ever-growing capacity of Open Unix Servers. What I've noticed, though, is that management is reluctant to send its employees to training.
<rant on> The net is I'm sick and tired of hearing about how Sun, HP, IBM, etc. hardware is responsible for downtime. True, hardware will fail. But proper configuration, training, and change control (Policies and Procedures) will greatly reduce the business impact. The IT VPs, directors, and line managers need to accept the responsibility of running their own IT departments and quit blaming the vendor. <rant off>
mg