Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: fyodor_ who wrote (72047), 2/20/2002 5:17:23 PM
From: Tony Viola
 
Fyo,

>It's not just a question of ensuring that data isn't lost and that downtime is minimal. The problem is also that every time something fails, there's a great deal of work (read: money) involved in getting the affected node fixed.

True, but how often does that happen? The scale out vs. scale up argument is pretty hot right now.

Tony



To: fyodor_ who wrote (72047), 2/20/2002 9:10:07 PM
From: Dan3
 
Re: It's not just a question of ensuring that data isn't lost and that downtime is minimal. The problem is also that every time something fails, there's a great deal of work (read: money) involved in getting the affected node fixed.

Ummmm. It's a lot cheaper to replace a COTS box or rack unit than it is to replace a chunk of a mainframe.

A lot cheaper.



To: fyodor_ who wrote (72047), 2/20/2002 9:25:14 PM
From: pgerassi
 
Dear Fyo:

SMP systems go down far more often than you think. I have seen Tandem mainframes go down because the network glue was bad, and it takes 4 hours to pull the offending backplane out and put in a new one. SMP systems are more costly to fix and have a higher likelihood of failure. In addition, a cluster can simply shut down the failed node and continue on as a slightly less powerful cluster. As a matter of fact, that is how maintenance is done on clusters: you manually down the target node, upgrade, fix, or do whatever else you need to, then fire it back up and place it back into the cluster.
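The drain-and-rejoin maintenance cycle described above can be sketched roughly as follows. This is just an illustration with made-up names (Cluster, down, up, capacity), not any real cluster management API:

```python
# Minimal sketch of cluster membership with a drain/rejoin maintenance cycle.
# All names are hypothetical illustrations, not a real cluster API.

class Cluster:
    def __init__(self, nodes):
        self.active = set(nodes)    # nodes currently serving traffic
        self.downed = set()         # nodes pulled out for repair/upgrade

    def down(self, node):
        """Manually down a node; the cluster keeps running, just smaller."""
        if node in self.active:
            self.active.remove(node)
            self.downed.add(node)

    def up(self, node):
        """Fire a repaired/upgraded node back up and rejoin the cluster."""
        if node in self.downed:
            self.downed.remove(node)
            self.active.add(node)

    def capacity(self):
        """Fraction of full strength the cluster is currently running at."""
        total = len(self.active) + len(self.downed)
        return len(self.active) / total if total else 0.0

cluster = Cluster(["node1", "node2", "node3", "node4"])
cluster.down("node3")            # drain node3 for maintenance
print(cluster.capacity())        # 0.75 -- cluster continues, slightly weaker
cluster.up("node3")              # place it back into the cluster
print(cluster.capacity())        # 1.0
```

The point of the sketch is that downing one node only costs you that node's share of capacity; the cluster as a whole never stops serving.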

I wrote some software that detected when a node went down, automatically removed it from service, and notified all the appropriate persons on the proper list via email, prerecorded calls, etc. During testing, downtime was nil and the system could be built up or reduced at will (testing included simply powering off a node, removing a board from a node, etc.).
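The detect-remove-notify loop Pete describes might look something like the sketch below. The probe and notify callbacks are hypothetical stand-ins (a real system would ping the node over the network and fan out to email, pagers, prerecorded calls, etc.):

```python
# Hypothetical sketch of a node-down monitor: probe each node in service,
# pull any failed node out of rotation, and queue a notification for it.

def monitor(in_service, probe, notify):
    """probe(node) -> True if the node is healthy; notify(msg) delivers
    an alert. Failed nodes are removed from in_service in place."""
    for node in list(in_service):        # copy: we mutate in_service below
        if not probe(node):
            in_service.remove(node)      # automatically remove from service
            notify(f"{node} is down and has been removed from service")

# Simulate Pete's test of simply powering off a node (here, "web2"):
alerts = []
nodes = ["web1", "web2", "web3"]
monitor(nodes, probe=lambda n: n != "web2", notify=alerts.append)
print(nodes)     # ['web1', 'web3']
print(alerts)    # ['web2 is down and has been removed from service']
```

In practice the loop would run periodically, and the notify callback would walk the contact list for each delivery method rather than just appending to a list.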

Pete