To: Michael Coley who wrote (48638) 2/23/1998 5:15:00 PM From: Bill Lin
Michael, or anyone,

Are there any definite causes for the Click of Death failure? Are there any Reliability Engineers on this thread?

As I understand Reliability Engineering, units are taken into environmental chambers and four-corner stressed: high/low heat and humidity. Additionally, they can alter electrical signals while the unit is running its test routine. Other tests, like shock and the IBM standard drop test (can the product survive a drop off a table top), are used by manufacturers in the industry. The units which fail are debugged and a "cause" of failure is listed. If it is manufacturing related, this is fed back, and a redesign or "fix" is put into place in manufacturing.

The only thing I have read about a possible cause of Click of Death is a weak "glue" spot which holds the wire that connects the head to the circuitry. Not having seen an open drive, I don't know what I'm talking about. This glue spot supposedly degrades and lets the wire slip into the disk media, causing friction and eventually head misalignment. How the media corrupts and causes other heads in new units to also malfunction is unknown.

But returning to reliability issues... The normal curve for failures looks like a "bathtub", with DOA units being very high, then a lower rate of failure through the unit's "life", then an ascending failure rate as the drives reach "end of life". So I stated before that the 1% failure rate in itself does not present a problem. The DOA syndrome can be easily rectified via replacement. But the COD syndrome seems to be occurring on older drives, 1 to 2 years old. This is more serious, because it points to higher failure rates than a typical "bathtub" curve should have. In fact, if 50% of the 120K (estimated) Click of Deaths occurred from "infection", and we use a 60K number as original CODs, then we are still looking at 0.5% deaths (60K of 12 million drives) within 2.5 years of product life. This is pretty high, since we can surmise that failure rates are either linear or exponential.
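For anyone who wants to check the 0.5% figure, here is a minimal sketch of that arithmetic. The variable and function names are mine, and every input is one of the rough estimates quoted above (120K reported CODs, half of them "infected", 12 million drives shipped), not measured data.

```python
# All inputs are the post's rough estimates, not measured data.
TOTAL_COD_REPORTS = 120_000   # estimated Click of Death failures
INFECTED_SHARE = 0.5          # assumed fraction caused by "infection"
INSTALLED_BASE = 12_000_000   # drives shipped to date


def original_cod_rate(total_reports, infected_share, installed_base):
    """Return (original failures, failure rate) after removing infected units."""
    original = total_reports * (1 - infected_share)
    return original, original / installed_base


original, rate = original_cod_rate(TOTAL_COD_REPORTS, INFECTED_SHARE, INSTALLED_BASE)
print(f"{original:,.0f} original CODs = {rate:.2%} of installed base")
```

Changing INFECTED_SHARE shows how sensitive the rate is to the guess about how many failures were "infections".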
If it is a design defect, an exponential rate of COD should occur in the next 2.5 years. Say, 3x more. So of the original 12 million users, 1.5% of their drives will suffer problems. If the defect is in manufacturing, then maybe a linear defect rate is expected, so the 0.5% should remain, and 60K more units should fail in the next 2.5 years.

However, IOM is playing a numbers game, as we are all aware. The rate of failure of older units should be compared to the number of older units sold. Stated another way, the defect population should be related to population age. This creates the suspicion that DOA problems are very low, but COD problems are very high. Using a failure RATE concept, we might have DOA problems being about 0.1% to 0.5% of units shipped. I believe the lower number. Then the COD number, based upon unit aging, is 60K units out of 4 million units (12 million total minus the 8 million shipped in 1997), or 1.5%.

Thus if there is a design defect, my WAG (wild ass guess) is that 1.5% of the 8 million units shipped in 1997 will develop defects in the next 2 to 2.5 years, or 120K bad units. By 5 years, an additional 120K to 360K units will have died. Of the 12 to 18 million units to be shipped this year, 180K to 270K units will die of COD, and by 5 years, 540K to 1.08 million units will die.

Add the numbers up and you get:
2.5-year mortality of 300K to 480K units
Additional mortality from year 2.5 to year 5 of 480K to 1.44 million units
Total deaths over 5 years of 780K to 1.92 million units out of 12 million shipped, or 6.5% to 16% failure over 5 years.

AGAIN, THESE ARE HALF-ASSED NUMBERS, SINCE I HAVE NO DATA and did the math badly. Oh yeah, I forgot to add in the 2x infection ratio that I used earlier. But you might discount that to 1.5x, since news of COD will reduce infection rates. And yes, there are lots of unlisted assumptions. Like using 120K instead of the "real" defect numbers. Like taking 1/2 of the defects as "infected" instead of as original Click of Deaths. Like assuming that half of the CODs are related to the older population of drives.
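The age-based rate and the per-cohort projection can be sketched the same way. This is only a rough illustration under the post's stated guesses: 60K original CODs, 12 million drives shipped with 8 million of them in 1997, and a second-period multiplier of 1 for "linear" or 3 for "exponential". The names are mine, and since the hand-added ranges above mix assumptions, not every figure will necessarily line up with what this prints.

```python
# Every figure below is a rough estimate taken from the post, not measured data.
ORIGINAL_CODS = 60_000        # estimated non-"infected" Click of Death failures
TOTAL_SHIPPED = 12_000_000    # drives shipped to date
SHIPPED_1997 = 8_000_000      # of which shipped during 1997
GROWTH = {"linear": 1, "exponential": 3}  # second-period multiplier (the "3x" guess)


def age_based_rate(failures, total_shipped, shipped_recently):
    """Failure rate measured against only the older units that had time to fail."""
    return failures / (total_shipped - shipped_recently)


def cohort_failures(cohort_size, rate, growth):
    """Return (failures in the first 2.5 years, additional failures by year 5)."""
    first_period = cohort_size * rate
    return first_period, first_period * growth


rate = age_based_rate(ORIGINAL_CODS, TOTAL_SHIPPED, SHIPPED_1997)
print(f"age-based COD rate: {rate:.1%}")
for label, growth in GROWTH.items():
    first, later = cohort_failures(SHIPPED_1997, rate, growth)
    print(f"1997 cohort, {label}: {first:,.0f} by 2.5 yrs, {later:,.0f} more by 5 yrs")
```

Calling cohort_failures with 12_000_000 or 18_000_000 in place of SHIPPED_1997 gives the corresponding guess for this year's shipments.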
I'm sorry about the confusing numbers. I know it's hard to read and follow.

Summary: Click of Death deaths may actually be 1.5% based upon unit population. If true, this suggests a 5-year mortality rate of 3% to 16%, depending upon linear or exponential death rates.

AND also, I am not a reliability engineer.

BL

PS: My concerns are based upon my reading of the Click of Death problem, and my current need to outfit a series of computers with 100 MB or higher storage devices for "sneaker net" file transfers. I personally do not have IOM stock but am related to someone who does.