GOOD ARTICLE! - EMBEDDED SYSTEMS ARE A BIG PROBLEM
From: "Harlan Smith" <hwsmith.nowhere@cris.com> 16:04
Subject: - EMBEDDED SYSTEMS ARE A BIG PROBLEM
-----Original Message-----
From: Harlan Smith [SMTP:hwsmith@cris.com]
Sent: Wednesday, October 21, 1998 9:53 AM
To: *L Year2000 UnmodEFN (E-mail); WG - CPSR (E-mail 2)
Cc: 'Bill Dale'; * Mitch Ratcliffe (E-mail); * Mark Frautschi (E-mail)
Subject: EMBEDDED SYSTEMS ARE A BIG PROBLEM - RE: EMBEDDED CHIPS NOT SEEN AS BIG PROBLEM
Bill,
Thank you for asking me to critique Mitch Ratcliffe's article at http://chkpt.zdnet.com/chkpt/y2ke981020002/www.zdnet.com/zdy2k/1998/10/4925.html.
Overall, I give it a grade of about C+. It makes some good points, but it certainly doesn't earn an A, for a number of reasons.
The premise of the article seems to be that most of what has been written about "embedded systems" is garbage, and in that he is entirely correct. He gets in some very good shots but perpetuates some of the myths.
After going through the article with a fine-tooth comb (see comments below), I conclude that it has many flaws but overall tells the story that needs to be told. Most of what has been written in the popular press about embedded systems is TOTAL BALONEY.
Mitch still doesn't seem to "get it" that the "embedded systems" problem must be addressed and discussed at the system level, not the chip level. To talk about the problem at the chip level is counterproductive and sends people off wasting huge amounts of money looking for "chips in the haystack". Maybe David Hall started this with his 40 billion chip baloney.
From a writer's point of view, Mitch has tried to correct some of the gross misconceptions and exaggerations that have been published by non-technical people on the subject of "embedded systems". A severe handicap to anyone trying to write such an article is that most of what is published, and available as source material, is garbage and has no factual technical basis.
By non-technical people I certainly include the Gartner Group -- they are IT people who have no knowledge of "embedded systems" and don't even know how to talk about the subject. Mitch has tried to correct some of the garbage, but it was a weak attempt and he never drills down to the basic facts.
1. "Embedded systems" are a very big problem but the number of systems affected has always been grossly overstated and the difficulty of remediating any single "embedded system" is grossly understated. Only idiots have been talking about "billions of systems affected".
2. "Embedded chips" are not a problem at all. There is no "embedded chip" problem and never has been. The problem is not at all a matter of "finding bad chips and replacing them". That is _not what the "embedded systems" problem is all about.
The problem is that all these non-technical people talking about the problem don't know the orifice in their posterior from the terra firma. They try to cite statistics that are completely meaningless.
Look at this garbage from Gartner:
"Warnings about embedded chips may have been overdone. "Embedded systems will have limited effect on year 2000 problems, and we will see a minimal number of failures from these devices," Gartner said. Only 1 in 100,000 free-standing microcontroller chips are likely to fail. Those that do fail will fail at the millennium, and the majority of these will only fail once, the report says."
This statement is ridiculous. "Free-standing microcontroller chips" is undefined and is not a term used by the technical community. "The majority of these will only fail once" is a meaningless statement. And "embedded systems will have limited effect on year 2000 problems"? I don't think so. I expect dramatic impact in foreign and other unremediated systems.
a. You don't count chips.
b. You count "embedded systems".
c. It is usually the more complex "embedded systems", implemented with a microprocessor plus several other chips, rather than the much simpler systems implemented with a microcontroller, that pose a likely threat to important systems. So, to estimate the magnitude of the threat, the starting number is not the 40B microcontrollers extant but more nearly the 3B microprocessors extant.
d. Then you sort these embedded systems into categories of complexity and application. It is usually only the more complex "embedded systems", at the level of complexity of a PLC and above, that represent any significant threat. And of course there are a lot of applications where a failure will cause only annoyance, not disruption.
e. In each category, you diminish the number of concern by identifying those few that have calendar or other critical date functions.
f. You again diminish the number of concern to those with Y2K-flawed software. (A rough numeric sketch of this funnel follows below.)
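Here is that funnel as a quick C sketch. Only the 3B microprocessor starting point comes from the discussion above; every percentage in it is a placeholder assumption, purely to show the shape of the estimate:

    #include <stdio.h>

    /* Rough numeric sketch of the triage funnel (steps c through f).
     * All fractions are invented placeholders for illustration. */
    int main(void)
    {
        double systems = 3e9;  /* step c: microprocessor-based systems, not 40B chips */
        systems *= 0.10;       /* step d: assumed share at PLC-level complexity or above */
        systems *= 0.05;       /* step e: assumed share with calendar/critical date functions */
        systems *= 0.20;       /* step f: assumed share with Y2K-flawed software */
        printf("systems of real concern: %.0f\n", systems);  /* 3,000,000 under these guesses */
        return 0;
    }

The point is not the 3 million that pops out (garbage in, garbage out) but that any honest estimate must state its base and its filter at every step.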
It's only idiots that have been talking about "billions of chips", so Gartner finally got some real information, but presented it very poorly and again perpetuated the bad-chip myth. Gartner doesn't have a clue about the embedded systems problem and should not be talking about it. Like most of the non-technical people talking about the subject, they are stuck on the idea of looking for a few bad chips among millions and millions. That isn't what is done. You identify systems and check with the vendor and/or test/analyze them at the system level, irrespective of how many chips appear in each system.
The procedure is very analogous to testing your PC. If your PC malfunctions you don't go in and start counting the chips. You analyze the performance of the system and maybe you discover flawed or corrupted software or bad memory, or a bad modem or sound card. You work the problem down from the top of the system.
There is no "embedded chip" problem and the problems are not fixed by finding and replacing chips. This is a totally erroneous idea. The problem is that an "embedded system" may have a calendar and this calendar function may have been incorrectly implemented in software such that it has a Y2K problem. A test of the multi-chip system must be made and, if it fails the test, the software or the entire system must be replaced. Replacing software is not easy as the source code must be found, the code remediated, the system re-tested, a new PROM programmed with the revised code and a repair action accomplished to replace the PROM. There are potential roadblocks at every step of this process.
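To make that concrete, here is a minimal C sketch of the kind of two-digit-year comparison that ends up burned into a PROM. The structure and names are my own illustration, not any real product's firmware:

    #include <stdio.h>

    /* Hypothetical firmware date record: two-digit year, as an old RTC reports it. */
    struct rtc_date { int yy; int mm; int dd; };

    /* Typical flawed comparison: is date a earlier than date b? */
    static int is_older(struct rtc_date a, struct rtc_date b)
    {
        if (a.yy != b.yy) return a.yy < b.yy;  /* 00 (2000) sorts before 99 (1999) */
        if (a.mm != b.mm) return a.mm < b.mm;
        return a.dd < b.dd;
    }

    int main(void)
    {
        struct rtc_date dec99 = { 99, 12, 31 };
        struct rtc_date jan00 = {  0,  1,  1 };
        /* Prints 1: January 2000 is judged "older" than December 1999, so any
         * logic that ages records or schedules maintenance silently inverts. */
        printf("%d\n", is_older(jan00, dec99));
        return 0;
    }

A one-line bug like this can only be repaired by finding the source, correcting it, re-testing the whole system and burning a new PROM -- exactly the repair chain described above.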
I have explained a lot of this in my article: 2000.jbaworld.com
Most of what is written about embedded systems is JUST FLAT WRONG! It is written by non-technical people who just don't understand, and apparently don't even try to understand, the simplest of facts about "embedded systems". This is another example of the phenomenon.
Now, with that background, let's examine in detail what Mitch has said.
[snip] The embedded systems problem, while serious, has been dramatically overblown. This finding, confirmed by a wide range of real-world Y2K remediation efforts, does not lessen the importance of checking embedded systems, [end snip]
I disagree that it has been overblown. The number of affected systems of importance has been grossly exaggerated, while the difficulty of remediating any individual system has been understated. Overall, perhaps it is not overblown at all, but the picture of what the embedded systems problem is all about has been badly distorted.
[snip] Embedded systems are microchips that control functions in a digital device. [end snip]
No, this definition is incorrect. Systems of consequence are usually composed of many "integrated circuits" (microchips). Single-chip "embedded systems" rarely implement functions whose failure would cause disruptive effects. Single-chip systems implemented with microcontrollers, rather than a microprocessor plus its supporting chips, will usually produce only an annoyance when they fail. See my article 2000.jbaworld.com, which says:
[snip] "What is an embedded system? The IEE defines them as "… devices used to control, monitor or assist the operation of equipment, machinery or plant. 'Embedded' reflects the fact that they are an integral part of the system." It further says, "All embedded systems are or include computers. Some of these computers are however very simple systems as compared with a PC." Many "embedded systems" are computers that don't look like computers. [end snip]
Mitch repeats the Gartner statement: "Now, the facts are starting to pile up and it turns out that as few as one in 100,000 embedded chips actually needs to be replaced, according to a recent report by the Gartner Group's Y2K group."
Not a good statement because there is no base number to start with. I would like to see how Gartner developed that number.
Let's assume a base number of 40B. If we start with that unstated assumption, then the end number would be (40*10^9)/(10^5) = 40*10^4, or 400,000. I think that is low. I don't have facts to refute it, nor does Gartner have facts to support it. It could be right. The numbers are only vaguely known. Rick Cowles has been indicating 25 million and David Hall 40 billion (which has always been ridiculous). But the strong consensus is that nobody really knows what the number is. Perhaps the true figure will emerge in the histories of Y2K.
[snip] Of course, you still need to find those chips. So, do test your systems (imagine trying to find that one-in-100,000 chip during a system failure after January 1, 2000, when your entire staff is idled by a failure). But, the scope of the problem, once estimated to require replacement of 12 billion or more chips, now appears to be limited to perhaps 850,000 date-dependent embedded systems. [end snip]
Well, the 850,000 number seems to be within an order of magnitude of reality.
But let's do the arithmetic. (8.5*10^5) * (10^5) = (8.5*10^10) or 85 billion.
Well, I guess that's all right if you start from a base of perhaps 70 billion chips of all types produced in history. But we have changed the base from the 40B microcontrollers and microprocessors that could conceivably provide the core functionality of an embedded system to chips of all types. That's OK, but numbers should not be presented without a description of what they are supposed to relate to. Very sloppy work. The Gartner footwork is just a little too fancy.
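For the record, here is the arithmetic both ways; nothing in it is new data, just a check of the two bases:

    #include <stdio.h>

    /* Sanity check of the Gartner-derived numbers against the two bases. */
    int main(void)
    {
        double from_40B = 40e9 / 1e5;      /* 1-in-100,000 applied to 40B: 400,000 */
        double implied_base = 8.5e5 * 1e5; /* base implied by 850,000 at 1-in-100,000: 85 billion */
        printf("failures from a 40B base: %.0f\n", from_40B);
        printf("base implied by 850,000:  %.2e\n", implied_base);
        return 0;
    }

The two numbers cannot both come from the same base, which is the whole complaint.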
[snip] Moreover, it's not clear that all those date-dependent chips need to be replaced. Many systems will fail in non-critical ways. This only increases the importance of assessment programs, since organizations and individuals that could be impacted by a chip failure need to make an informed choice about whether they should replace it. [end snip]
Here we have very sloppy reporting. The 850,000 number is never properly characterized. It could represent the total of "embedded systems" that will exhibit date failure. If that is the case, then clearly many of these systems will not demand a repair as many of the failures will cause only annoyance. OTOH, 850,000 might represent those systems whose failure will cause disruption, in which case they should be replaced. We don't know unless we see how the number was derived. No reference is cited.
[snip] Computerworld reports that two major power and gas utilities in the Pacific Northwest are finding very few embedded systems problems. At Spokane, Wash.-based Washington Water Power, a test of 540,000 embedded chips found only 1,800 with date-related problems. Of those chips, only 234 needed to be replaced, or one-in-2,307. This is substantially higher than Gartner's projections would have predicted, but geometrically lower than early estimates that 20 percent of embedded systems would need to be replaced. [end snip]
There is a gross problem with this garbage statement: "540,000 embedded chips" is not defined. Are they talking about some smaller number of "embedded systems" that in total are implemented with 540,000 chips? If so, and the average number of chips per system is 10, then we can produce the following derivative numbers:
Number of systems = 540,000 / 10 = 54,000
Fraction with date-related problems = 1,800 / 54,000 = 3.3%
Fraction replaced = 234 / 54,000 = 0.4%
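Checking those figures in C, under the purely hypothetical 10 chips/system assumption (the report gives no such figure):

    #include <stdio.h>

    int main(void)
    {
        double chips = 540000.0;
        double chips_per_system = 10.0;                /* assumption, not from the report */
        double systems = chips / chips_per_system;     /* 54,000 */
        double date_flawed = 1800.0, replaced = 234.0; /* from the Computerworld numbers */
        printf("systems:     %.0f\n", systems);
        printf("date-flawed: %.1f%%\n", 100.0 * date_flawed / systems);
        printf("replaced:    %.2f%%, about 1 in %.0f systems\n",
               100.0 * replaced / systems, systems / replaced);
        return 0;
    }

Note how "one in 2,307 chips" becomes roughly one in 231 systems once you stop counting chips.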
But they don't say what their figures represent and there is no standard way of collecting and presenting such data. Therefore we don't have any way of knowing its meaning.
With the assumption of an average of 10 chips per system, the problem is not insignificant and probably not at odds with findings at nuclear reactor facilities and Kraft Foods. What has to be understood is that it is usually the more complex SCADA systems that are vulnerable to date-related failure, and when these systems fail they will have major system impact. OTOH, the presence of these complex systems is obvious and we don't have to search for them like a needle in a haystack.
What we might conclude from such poorly presented "information" is that we really don't have a "needle in a haystack" problem; we need to identify the important elements of functionality and test that functionality thoroughly. People looking for the single-chip system in the haystack may be wasting their time. The search through the system must be done very intelligently, using a "failure modes and effects" discipline (a minimal sketch of that triage follows).
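Here is a minimal sketch of that triage, with an invented four-item inventory; the point is the top-down, system-level filter, not the particular entries:

    #include <stdio.h>

    /* Hypothetical plant inventory record for failure-modes-and-effects triage. */
    struct embedded_system {
        const char *name;
        int has_date_function;     /* calendar or other critical date logic? */
        int failure_is_disruptive; /* effects analysis: disruption vs. mere annoyance */
    };

    int main(void)
    {
        struct embedded_system plant[] = {
            { "SCADA master station",   1, 1 },
            { "PLC, pump sequencing",   1, 1 },
            { "Panel clock display",    1, 0 },
            { "Motor speed controller", 0, 1 },
        };
        /* Only systems that are both date-dependent and disruptive on failure
         * justify vendor checks and full system-level testing first. */
        for (int i = 0; i < 4; i++)
            if (plant[i].has_date_function && plant[i].failure_is_disruptive)
                printf("test first: %s\n", plant[i].name);
        return 0;
    }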
[snip] A Reuters report on the shipping industry's preparations underscores the need for testing, even if the assessment reveals few affected embedded systems. Large commercial ships are sparsely manned, and a failure can produce catastrophic problems because the ships are so dependent on computers. However, the industry has made good progress, largely because the systems that control ships are very centralized. There are approximately 100 embedded systems on a typical ship. Royal Dutch Shell's Trading and Shipping Company reports it has tested about 60 percent of its fleet and is on track to complete its remediation project this year. [end snip]
Now this paragraph is intelligent. The ** systems ** must be identified and tested. Describing the problem as looking for microcontrollers in the haystack grossly distorts the situation -- Royal Dutch hasn't done that. So the news from Royal Dutch is encouraging, but what about all the other shipping lines? I would guess that Royal Dutch is at the front of the pack.
[snip] TechWeb's story, "Big Companies On Track For Y2K Compliance" offers an odd juxtaposition of fiction and fact. It begins with the statement, contained in a standard "Y2K nut graph," the explanation of Y2K tacked into every story you'll ever read on the problem, that "Software code and billions of embedded chips must be manually changed in everything from clocks to nuclear warheads." The story then explains that the embedded systems problem has been grossly exaggerated, according to Gartner Group. Apparently, neither the writer nor the editor thought to explore or correct this contradiction. [end snip]
Good shot!
[snip] One story out recently does offer "proof" that Y2K noncompliant embedded systems are very dangerous to our health. The Associated Press reports that a Y2K "expert," Frederick Kohun, of Robert Morris College in Pittsburgh, has issued a warning that "about half-a-million hospitals nationwide use a $19 [IV] pump that.... will shut off in 2000 because a computerized alarm will think they haven't been recalibrated since 1900." Sounds like a Big Problem, no? Well, no. First, there's the matter of the number of hospitals: There are not half a million hospitals in the whole world. The U.S. has approximately 7,500 hospitals. So, the finding is questionable on its face. If, in fact, we're talking about the number of this kind of IV pump in use, that may be a problem, but presumably these systems will fail only once, on January 1, 2000. Simply by recalibrating these instruments, you get another 100 years of use out of them. Of course, someone out there is saying "But, how can you recalibrate a half-million IV pumps?" Well, since they have had to be recalibrated every six months, we assume this is something hospitals deal with as a normal part of their operation. [end snip]
GUESSING ALERT!!!
No, no, no, Mitch. You are "presuming", and I'm quite sure you are presuming wrong. I think that once the 2000 threshold has been crossed, these machines will forever think that they are out of calibration. Somehow the matter must be thoroughly addressed. You should not even be trying to make such guesses without knowing the details of the implementation.
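To show why this is guesswork either way, here is a hypothetical two-digit-year calibration check in C. Nothing is known about the pump's actual firmware; this only illustrates how the outcome hinges on implementation details neither Mitch nor I have seen:

    #include <stdio.h>

    /* Hypothetical elapsed-time check with two-digit years, as a cheap
     * device might implement it. Illustrative only. */
    static int months_since_calibration(int cal_yy, int cal_mm,
                                        int now_yy, int now_mm)
    {
        return (now_yy - cal_yy) * 12 + (now_mm - cal_mm);
    }

    int main(void)
    {
        /* Calibrated July 1999 (yy = 99), checked January 2000 (yy = 00):
         * prints -1194 instead of 6, so the "recalibrate me" alarm trips. */
        printf("%d\n", months_since_calibration(99, 7, 0, 1));

        /* If recalibrating simply stores cal_yy = 00, the arithmetic happens
         * to work again (prints 6). But if the firmware pins "00" to 1900
         * internally, or rejects the date outright, no amount of
         * recalibration clears the alarm until the code itself is fixed. */
        printf("%d\n", months_since_calibration(0, 1, 0, 7));
        return 0;
    }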
[snip] Hospitals are also finding that, when a problem is identified, it is amenable to remediation. Sometimes, it even works out that the repair is cheaper than initially reported. For example, the San Antonio Express-News reports that when Guadalupe Valley Hospital found its anesthesia machines would have to be replaced due to an embedded systems problem, it was told by the vendor that the total bill would be $160,000. The hospital's administrator said, in effect, that the vendor could go jump in a lake -- he would buy from another company. Two days later, the vendor called back and offered to fix the existing systems for $4,000. Still, the hospital is taking no chances, it will put a person next to each occupied bed on December 31, 1999 to provide backup to the machines. [end snip]
Good information. It just emphasizes the point that all of these problems must be worked intelligently.
[snip] Finally, from Australia comes a report from the Australian Financial Review that is filled with outdated speculation about embedded systems and some hard facts about the actual findings at Australian hospitals. Assessment has identified systems that are date-affected, but the story doesn't say what percentage of those are actually going to fail. Also, there is no evidence that the Australian hospitals have played hardball with vendors. The article veers from fact when a Y2K consultant, who obviously has an interest in raising fears, claims that half the medical equipment in Australia could fail. It's a whol[l]y unfounded statement, so be sure to watch for this article on the doomsayer newsgroups. [end snip]
"surgical implants may contain embedded controllers that cannot easily be updated" Right Mitch, this looks like baloney. Implantable devices per se have been widely reported as not affected by Y2K flaws.
"Engineering and IT services firm Infrastructure Control Services was contracted in March to conduct embedded systems audits at four NSW hospitals. The managing director of ICS, Mr Tim Smith, predicted that the Y2K bug would lead to the failure of half of Australia's medical equipment – either completely or through inaccurate diagnosis. " WOW, scare story like David Hall's 40 Billion chip baloney. Very self serving.
You're right Mitch. This was another article that received no critical review -- like yours.
[snip] Don't assume everything's going to be just fine if you don't take action to assess your embedded systems. However, it's clear that this is a manageable problem for organizations that take action. [end]
A little too trite. IT'S A DAMN BIG PROBLEM. Aggressive action must be taken. Management must get involved. Resources must be devoted to addressing it. Most foreign countries will likely fall flat on their faces in the embedded systems area, with dramatic disruptions ensuing.
Later, Harlan
>-----Original Message-----
>From: Bill Dale [SMTP:billdale@lakesnet.net]
>Sent: Tuesday, October 20, 1998 10:32 PM
>To: Harlan Smith
>Subject: EMBEDDED CHIPS NOT SEEN AS BIG PROBLEM
>
>Hi Harlan,
>
>This is an article by Mitch Ratcliffe. Thought you might want to give it a
>read. I'd be interested to know what you think.
>
>Bill
>
>EMBEDDED CHIPS NOT SEEN AS BIG PROBLEM
> The embedded systems problem, while serious, has been
> dramatically overblown. This finding does not lessen
> the importance of checking embedded systems, but it should
> offer some relief from the hysterical reports that virtually
> any microprocessor-controlled system will fail come the millennium.
>http://chkpt.zdnet.com/chkpt/y2ke981020002/www.zdnet.com/zdy2k/1998/10/4925.html
Harlan Smith
Synergistic Mitigation & Contingency Preparation -- “Austere Infrastructure”
  2000.jbaworld.com
  scotsystems.com (for printout)
Quick Small Business Guide to Y2K: angelfire.com
Embedded Systems Remediation: 2000.jbaworld.com  y2knews.com
YOU CAN HELP: angelfire.com