Software as Root Cause
Wired News has a nice article on what they consider to be the 10 worst software bugs of all time (so far). I knew about a few of these, notably the Therac-25 fatalities and the Ariane 5 self-destruction. However, at least three of them were new to me.
Here is a list of the bugs they describe:
- 1962: Mariner 1 space probe destruction after launch
- 1982: Soviet gas pipeline explosion
- 1985-1987: Therac-25 medical accelerator fatalities
- 1988: Buffer overflow in Berkeley Unix fingerd / Morris Worm
- 1988-1996: Kerberos random number generator seed
- 1990: AT&T cascading crashes in long-distance switches
- 1993: Intel Pentium processor FDIV bug
- 1995/1996: Ping of death
- 1996: Ariane 5 Flight 501 disintegration after launch
- 2000: Panama / Multidata cancer treatment radiation dose calculation errors
After looking at these bugs for a while, I see two broad groupings that I find really interesting. The first is the individual bug that leads directly to an undesirable consequence. The second is the individual bug that leads to an undesirable consequence through secondary effects, or through a cascade of failures.
As software control becomes more prevalent in systems, safety-critical or not, the possibility of serious occurrences due to flawed software increases. Software flaws run the gamut from outright mistakes to subtle system engineering failures. As more and more automated systems are linked together, failure cascades become increasingly important.
In software engineering, as in many other fields, a view of Root Cause as "the original cause for a non-conformance" is becoming increasingly inadequate. A systems view of root cause is needed, one that focuses on system design and system interfaces.
I probably make this same point over and over in this blog, but here it is again anyway: the seriousness of any given problem is determined more by the state of the system than by the nature of the initiating event. When doing a Root Cause Analysis, focus on the system -- that is where the real problems are, and where the major benefits can be realized.
by Bill Wilson