by Bill Wilson
Root cause analysis can be characterized in many ways. Some refer to it as a tool for continuous improvement. Others call it a method for finding performance problems. Those at the receiving end, however, often view root cause analysis as just a repackaged version of "the blame game". Who can blame them when operator error or lack of attention to detail are so often listed as causes?
Most problems and accidents involve human activity at some point or other. Often, this activity is right at the point of occurrence, and people at the sharp end are usually operating under difficult or confusing circumstances. They make decisions and take actions that, in hindsight, prove to be "wrong" in some way. Then, after something "bad" happens, we perform an investigation and find these "human errors"... and all too often, we stop there without really considering all the factors that shaped the undesired outcome. This article discusses a new tool I have developed to help investigators find these factors so they can be used as starting points for root cause analysis.
Problems come in all shapes and sizes. I've been involved in all kinds of investigations, from those dealing with something as mundane as a chronic lack of hot water in a shower facility, to something as critical as a software error that caused non-conservative miscalculations of reactor operating limits. I've even been involved in a fairly significant event before, which my "friends" keep reminding me about even though such remembrances cause me great pain and embarrassment. Sometimes, though, an event comes along that really drives home the value of doing a thorough incident investigation and root cause analysis.
I confess to being a list-maker. I've published a few root cause analysis-related lists in this weblog since its inception 18 months ago in May 2004. I've also posted a couple in the wiki (old wiki gone, new wiki still sparse). I thought it might be interesting to wrap them all up into a single article. So, with that introduction, here are the lists.
How many times have you read a news article about a major problem or incident, and seen Human Error in the headline? A quick search of Google News for that phrase provides plenty of examples. The latest big incident of note (when this article was written in 2005) is the August 12, 2005 Los Angeles blackout -- sample headline "Human Error Led to Widespread Outage." Well, what a blinding flash of the obvious -- people make mistakes!