How many times have you read a news article about a major problem or incident and seen "Human Error" in the headline? A quick search of Google News for that phrase provides plenty of examples. The latest big incident of note (when this article was written in 2005) is the September 12, 2005 Los Angeles blackout -- sample headline: "Human Error Led to Widespread Outage." Well, what a blinding flash of the obvious -- people make mistakes!
Human error is almost always the direct cause of any consequential event. It is also something that you will never be able to completely eliminate. The worst thing, however, is that pinning a problem or event solely on human error prevents you from finding and addressing a whole host of other factors that are probably more important. So what else is there? I would want to know the answers to the following questions:
- Was the possibility of the error known? *
- Were the potential consequences of the error known? *
- What about the activity made it prone to the occurrence of the error?
- What about the situation contributed to the creation of the error?
- Was there an opportunity to prevent the error prior to its occurrence? *
- Once the error was committed, was there any way to recover from it? *
- What about the system sustained the error instead of terminating it?
- What fed the error, and drove it to become a bigger problem?
- What made the consequences as bad as they were?
- What (if anything) kept the consequences from being worse?
Note that some of these (marked with asterisks) can seemingly be answered with a simple yes or no. In fact, these questions are merely introductions to the following lines of inquiry:
- If YES, why did the event proceed beyond this point?
- If NO, why not?
Usually when I answer these questions, I find that the immediate error that "caused" an event was really just a very small piece of the problem. As I've said many times before in this blog, the consequences of any given event usually depend more on the state of the system than on the specifics of the initiating event. Combine this concept with the realization that human error can never be completely eliminated, and it becomes very clear that we should be spending our time investigating and fixing the system; blaming a problem on human error is almost always a cop-out.
Recommended additional reading - Human Error: Models and Management, by James Reason
Acknowledgement - Questions 9 and 10 in the list above are attributed to Dr. Bill Corcoran of Nuclear Safety Review Concepts. Please visit his Yahoo Group, Root Cause: State of the Practice, for a wealth of additional root cause analysis resources.
by Bill Wilson