Problems come in all shapes and sizes. I've been involved in all kinds of investigations, from those dealing with something as mundane as a chronic lack of hot water in a shower facility, to something as critical as a software error that caused non-conservative miscalculations of reactor operating limits. I've even been involved in a fairly significant event, which my "friends" keep reminding me about even though such remembrances cause me great pain and embarrassment. Sometimes, though, an event comes along that really drives home the value of doing a thorough incident investigation and root cause analysis.
It's actually kind of surprising to see how a major event evolves, even when you're expecting to find problems. Sometimes you find one big problem that clearly led to a significant consequence. Other times there are many smaller problems, each of only moderate consequence individually, that combine to create a huge issue. I personally find that these are the events that maximize the learning potential for an organization, because they tend to shine a spotlight on cross-cutting issues that are contributing to problems every day.
I was recently involved in an investigation of just such an issue. The actual consequence was pretty serious; the potential consequence was very serious, from a financial standpoint. In the end, there was no single magic bullet that caused the problem, but many smaller issues scattered throughout the organization like buckshot. I'm not going to provide any details about the event, but I did want to share the following, which I included in the final report as an "evaluator comment" -- it has been modified to remove any specific details.
It is very easy to look back on this event, with the laser-like precision of 20/15 hindsight, and pick out all the flaws that occurred over time. This is fine, because the goal of this investigation is learning and continuous improvement. However, it must be recognized that the people involved in this event did not intentionally set out to cause this problem, and did not have knowledge of the future. If their decisions or work practices were deficient in some way, it is because the system allowed these deficiencies to exist.
Every person interviewed for this investigation was very forthcoming with information. There is no instance where the lead investigator believed an interviewee was being dishonest or deceitful. In fact, the great majority of the interviewees displayed a desire to learn from this event, and were actively engaged in their interviews. However, there was a sense among most interviewees that this event, while unfortunate and painful, was a fluke. "We've done this job hundreds, if not thousands of times before, and never had a problem like this." Along with this idea, there was a sense of detachment from the ownership of the adverse consequence. "I did everything I could do. If only..."
The corrective action plan issued by this investigation report will only make a difference if we can get past the sentiments expressed above. There is a real danger of pigeonholing this event as "a project management problem", "a contractor problem", or "a procedure adherence problem". In fact, it is all of these, but it is much more; it is a clear indicator that our current way of doing business - our systems, our processes, and our culture - is capable of producing extremely significant and painful events. It will go on doing so until we correct these fundamental problems.
I wanted to share this (via my blog) for a couple of reasons. First, I think it provides a pretty good summary of the kind of mindset you need as a root cause investigator. In particular, you have got to realize that people make the decisions they do for what seem to be good reasons at the time... and they're making these decisions without the benefit of knowing what's about to happen. Second, it shows how people involved in an event can actively acknowledge that what happened was a problem, and even accept that they were a part of it, but still not grasp that there was anything they could have done differently. In fact, from their viewpoint, the actions they took might have been the right, accepted actions within the context of the organization that they experience every day.
This leads me to the final point I wanted to make... that in performing any kind of problem investigation, you've got to question the system itself: its rules, its beliefs, its processes, and its norms of behaviour. Your goal is not to optimize the system! While it may sometimes be possible to make limited changes within the system, you will find cases where more drastic measures are needed. Your goal is to find the system's flaws and eradicate them, even if that means making significant changes. The risk of not doing so may be unacceptable.
by Bill Wilson