Root Cause Analysis (RCA) is generally conducted in several phases. I've seen some methodologies that break down the RCA process into as many as a dozen different steps. In reality, however, there are just three main phases we need to be concerned about. More importantly, these three phases are very different from each other... so different that they should always be kept distinctly separate. I've designated these phases Investigation, Analysis, and Decision. Read on to see why.
Phase 1: Investigation
The purpose of the investigation phase is to discover facts that show HOW an incident occurred. During investigation, we are not concerned with what didn't happen, or what should have happened -- the only concern is what actually happened, without any judgement of value. Investigation deals with facts in a value-neutral manner.
During the investigation phase, if you find yourself using words like "not", "should", "error", "incorrect", "inappropriate", etc., STOP! You are injecting value judgements into a practice that requires absolute neutrality. Facts exist regardless of what we think or feel about them. Jumping too early into what should have happened will obscure your vision of what did happen.
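As a rough illustration of the word check described above, here is a minimal Python sketch that scans investigation notes for the value-laden words listed in the previous paragraph. The function name `flag_value_judgements` and the sample notes are hypothetical, made up for this example. The word list is only a starting point: it will not catch contractions such as "didn't", and a flagged statement is a prompt to reread it, not proof that it is wrong.

```python
import re

# Value-laden words named in the text above; extend the list for your own domain.
VALUE_WORDS = ("not", "should", "error", "incorrect", "inappropriate")

# Match whole words only, case-insensitively, so "note" does not trigger "not".
_PATTERN = re.compile(r"\b(" + "|".join(VALUE_WORDS) + r")\b", re.IGNORECASE)


def flag_value_judgements(statements):
    """Return (index, statement, matched_words) for each statement that needs review."""
    flagged = []
    for index, statement in enumerate(statements):
        matches = sorted({m.lower() for m in _PATTERN.findall(statement)})
        if matches:
            flagged.append((index, statement, matches))
    return flagged


# Hypothetical investigation notes, invented for illustration only.
notes = [
    "The operator closed valve V-12 at 14:02.",
    "The operator should have checked the pressure gauge first.",
    "Pressure in the line reached 310 kPa at 14:05.",
]

for index, statement, words in flag_value_judgements(notes):
    print(f"statement {index}: {words} -> {statement}")
```

Only the second note is flagged here. Rewritten as a fact ("The operator closed the valve before reading the pressure gauge"), it would pass, and the judgement about what should have happened is deferred to the analysis phase.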
There may be times when required facts simply aren't available -- critical evidence was destroyed in the process, or there were no witnesses to a critical event. In such cases, you have some options. Consider secondary sources that may not be conclusive, but could provide enough circumstantial evidence to guide further investigation. Attempt to reconstruct the event using plausible scenarios and then perform controlled tests to confirm or deny the most likely explanations.
Regardless of the tools you use, the final product of the investigation phase should be a factual representation of the incident. If some facts were not available, and theory (backed up by testing) had to be used instead, ensure this is clearly evident in the representation of the incident. This representation should then be thought of as a complete script or plan for reproducing the incident in detail. Only after you've reached this point should you progress to the next phase, Analysis.
Phase 2: Analysis
The purpose of the analysis phase is to discover reasons that explain WHY an incident occurred. This is when you take the purely factual representation of the incident and view it within the context of the system (or organization) that created it. The values of the system (purpose, rules, culture, etc.) can now be used to compare what actually happened against what should have happened, at any point during the incident.
During the analysis phase, do not let yourself fall into the trap of believing that the values of the system are always correct! You are not just analyzing the incident itself, but also the system that created it. Mentally place yourself within the incident, watch events unfold, and then determine if the system's values were, for example: correct but inadequately applied, insufficient to prevent the incident, or incorrect such that the system's values actually created (or contributed to) the incident.
Don't get too caught up in the mechanics of the analysis tool being used. Many tools are available to aid the analysis phase. Each has its own strengths and weaknesses, and preferred realms of application. For example, if you're not getting any insight using Barrier Analysis, switch over to Change Analysis. The point of any analysis tool is to provide insight, and in some situations, one tool may be vastly superior to another.
Finally, do not entertain questions like "how can I fix this? ..." during the analysis phase. It is all too easy to let desired corrective actions colour your perceptions of an incident's causes. Analysis is about discovering conditions that exist now or existed in the past; the future must not enter into the equation. Jumping too early into what could be risks obscuring your vision of what is.
Regardless of the tools you use, the final product of the analysis phase should be a finite set of root causes for the problem/event that show why it was inevitable. Yes, inevitable -- these are fundamental, latent conditions that were just lying around waiting for some kind of trigger to activate. Only after you've reached this realization should you progress to the next phase, Decision.
Phase 3: Decision
The purpose of the decision phase is to develop recommendations that identify WHAT should be learned and WHAT needs to be done. In this phase, we are concerned with correcting or eliminating the root causes of an incident. This can only be accomplished if both learning and action occur. Learning without action is mere mental trickery, while action without learning is simply useless physical exercise. Both are required for long-term, effective results.
During the decision phase, beware of overly specific, conditional corrective action recommendations! It is often tempting to save effort by cramming one more feature or condition into an existing mechanism. However, doing so often just adds complexity to a situation that has already shown itself to be prone to failure. Do not be afraid to recommend complete redesign in such situations.
In some situations, there may be several options available to correct or eliminate a root cause. In such cases, a structured decision analysis method should be used to gauge competing recommendations against criteria such as simplicity, effectiveness, longevity, cost, etc. However, do not forget to consider potential risks or side-effects of each recommendation as well. In correcting one set of root causes, be sure you are not creating another set of latent conditions or weaknesses that could lead to future (perhaps completely different) incidents.
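One simple form of structured decision analysis is a weighted scoring matrix. The sketch below is only an illustration under stated assumptions: the weights, the 1-to-5 ratings, the `risk_penalty` field, and the two competing recommendations are all hypothetical, and a weighted sum is just one of several methods that could be used. Ratings are oriented so that higher is better, which means a high "cost" rating stands for a cheaper option.

```python
# Criteria taken from the text above; the weights are hypothetical and
# should be agreed on by the people making the decision.
WEIGHTS = {"simplicity": 2, "effectiveness": 4, "longevity": 2, "cost": 2}


def score(option):
    """Weighted sum of the 1-5 ratings, minus a penalty for risks and side-effects."""
    weighted = sum(WEIGHTS[name] * option["ratings"][name] for name in WEIGHTS)
    return weighted - option["risk_penalty"]


def rank(options):
    """Return the options ordered from highest to lowest score."""
    return sorted(options, key=score, reverse=True)


# Hypothetical competing recommendations, invented for illustration only.
options = [
    {
        "name": "add a condition to the existing interlock",
        "ratings": {"simplicity": 4, "effectiveness": 2, "longevity": 2, "cost": 5},
        "risk_penalty": 6,  # adds complexity to a mechanism that already failed
    },
    {
        "name": "redesign the interlock",
        "ratings": {"simplicity": 3, "effectiveness": 5, "longevity": 5, "cost": 2},
        "risk_penalty": 3,  # new design needs its own testing
    },
]

for option in rank(options):
    print(f"{score(option):>3}  {option['name']}")
```

With these made-up numbers the redesign scores 37 against 24 for the patch. The numbers themselves matter less than the discipline of rating every option against the same criteria and writing down the risks of each.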
Finally, once it is decided which lessons must be learned and which actions must be taken, make one final check. Evaluate the recommendations against the original incident. Ask yourself "if we had known these lessons, and had these measures in place, would the incident still have occurred?" Similarly for the root causes, ask "... would these root causes still exist?" Only when you can honestly answer "NO" to both of these questions do you have a plan that has a good chance of being effective.
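Part of this final check can be done as plain bookkeeping: list the root causes, list which causes each recommendation claims to address, and look for gaps. The sketch below assumes a hypothetical data layout (an `addresses` list on each recommendation) and invented example data. It can only show that a root cause has no recommendation at all; whether a recommendation would really have prevented the incident still has to be answered honestly by people.

```python
def uncovered_root_causes(root_causes, recommendations):
    """Return the root causes that no recommendation claims to address."""
    addressed = set()
    for recommendation in recommendations:
        addressed.update(recommendation["addresses"])
    # Preserve the original order of the root-cause list in the result.
    return [cause for cause in root_causes if cause not in addressed]


# Hypothetical analysis output and recommendations, invented for illustration only.
root_causes = [
    "procedure allows valve closure without a pressure reading",
    "gauge is mounted out of sight of the valve",
    "no refresher training since commissioning",
]

recommendations = [
    {
        "action": "relocate the gauge next to the valve",
        "addresses": ["gauge is mounted out of sight of the valve"],
    },
    {
        "action": "rewrite the closure procedure",
        "addresses": ["procedure allows valve closure without a pressure reading"],
    },
]

for cause in uncovered_root_causes(root_causes, recommendations):
    print(f"still unaddressed: {cause}")
```

In this example the training gap is reported as unaddressed, so the answer to "would these root causes still exist?" is not yet "NO".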
Hopefully, by this point you have begun to understand why I've identified three different phases of Root Cause Analysis and why they should be kept separate. I hope this one final thought will help you understand completely: the three phases of Root Cause Analysis differ in their balances of objectivity versus subjectivity. Moving subjectivity too early into the process ultimately destroys its integrity.
- Investigation must be completely objective, in order to expose only factual relationships.
- Analysis can be subjective, but only to the extent that different systems or organizations have different values, some of which may be contradictory or incorrect.
- Decision is subjective in that multiple options may exist to correct or eliminate root causes, and selection of the right options must be coloured by what we want our values to be in the future.
Finally, note that in this whole article, I've not taken us past the point of deciding what to do. In other words, what about actually doing? In my opinion, that's a completely different process, perhaps the subject of a future article. All I will say at this point is that the Root Cause Analysis philosophy outlined above fulfills the "Plan" portion of the "Plan-Do-Check-Adjust" cycle. Hopefully, what I've written here will help you Plan better!
For more information on tools that can be used in each of the phases discussed above, please visit my page on root cause analysis methods.
by Bill Wilson