Root Cause Analysis for Complex Systems?
Is root cause analysis possible for complex systems? Some would say no, claiming that such systems are intractable -- that in complex systems, there is no such thing as causality, only pattern and correlation (see Pollard). Even the language used is different. Where others would say problem, cause, and solution, they say situation, pattern, and approach. These terminology differences aside, are the two viewpoints actually incompatible? Is root cause analysis possible for complex systems?
Causation itself can be a difficult subject, even for problems that are simply complicated as opposed to complex. Many causes are not deterministic in nature, but merely change the probability that an effect will occur. Also, it is rare that any single Causal Factor is fully sufficient to create an effect by itself -- and factors sometimes interact in unexpected ways. Finally, consider that in some cases, it is not possible to measure how factors behave without disturbing how they operate, distorting the measurement. Taken together, these characteristics introduce serious complications into the analysis of causation for any problem that is not simple.
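To make the probabilistic and interaction points concrete, here is a minimal sketch. The two factors (A and B) and every probability in the table are invented for illustration and come from no real data set. The sketch shows a factor that raises the chance of an effect without guaranteeing it, and whose influence depends heavily on whether the other factor is present.

```python
# Hypothetical probability that the effect occurs for each combination of
# two causal factors (A present?, B present?). All numbers are invented.
P_EFFECT = {
    (False, False): 0.01,
    (True, False): 0.05,
    (False, True): 0.04,
    (True, True): 0.40,
}


def risk_increase_from_a(b_present):
    """How much factor A raises the probability of the effect, for a given state of B."""
    return P_EFFECT[(True, b_present)] - P_EFFECT[(False, b_present)]


increase_without_b = risk_increase_from_a(False)
increase_with_b = risk_increase_from_a(True)

print(f"A raises the probability by {increase_without_b:.2f} when B is absent")
print(f"A raises the probability by {increase_with_b:.2f} when B is present")
```

In this made-up table, neither factor is sufficient on its own, and asking "how much does A contribute?" has no single answer because the answer depends on B.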
Complexity is another difficult subject, partly because the concepts are fairly new, but also because the ideas seem so foreign compared to the way most of us have been taught the world works. The basic idea is that everything within a complex system is interconnected, so that a small change in one location could have effects throughout the system. It may also seem that these effects are disproportionate in size compared to the size of the change itself.
Another feature of complex systems is the nature of feedback, and its effect in creating emergent behaviour. In typical engineered systems, the concept of feedback is well-known and exploited to control processes. However, in complex systems, it can be difficult to determine all the effects of feedback. Completely unexpected behaviour is one potential result; this is often called emergence.
Now, combine the difficulties of determining cause (even for seemingly simple problems) with the non-linear behaviour of complex systems. It quickly becomes clear that sorting out causes from effects in such a system can be difficult, because one change in one parameter may affect tens, or hundreds, or thousands of others, some of which may eventually feed back to the same point. Predicting the behaviour of such a system may seem impossible, depending on the number of parameters and the nature of the interconnections.
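A toy illustration of this sensitivity, offered as a sketch and not as a model of any real system: the logistic map is a standard one-variable example of non-linear feedback, where each output is fed back in as the next input. The parameter value r = 4.0, the starting point 0.2, and the size of the perturbation are all choices made here for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x): each output is fed back in as the next input."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


baseline = logistic_trajectory(0.2)
perturbed = logistic_trajectory(0.2 + 1e-9)  # a one-in-a-billion nudge

# Gap between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(baseline, perturbed)]

# First step at which the two runs differ by more than 0.1 (None if never).
first_large_gap = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print(f"initial gap: {gaps[0]:.1e}")
print(f"largest gap: {max(gaps):.3f}")
print(f"first step with gap > 0.1: {first_large_gap}")
```

Even with a single variable, a tiny change in the starting value eventually produces a completely different trajectory. A real complex system has many such loops interacting at once.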
Here is the real question then... is it possible to analyze such a system? Are there such things as root causes in a complex system?
I say the answer is YES to both questions. I am now going to attack the "anti" argument piece by piece.
First, there is the idea that cause and effect does not exist in a complex system. I believe this is wrong. At a microscopic level, every single change has immediate effects that can be seen. They may be difficult to trace beyond one step, but they do exist. There's no escaping causality. Furthermore, it is often not necessary to trace the detailed flow of cause-and-effect within a system (or sub-system) if what you are concerned with is its behaviour relative to other entities at the same level of scale.
Second, there is the idea that in real complex systems, there are too many variables to track. In fact, this may be true... but who said you had to model every single entity in a system in order to understand how it works? In many domains, it is possible to reduce the analytical burden considerably by moving to abstract models for whole systems (or sub-systems), which can themselves be viewed as entities. This may not provide the same level of precision in analysis, but it can be quite accurate. Also, see the point above about level of scale.
Third, there is the idea that complex systems are unpredictable. I say this is inaccurate. We may not be able to predict the details of how and when a specific problem will occur, and what all its effects might be. However, it is quite likely that we will be able to predict the likelihood and magnitude (i.e., risk) for important classes of problems.
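The difference between predicting a specific event and predicting risk can be sketched with a small simulation. Everything in it is an assumption made for illustration: a hypothetical system of 20 independent components, each with a 5% chance of failing on any given day, and an "incident" defined as three or more failures on the same day. Real systems rarely have independent components, so this shows only the principle.

```python
import math
import random

N_COMPONENTS = 20     # hypothetical system size
P_FAIL = 0.05         # hypothetical daily failure chance per component
INCIDENT_AT = 3       # an "incident" = this many failures on the same day
DAYS = 20_000         # simulated days per run


def simulate(seed):
    """Return the list of days on which an incident occurred in one simulated run."""
    rng = random.Random(seed)
    incident_days = []
    for day in range(DAYS):
        failures = sum(rng.random() < P_FAIL for _ in range(N_COMPONENTS))
        if failures >= INCIDENT_AT:
            incident_days.append(day)
    return incident_days


# Exact incident probability from the binomial distribution, for comparison.
analytic_rate = 1.0 - sum(
    math.comb(N_COMPONENTS, k) * P_FAIL**k * (1 - P_FAIL) ** (N_COMPONENTS - k)
    for k in range(INCIDENT_AT)
)

run_a = simulate(seed=1)
run_b = simulate(seed=2)
rate_a = len(run_a) / DAYS
rate_b = len(run_b) / DAYS

print(f"analytic incident rate: {analytic_rate:.4f}")
print(f"run A: {rate_a:.4f}, first incident days {run_a[:5]}")
print(f"run B: {rate_b:.4f}, first incident days {run_b[:5]}")
```

The two runs disagree about which days have incidents, yet both land close to the same overall rate. That rate is set by the design parameters (component count, failure chance, incident threshold), not by the particular sequence of events in either run.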
Fourth, there is the unstated assumption that an inability to predict is the same as an inability to analyze. The predictability issue was dealt with in the previous paragraph. Furthermore, analysis is not the same thing as prediction. A retrospective analysis of a problem that has already occurred relies on evidence and investigation to characterize how a problem took place. There is no prediction involved, only detective work.
Finally, there is the assertion that there are no root causes to be found in complex systems. This might be true if your definition of root cause is something like "the original cause for a noncompliance", or "the cause, which if removed, precludes recurrence." However, the best evidence I have found to date indicates that the definition of root cause ought to be something like "the system features that increased the risk of the occurrence." In the case of complex systems, the only real causes ARE root causes, and these root causes are related to the design of the system and its interfaces, internal and external.
Now, have I said that root cause analysis of complex systems is easy? No, I have not. I have only said it is possible. Furthermore, I will even go so far as to say that it is probably difficult. That doesn't mean we should give up. All that is required is a shift in focus.
Don't try to zero in on specific factors that lead to a specific kind of problem... focus instead on understanding how the system operates as a whole, and how the design and operation of that system affect risk. Finally, please don't go implement some flavour-of-the-month strategy to fix your problems without understanding what the causes are.
Comments and discussion are welcome.
by Bill Wilson