My previous post about root causes in complex systems looks, in retrospect, a little bit like a rant. That doesn't bother me too much, really... but I wish I had included the following info: one way to go about resolving the mess that complex systems can make of your root cause analysis.
I'm getting so very tired of safety/accident researchers claiming that root cause analysis is an invalid, blame-focused practice that ignores systems and complexity. Most root cause investigators that I know are pretty well oriented towards process, organization, and system issues as the fundamental sources underlying problems and accidents... and even some of our simplest analysis tools (e.g., TWIN) include specific checks for complex-system characteristics/behaviours (e.g., hidden system responses, separation between cause and effect).
I used to read a lot more than I do now... sometimes for pure enjoyment, but also for research about my favourite topics (like Root Cause Analysis, of course). Here are some of the books and reports on my root cause bookshelf right now, things I plan to read during the 4th quarter of 2014. I guess I'm in the mood for accident theory and models, and for big books!
Is root cause analysis possible for complex systems? Some would say no, claiming that such systems are intractable -- that in complex systems, there is no such thing as causality, only pattern and correlation (see Pollard). Even the language used is different. Where others would say problem, cause, and solution, they say situation, pattern, and approach. These terminology differences aside, are the two viewpoints actually incompatible? Is root cause analysis possible for complex systems?
In 1931, H.W. Heinrich published his findings from a review of hundreds of thousands of safety incidents. His data showed that on average, for every 300 near-miss events without injury, there would be 29 minor to moderate injuries and 1 major injury or fatality. Similar studies done since 1931 have yielded similar results. The data is deceptively, compellingly simple -- the meaning, however, is not. What is implied by a 300:29:1 ratio of near-misses to minor or moderate injuries to major injuries? Why do we care? Is there some deeper, underlying pattern to this data?
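To make the arithmetic behind the 300:29:1 ratio concrete, here is a minimal Python sketch. It only restates the ratio quoted above as shares of all 330 events, and scales it to a given near-miss count. The names `HEINRICH_RATIO`, `shares`, and `scaled_counts` are my own, invented for illustration. The scaling treats the ratio as a long-run average, which is an assumption; it is not a prediction for any particular workplace.

```python
from fractions import Fraction

# Heinrich's ratio as quoted in the text: 300 near-misses,
# 29 minor to moderate injuries, 1 major injury or fatality.
HEINRICH_RATIO = {
    "near_miss": 300,
    "minor_to_moderate_injury": 29,
    "major_injury_or_fatality": 1,
}


def shares(ratio):
    """Return each category's exact share of all events in the ratio."""
    total = sum(ratio.values())
    return {name: Fraction(count, total) for name, count in ratio.items()}


def scaled_counts(ratio, near_misses):
    """Scale the ratio to an observed near-miss count (hypothetical helper).

    Assumes the ratio holds as a simple average; this is plain
    proportional arithmetic, not a forecast.
    """
    base = ratio["near_miss"]
    return {name: Fraction(count * near_misses, base) for name, count in ratio.items()}


if __name__ == "__main__":
    for name, share in shares(HEINRICH_RATIO).items():
        print(f"{name}: {float(share):.1%}")
```

Read this way, roughly nine out of ten events in the ratio are near-misses, and the major injury is about one event in 330. Whether that arithmetic says anything about causes is exactly the question.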