Patterns of Response
Problems happen all the time. How we respond ought to be a function of the problem's importance... but how do we decide what is important? More importantly, how do we decide what response is needed?
JeremyK at Microsoft examines this issue from a software development/support viewpoint; he recently wrote in his blog:
"In the lifetime of a case, particularly one of high impact, a point is reached where a decision is necessary regarding the direction of the case. This decision impacts how the situation is approached going forward. The decision that must be made is: 'Do I want Root Cause or Relief?' This decision is important because there are trade offs that have to be made."
On the surface, this is just a customer satisfaction issue... their software has stopped working, the outage is affecting their business, and they need it back as soon as possible. The problem is that providing the relief they seek will probably destroy most of the evidence needed to determine the root causes of the software failure, whereas holding off for a root cause analysis may lead to significant business/financial impacts. Ultimately, the customer (in consultation with the support engineer) has to decide which way to go.
On a completely different scale are events that impact the safety of human beings. Ejdl addresses just this sort of issue in her blog with an article that explores a tragedy that, for some, has been resolved:
"It's been nearly 5 years since the outbreak (of E coli from an improperly fixed well, a massive rain storm and a tangle of factors including insufficient chlorination of the water that killed 7 people and sickened some 2,300 - for those just tuning in)... though the book is closed on the inquiry, jail sentences have been set, and most people think of it as over and irrelevant, from my perspective that is far from the case."
Considerable effort has been expended in the investigation of this incident. The official inquiry was very detailed, and blame has been fixed legally... but is that sufficient? This event could also be considered a customer satisfaction issue, with the customer being defined as the people of Walkerton, Ontario. Again, the choice is between root cause and relief, with relief in this case being defined as criminal sentences for the "responsible parties."
Problems happen all the time... we deal with them on a daily basis. Unfortunately, in most cases, we never get past the stage of providing relief. Once the immediate effects of a problem have been addressed, all inquiry stops -- or if it does continue, we only carry it far enough to figure out who gets the blame. There is a better way: next time you're faced with a problem to solve, ask yourself a few questions...
- What is the current, actual impact of the problem?
- What is the potential impact if the problem is not solved?
- What level of risk are we willing to live with that is also supportable from a moral, legal, and contractual viewpoint?
- What would be an acceptable outcome that balances risk, cost, and benefit?
Note that none of these questions deals with blame. While being able to fix blame may be emotionally satisfying, it does little to improve the system and has almost no bearing on preventing the occurrence of future problems. Instead, we should focus on a productive response to the problem -- a response that provides the relief that is needed now, while also preserving our ability to find root causes to the greatest extent possible.
by Bill Wilson