Microsoft recently experienced a significant interruption of its Azure cloud service. Since a decent amount of data was available for this incident, I decided to do a partial root cause analysis. All of my source data came from Microsoft's official Azure blog post on 2014-Nov-24.
Tag Archives: tools
In DEPTH Cause Analysis for Equipment Problems
Equipment degrades, malfunctions, or fails outright... hopefully not too frequently, but every breakdown can be painful. So, of course, you will want to figure out exactly what happened, how it happened, and why (in hardware/software terms). You may want to dig a little deeper, though; every equipment malfunction or failure that you have is a valuable datapoint for gauging the health of your overall equipment program. That's why I've started creating a tool that should be able to help find programmatic causes underneath the more easily observable equipment performance issues. Since it is intended to be a Diagnostic for Equipment Programs, and uses Tabulated Heuristics, I call it DEPTH.
RCA at the AntiSyphus Effect
Kathleen DeFilippo over at The AntiSyphus Effect (lost to the ravages of time, kind of, but see below) has written a very nice collection of 10 articles on root cause analysis. I especially like how the articles flow from a logical starting point to a logical conclusion... something I've never really bothered to do. She's also developed a couple of nice little tools (FERCS and the PHaTS Domino) that should be useful to anyone trying to get started in root cause analysis.
Rockin’ New Human Performance Investigation Tool
Most problems and accidents involve human activity at some point or other. Often, this activity is right at the point of occurrence, and people at the sharp end are usually operating under difficult or confusing circumstances. They make decisions and take actions that, in hindsight, prove to be "wrong" in some way. Then, after something "bad" happens, we perform an investigation and find these "human errors"... and all too often, we stop there without really considering all the factors that shaped the undesired outcome. This article discusses a new tool that has been developed (by me) to help investigators find these factors so they can be used as starting points for root cause analysis.
Systematic Problem-Solving Sequence
Problems happen all the time. How we choose to respond is a major factor in determining how badly we will be affected by any given problem. I would argue that a systematic response is best, and furthermore, I propose a 9-stage sequence (including root cause analysis) as discussed in this article.
Root Cause Checklists
I confess to being a list-maker. I've published a few root cause analysis related lists in this weblog since its inception 18 months ago in May 2004. I've also posted a couple in the wiki (old wiki gone, new wiki still sparse). I thought it might be interesting to wrap them all up into a single article. So, with that introduction, here are the lists.
Five-by-Five Whys
Search around on the web for Root Cause Analysis, and you're likely going to find article after article discussing the Five Whys (or 5 Whys, or 5Y) technique. This method is especially popular in manufacturing, where the main concern is often productivity -- maximizing production rate and minimizing rejects. I've heard many Six Sigma and Lean practitioners talk about this as one of their favourite tools. I can understand why, too... it's easy to remember, simple to apply, and gets deeper than traditional problem solving. However, it also contains some traps.
Human Error
How many times have you read a news article about a major problem or incident, and seen Human Error in the headline? A quick search of Google News for that phrase provides plenty of examples. The latest big incident of note (when this article was written in 2005) is the August 12, 2005 Los Angeles blackout -- sample headline "Human Error Led to Widespread Outage." Well, what a blinding flash of the obvious -- people make mistakes!
Why-Trees
One of the most widely used root cause analysis tools is the why-tree, otherwise referred to as a cause tree, root cause tree, causal factor tree, why staircase tree, cause map, etc. Many of the RCA consultants out there use some variation of this as their main investigation and analysis method. I've just posted rev 0 of an article describing the why-tree and its usage on my root cause analysis tools page. Please have a look, and provide comments... I plan on adding some graphics later, but I want to get the text right before I do.
*Update* I've renamed the article and revised my terminology to be more generic.
Here's the article: Causal Factor Tree Analysis
by Bill Wilson
Root Cause Tools
I've just posted (in June 2005) the first article in my (hopefully) continually evolving series on root cause analysis tools. This first article provides my perspectives on that venerable old standby, Barrier Analysis. I hope you like it!