At this point you’ve seen all the reasons why learning from incidents is good for you and your org. We also discussed which incidents may deserve a more in-depth review than others. Among these are the incidents that could have been: the near-misses. Near-misses are particularly helpful because they provide an easy on-ramp to learning. After all, they’re free from the dark cloud that often looms over the incidents that did, in fact, hit. Yet not enough orgs take advantage of learning from their near-misses.
What is a near-miss?
“But I thought the old lady dropped it into the ocean in the end?” “Well baby, I went down and got it for you.”
Our friends at Verica describe a near-miss as “an incident that the organization noted had no noticeable external or customer impact, but still required intervention from the organization.”
I first started looking at near-misses years ago, when my team was trying to count the incidents that occurred over a specific period of time. Back then, we tracked incidents to make sure the count went down year after year, which we took as a sign of improvement (you can read more about the journey here). Over time, we realized that some of these incidents should not be seen as a “lack of improvement.” In fact, many of them showed that we had improved since previous incidents, because our customers felt little or no impact!
These near-misses had the same characteristics as what we usually consider an “incident”: something had happened, we needed multiple folks to collaborate on addressing it, folks had to drop whatever they were doing to work on it immediately, and we needed someone to coordinate and communicate what was happening. But unlike a traditional incident, at the end of the near-miss our end users were not impacted! Thanks to the hard work of everyone involved in the incident response, we were able to stop the wave of impact before it reached them.
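If your team keeps an incident log, one way to stop near-misses from being read as a “lack of improvement” is to tag them separately instead of folding them into a single count. The sketch below applies Verica’s definition (intervention required, but no noticeable external or customer impact) to hypothetical incident records. The `IncidentRecord` fields, the labels, and the helper functions are assumptions made up for illustration; they are not part of any real tool or of the review process described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    # Hypothetical fields for illustration only.
    year: int
    required_intervention: bool  # did folks have to drop what they were doing and respond?
    customer_impact: bool        # was there noticeable external or customer impact?


def classify(record: IncidentRecord) -> str:
    """Label a record using Verica's near-miss definition."""
    if not record.required_intervention:
        return "not an incident"
    # Intervention was needed: impact decides incident vs. near-miss.
    return "incident" if record.customer_impact else "near-miss"


def counts_by_year(records: list[IncidentRecord]) -> Counter:
    """Count records per (year, label) so near-misses are reported on their own."""
    counts: Counter = Counter()
    for record in records:
        counts[(record.year, classify(record))] += 1
    return counts


if __name__ == "__main__":
    log = [
        IncidentRecord(2021, required_intervention=True, customer_impact=True),
        IncidentRecord(2021, required_intervention=True, customer_impact=False),
        IncidentRecord(2022, required_intervention=True, customer_impact=False),
    ]
    for (year, label), n in sorted(counts_by_year(log).items()):
        print(year, label, n)
```

Keeping near-misses in their own bucket means a year with more near-misses and fewer customer-facing incidents can show up as the improvement it likely is, rather than as a flat incident count.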
Near-misses can vary in form. Some examples include:
An error in an accounting system that was caught early enough to be fixed before invoices were sent out.
A call center’s phone system that went down during off-hours, which the in-house team was able to work around in time for the call center to open.
In both cases, thanks to the quick response, the system was able to complete what it needed to do. Ultimately, we don’t need a rigid definition of near-misses, since the whole point of studying them is to expand the universe of events we can learn from.
What can we learn from near-misses?
A near-miss can tell us as much about our systems, organizations, and work as an “original recipe” incident. In particular, near-misses can:
Help us understand what is important to us as an organization: who is our end-user? What do they need from us? How do we know if we are fulfilling this need?
In the call center example, our user may be the call center employees or those trying to reach them. Both groups need to be able to use the phone systems during operating hours, otherwise it’s a full-blown incident.
Tell us who the key players are for a specific system: who do we need when the system stops working? How do we work with them?
Maybe we think an accounting incident only requires the engineers on the accounting team, but we actually also need folks who can bypass release controls.
Show us how we find out about incidents: what do we look at? What indicators do we pay attention to?
While the accounting system may be up, how do we check that it is producing accurate results? In the call center example, how do we differentiate between a “hard down” and “degraded call quality”?
Explain how the system works: what did we expect to happen? What happened that we didn’t expect?
By reviewing the near-miss, folks can better understand the architecture behind the telephone systems and the history surrounding it.
Highlight everything that had to happen for the incident to be a near-miss, and provide examples that can be used in other parts of the org.
Perhaps the developers in charge of the accounting system have a uniquely close relationship with the folks doing the reconciliation, and that kind of relationship should be encouraged across other teams. Or maybe we had workarounds ready so that we could make quick changes during an incident; similar workarounds could be built into other processes.
How to review near-misses
We can review near-misses the same way we review any incident! You might follow the Howie process, letting participants know up front that although this incident had no user impact, there is still a lot to learn from it. For your first near-miss reviews you may have to do some convincing. If so, I recommend a more lightweight process, perhaps skipping interviews (like Emily outlines here). Once folks in your organization see the benefit of reviewing near-misses, they’ll be more willing to invest time in these investigations.
Near-misses are some of my favorite learning opportunities. I have found that participants are more willing to share their stories from their own point of view when they are in a celebratory mood and know that they cannot get in trouble. As Lorin Hochstein explains, “it can be easier to learn from incidents with less impact because there’s less pressure from the organization to get closure and move on.” Reviewing near-misses is a great way to start learning from incidents, because it provides the psychological safety a learning culture needs to take hold in an organization!
Here at Jeli, we believe in transparency, and we’re working on conducting our own learning reviews for near-misses. You can expect to see some in this space later this year!