Failure can lead to blame or inquiry in your organisation.

When failure leads to blame, organisations subscribe to the old view of human error. They construct a narrative that’s far worse than the reality, a narrative that focuses on a single root cause, which is inevitably human error. This reductionist and deconstructive process has us go down-and-in, treating people and systems as separate entities, with people at the root of the cause.

When failure leads to inquiry, organisations subscribe to the new view of human error. People are part of the systems, and inquiry is angled up-and-out, focused on understanding the relationships and bigger picture ideas at play. This is difficult, because it involves acknowledging and embracing complexity.

When failure leads to inquiry, we embrace different perspectives, different stories, different interests - and often these contradict one another. By embracing these differences, we create an opportunity for learning for people inside the organisation, navigating the delta between how we imagine work is completed in our organisation, and how it is actually done.

Learning organisations have three distinct advantages:

  • They have feedback loops that deliver high quality feedback from the front lines,
  • Which allows people performing the work to focus on quality and delivery,
  • Which reduces the amount of defending of decisions by practitioners.

These three advantages minimise the likelihood of a Cover Your Arse culture emerging, where people focus more on insulating themselves against potential blowback from performing work than on performing the work itself.

I posit there are three contributing factors that inhibit learning in organisations:

  • Language we use when talking about and contextualising failure
  • Blame and the tainted narrative we construct via cognitive biases
  • Sharing of experiences in our organisations to uncover understanding

Language

The words we use when talking about events are really important.

Words are framing devices that can both expand and limit the scope of inquiry. These words are used during your investigations, retrospectives, learning reviews, brainstorming sessions, and post-mortems. But most importantly they’re used when having daily conversations with your colleagues.

Why

Why is used to force people to justify actions, to attribute and apportion blame. Why goes down-and-in, focuses the inquiry on people, and is often used to phrase counterfactuals that focus attention on a past that didn’t happen – “why didn’t you answer the page?”, “why didn’t you check the backups?”.

Why plays right into the hands of the Fundamental Attribution Error, where we explain other people’s actions by their personality, not the context they find themselves in, but we explain our own actions by our context, not our personality.

How

How is about articulating the mechanics of a situation, which is helpful for distancing people from the actions they took. How clarifies technical details - “how did the site go down?”, “how did the team react?” – but it can also limit the scope of the inquiry, as we focus on the mechanics, not the relationships at play in the larger system.

What

What uncovers reasoning, which is important for building empathy with people in complex systems – “what did you think was happening?”, “what did you do next?”. What makes it easier to point our investigations up-and-out, towards the bigger picture contributing factors to an outcome. What encourages explaining in terms of foresight, and helps us take into account local rationality:

“people make what they consider to be the best decision given the information available to them at the time”

Dekker describes explaining an incident in terms of foresight as understanding what people inside the tunnel saw, as they journeyed through it during an incident. What helps us uncover what the inside of the tunnel looked like.

Blame

Blame assigns responsibility for an outcome to a person. Often we use blame to say that people were neglectful, inattentive, or derelict in their duty. It plays into this idea of bad apples, amoral actors in our midst who are working against the sanctity of the pristine system that the dirty humans keep fucking up.

But assigning responsibility for an outcome to a person ignores a truth – sometimes bad things happen, and nobody is to blame. Furthermore, things go right more often than they go wrong.

There are two cognitive biases at play when assigning blame to people: confirmation bias, and hindsight bias.

But what is a cognitive bias? Simply, a cognitive bias is a mental shortcut your brain unconsciously takes when processing information. Your brain optimises for timeliness over accuracy when processing information, and applies heuristics to make decisions and form judgements. If those heuristics produce an incorrect result, we say that’s an example of a cognitive bias.

Confirmation bias

With the confirmation bias, we seek information that reinforces existing positions, and ignore alternative explanations. Worse still, we interpret ambiguous information in favour of our existing assumptions.

Simply put: if you are looking for a human to blame, you’re going to find one, regardless of contrary information.

We can counter the confirmation bias by appointing people to play the devil’s advocate and take contrarian viewpoints during conversations and investigations.

Hindsight bias

The hindsight bias alters our recollection of memories to fit a narrative of how we perceived and reacted to events. It’s a type of memory distortion where we recall events to form a judgement, and talk about and contextualise events with knowledge of the outcome – often making ourselves look better in the process.

The hindsight bias is dangerous because it can taint all your interactions with your team. It is your culture killer, altering how you recall your perception of events and actions in stressful situations, driving a self-defensive wedge between you and your colleagues.

It’s important to eliminate hindsight bias when conducting post-mortems and investigations if we want a just outcome. The simplest way to achieve this is to explain events in terms of foresight, and this is made easier by using questions that start with “how” and “what”. Start the review at a point before the incident, and work your way forward. Resist the urge to jump ahead to the outcome and work your way back from that.

Doing this is hard and requires a lot of self-restraint and practice. You’ll make a lot of mistakes, and it takes time to get good at it. Even when you’re good at it, you’ll still occasionally find yourself slipping into old habits. It’s the responsibility of the whole team to call each other out when they see someone fall into the hindsight bias trap by reaching for words like why and who.

We can also harness hindsight bias to give us insights into how things might break in the future.

Before you take a new service live, gather the team together and ask them to brainstorm on a whiteboard or post-it notes what they think will break when they go live. Then clear away any notes you’ve collectively taken, and ask them to imagine themselves 5 minutes after the feature has gone live. Now ask “what has just broken?”.

You’ll find the answers you get can be quite different.

Sharing

Sharing our experiences after an incident happens is vital for the organisation to learn from individual and shared experiences. By sharing our experiences we have the opportunity to embrace different and often contradictory perspectives, stories, and interests.

From these we can better understand what our organisation’s capabilities and weaknesses are, both when things go wrong and when things go right. This creates an opportunity to understand the delta between Work-as-Imagined and Work-as-Done in our organisations.

We do this by holding retrospectives, investigations, post-mortems, or learning reviews – but the label we apply to the event is irrelevant.

These events must be environments where people in your organisation feel they can speak their truth and experiences free of persecution or backlash. If you’re in a leadership or management position, and people in your team are participating in these sharing experiences, be the shit umbrella you want to see in the world.

Other people in your organisation will likely be skeptical of the findings (especially if there is a blameful culture of finding and singling out bad apples), so it’s your responsibility to your people to shield them from the repercussions of being honest. Again, we are all locally rational:

“people make what they consider to be the best decision given the information available to them at the time”

You have a limited window of opportunity to create an expectation that if people share there won’t be blowback - if you fuck it up early on, people will be reluctant to share anything vaguely compromising about their experiences and actions in the future, and the organisation as a whole suffers from the missed opportunity to learn.

Know the audience of the report you produce after you’ve shared experiences. Sometimes this means you have to construct multiple reports, one for each audience. The story you tell across these reports should be the same, but alter the level of detail for the audience who is reading it. You may also need to omit different findings for different audiences so details don’t get misconstrued.

Beware of weasel words that show up in the report:

  • “the team should have…” (counterfactual describing a past that never happened)
  • “the root cause of the outage was…” (there is never one cause, there are many contributing factors)
  • “human error led to…” (our world is humans and systems, not humans or systems)

Creating opportunities for sharing our experiences of accidents, incidents, and outages is mandatory if we want to learn about what our organisation’s capabilities and weaknesses are when things go wrong.

To do this we have to hold retrospectives, learning reviews, or post-mortems, start at the beginning, and relentlessly eliminate our own and collective cognitive biases when talking about events, by using what and how, not why and who.

Things go right more often than they go wrong, and we owe it to ourselves and our colleagues to understand what made our course of action the right one at the time, in spite of the outcome.


This piece is a writeup of the talk I gave at Velocity Amsterdam 2015.