Postmortems: A First Guide

Published Jan 5, 2025. Last updated May 26, 2025.

The content here is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

Postmortems are a crucial part of software development, especially in agile teams. They provide an opportunity to reflect on what went well, what didn’t, and how to improve in the future. This guide will help you understand the basics of conducting effective postmortems.

What Is a Postmortem?

A postmortem (or incident review) is a structured meeting held after a significant incident, bug, or failure in production. It’s a blameless examination of what happened, why it happened, and what we can do to prevent similar incidents in the future.

The name comes from medical terminology—just as a medical postmortem examines what caused a patient’s death, a software postmortem examines what caused a system failure. However, the goal is forward-looking: understanding root causes and implementing systemic improvements, not assigning blame to individuals.

Why Postmortems Matter

Postmortems serve several critical functions in a healthy engineering organization:

Learning from failures - Every incident is an opportunity to understand your system better. Postmortems capture this learning in a structured way before it fades from memory.

Preventing recurrence - By identifying root causes and patterns, you can implement changes that reduce the likelihood of the same failure happening again.

Psychological safety - Blameless postmortems send a clear message: we learn from failures rather than punish people. This encourages team members to report issues quickly and honestly rather than hiding them.

Shared understanding - A postmortem brings everyone affected—engineering, ops, support, product—into one conversation. Different perspectives reveal gaps in communication and assumptions.

Accountability without blame - The focus shifts from “who made the mistake?” to “what systems, processes, or tools failed us?” Holding systems accountable, rather than individuals, is what drives real improvements.

When to Conduct a Postmortem

Not every bug requires a postmortem. Consider holding one for:

  • Production outages - Complete or partial service unavailability
  • Data loss or corruption - Any incident involving customer data
  • Security incidents - Breaches or vulnerabilities that affected users
  • Customer impact - Issues that directly affected users or revenue
  • Near misses - Close calls that could have been catastrophic
  • Significant bugs - Major defects discovered in production, especially if they bypassed multiple safety nets

Minor bugs fixed quickly in development don’t need postmortems, but patterns of similar bugs might warrant a broader review.

Conducting an Effective Postmortem

Timing

Schedule the postmortem soon after the incident is resolved, ideally within 24 to 48 hours, while details are still fresh. But give the team time to recover first: if the incident ended an hour ago, people are still in crisis mode and not ready to reflect.

Participants

Include:

  • Key responders - Those who were directly involved in detecting and fixing the issue
  • Leadership - An engineering manager or architect who can provide context (they should observe more than direct)
  • Cross-functional voices - Someone from ops, support, or product if relevant
  • A facilitator - Someone neutral who can guide discussion and challenge assumptions

Keep the group to 5-8 people. Larger groups become unwieldy and discourage honest discussion.

Essential Sections

Timeline reconstruction - Start with the facts. When was the issue first detected? When was it fully resolved? What actions were taken in between? Build a shared, objective narrative before examining causes.

Impact assessment - How many users were affected? How long did it last? Did we lose data? What was the financial impact? Clear impact helps calibrate how deeply to investigate.

Root cause analysis - This is the heart of the postmortem. Ask “why?” repeatedly but stay focused on systemic causes, not human error. Instead of “the engineer forgot to test,” ask “why didn’t our testing catch this?” or “why is manual testing our primary safety net?”

Contributing factors - Most incidents result from several factors lining up: for example, inadequate monitoring combined with insufficient testing and unclear documentation. Identify each factor and how the factors interacted.

Action items - These are the deliverables of a postmortem. Each action item should address one contributing factor. Make each one specific, assign it to a named owner, and give it a due date.

What went well - Balance criticism with recognition. If on-call response was fast, say so. If monitoring helped catch it faster, highlight that. This reinforces positive behavior.
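
The sections above map naturally onto a small record type. What follows is a minimal sketch in Python, not a standard format: the class names, field names, and the sample incident are all hypothetical and chosen only for illustration. It shows how the timeline gives you the incident duration, and how to check that every contributing factor has at least one action item.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta


@dataclass
class ActionItem:
    description: str
    contributing_factor: str  # the one factor this item addresses
    owner: str                # a named person, not a team
    due: date


@dataclass
class PostmortemRecord:
    title: str
    detected_at: datetime     # timeline: first detection
    resolved_at: datetime     # timeline: full resolution
    users_affected: int       # impact assessment
    contributing_factors: list[str]
    action_items: list[ActionItem] = field(default_factory=list)
    went_well: list[str] = field(default_factory=list)

    def duration(self) -> timedelta:
        """Time from first detection to full resolution."""
        return self.resolved_at - self.detected_at

    def unaddressed_factors(self) -> list[str]:
        """Contributing factors that no action item covers yet."""
        covered = {item.contributing_factor for item in self.action_items}
        return [f for f in self.contributing_factors if f not in covered]


# Hypothetical incident, for illustration only.
record = PostmortemRecord(
    title="Checkout API outage",
    detected_at=datetime(2025, 3, 4, 14, 5),
    resolved_at=datetime(2025, 3, 4, 15, 35),
    users_affected=1200,
    contributing_factors=["inadequate monitoring", "insufficient testing"],
    action_items=[
        ActionItem(
            description="Add an alert on checkout error rate",
            contributing_factor="inadequate monitoring",
            owner="alice",
            due=date(2025, 3, 18),
        )
    ],
    went_well=["On-call responded within minutes"],
)

print(record.duration())             # 1:30:00
print(record.unaddressed_factors())  # ['insufficient testing']
```

A non-empty result from `unaddressed_factors()` is a prompt to either add an action item or decide explicitly that the factor will not be addressed.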

Best Practices for Blameless Postmortems

Separate the person from the action - Focus on decisions and system design, not character. Say “the deployment process didn’t catch this issue” rather than “the engineer didn’t check the tests.”

Assume good intent - People make decisions based on the information available at the time. Understand the context they were working in, not how it looks in hindsight.

Challenge the narrative - Gently push back when you hear blame language creeping in. For example, ask: “Help me understand why that decision made sense at the time.”

Make it psychologically safe - Someone new should feel comfortable speaking up in your postmortems. No eye rolls, no lectures after the meeting, no consequences for honest mistakes.

Document everything - Assign one person to take notes. A postmortem is only valuable if its findings can still be found and read later. Share the writeup with the team and archive it.

Common Mistakes to Avoid

Stopping at “human error” - Yes, a human made a mistake, but why did that mistake cascade into a major incident? What system was fragile? What didn’t catch the error?

Assigning blame - The moment you blame someone, learning stops. People get defensive, withhold information, or stop admitting mistakes in the future.

Ignoring the postmortem - Conducting a postmortem without following up on action items is worse than no postmortem. It signals that you’re not serious about improvement.

Overcomplicating - Not every postmortem needs to be a formal 40-page document with statistical analysis. Keep them focused and actionable.

Forgetting the positive - Postmortems aren’t all about what went wrong. Acknowledge what your team did well under pressure.

Following Up

The postmortem doesn’t end in the meeting. Assign each action item to a specific person with a target completion date. Schedule a follow-up in a few weeks to verify that action items were completed. If an action item is too big or too vague, break it down.
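
The follow-up check can be as simple as listing open items that are past their target date. Here is a minimal sketch, assuming action items are kept as plain records with an owner, a due date, and a done flag. The `ActionItem` class, the `overdue` function, and the sample data are hypothetical; in practice this information usually lives in an issue tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open action items whose due date has passed."""
    return [item for item in items if not item.done and item.due < today]


# Hypothetical action items, for illustration only.
items = [
    ActionItem("Add alert on error rate", "alice", date(2025, 3, 18), done=True),
    ActionItem("Write rollback runbook", "bob", date(2025, 3, 20)),
    ActionItem("Add integration test for checkout", "carol", date(2025, 4, 15)),
]

for item in overdue(items, today=date(2025, 4, 1)):
    print(f"OVERDUE: {item.description} (owner: {item.owner}, due {item.due})")
# OVERDUE: Write rollback runbook (owner: bob, due 2025-03-20)
```

Passing `today` explicitly rather than reading the clock inside the function keeps the check easy to test and to run for any review date.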

Track postmortem trends. If you see the same root cause appearing in multiple postmortems, that’s a signal that systemic improvements are needed, not just individual fixes.
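
Spotting repeated root causes is a counting problem once the causes are labeled consistently. Here is a minimal sketch, assuming each archived postmortem carries a list of root-cause labels drawn from a shared vocabulary. The `recurring_causes` function, the dictionary layout, and the sample data are hypothetical.

```python
from collections import Counter


def recurring_causes(postmortems: list[dict], min_count: int = 2) -> dict[str, int]:
    """Return root causes that appear in at least `min_count` postmortems."""
    counts = Counter(
        cause
        for pm in postmortems
        for cause in set(pm["root_causes"])  # count each cause once per postmortem
    )
    return {cause: n for cause, n in counts.items() if n >= min_count}


# Hypothetical archive, for illustration only.
postmortems = [
    {"title": "Checkout outage", "root_causes": ["missing alerting", "untested migration"]},
    {"title": "Stale cache incident", "root_causes": ["missing alerting"]},
    {"title": "Failed deploy", "root_causes": ["manual deploy step"]},
]

print(recurring_causes(postmortems))  # {'missing alerting': 2}
```

The count is only as good as the labels: if one writeup says "missing alerting" and another says "no alerts", the pattern stays hidden, so it helps to agree on a small set of cause labels.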

Conclusion

Postmortems are powerful tools for building resilient systems and healthy teams. They work best in organizations that genuinely embrace learning over blame, and that invest in following up on lessons learned. Start small—conduct one postmortem, document it well, and act on what you learn. Over time, this practice compounds into a culture where failures become stepping stones for improvement rather than sources of shame.
