Post-Mortems

Incident analysis and lessons learned from operational incidents.

What Goes Here

Post-mortems document:

  • Production incidents and outages
  • Security incidents
  • Data loss or corruption events
  • Significant operational failures
  • Near-miss events worth documenting

Post-Mortem Format

Each post-mortem should follow this structure:

---
title: "Database Outage - November 21, 2025"
date: 2025-11-21
severity: critical
duration: 2h 15m
---

# Database Outage - November 21, 2025

## Summary
[Brief overview of what happened]

## Timeline
[Chronological sequence of events]

## Impact
[Who/what was affected and how]

## Root Cause
[What actually caused the incident]

## Resolution
[How it was fixed]

## Lessons Learned
[What we learned]

## Action Items
[Concrete follow-up tasks with owners]
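
The structure above can be checked mechanically before a post-mortem is published. The following is a minimal sketch, not an existing tool: the function name `check_postmortem` and the two constant tuples are hypothetical, and the front matter is read with a naive `key: value` split rather than a real YAML parser, which is an assumption that only holds for flat front matter like the template shown.

```python
# Hypothetical checker: names and rules are illustrative, derived from the template above.
REQUIRED_FIELDS = ("title", "date", "severity", "duration")
REQUIRED_SECTIONS = (
    "Summary", "Timeline", "Impact", "Root Cause",
    "Resolution", "Lessons Learned", "Action Items",
)


def check_postmortem(text: str) -> list[str]:
    """Return a list of problems; an empty list means the structure looks complete."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing front matter: file must start with '---'"]

    # Find the line that closes the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["front matter is not closed with '---'"]

    # Naive parse: split each front matter line on the first colon (assumes flat keys).
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = [
        f"missing front matter field: {name}"
        for name in REQUIRED_FIELDS
        if not fields.get(name)
    ]

    # Collect second-level headings from the body and compare with the template.
    headings = {line[3:].strip() for line in lines[end + 1:] if line.startswith("## ")}
    problems += [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS
        if section not in headings
    ]
    return problems
```

A check like this only confirms that the fields and headings exist; it does not judge whether the content under each heading is adequate.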

Naming Convention

  • Use date prefix: YYYY-MM-DD-
  • Use kebab-case for the incident description
  • Examples:
      • 2025-11-21-db-outage.md
      • 2025-10-15-api-timeout-spike.md
      • 2025-09-03-deployment-rollback.md
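
The naming convention above can be applied and verified with a few lines of Python. This is a sketch under assumptions: the helper names `postmortem_filename` and `is_valid_postmortem_filename` are hypothetical, and the rule that a description may contain only lowercase letters and digits separated by single hyphens is one reasonable reading of "kebab-case", not something the convention spells out.

```python
import re
from datetime import date

# Assumed pattern: YYYY-MM-DD- prefix, kebab-case description, .md extension.
FILENAME_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def postmortem_filename(incident_date: date, description: str) -> str:
    """Build a name such as 2025-11-21-db-outage.md from a date and free text."""
    # Lowercase, then collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{incident_date.isoformat()}-{slug}.md"


def is_valid_postmortem_filename(name: str) -> bool:
    """Return True if the name matches the pattern and starts with a real calendar date."""
    if not FILENAME_PATTERN.fullmatch(name):
        return False
    try:
        date.fromisoformat(name[:10])  # rejects prefixes such as 2025-13-40
    except ValueError:
        return False
    return True
```

For instance, `postmortem_filename(date(2025, 11, 21), "DB Outage")` produces `2025-11-21-db-outage.md`, matching the first example in the list.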

Post-Mortem Culture

  • Blameless - Focus on systems and processes, not individuals
  • Honest - Document what actually happened, not what we wish had happened
  • Actionable - Every post-mortem should produce concrete action items
  • Timely - Write within 72 hours while details are fresh

Rules

  1. Never edit published post-mortems - They are historical records
  2. Include exact dates and times in UTC
  3. Link to related monitoring/logs where possible
  4. Track action items separately (don't just list them and forget)
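
Rule 2 is easier to follow when timestamps are converted to UTC by code rather than by hand. The sketch below uses only the Python standard library; the helper names `to_utc_string` and `format_duration` are hypothetical, and the output formats (`YYYY-MM-DD HH:MM UTC` and `2h 15m`) are assumptions chosen to match the style of the `duration` field in the template, not a mandated format.

```python
from datetime import datetime, timedelta, timezone


def to_utc_string(moment: datetime) -> str:
    """Convert a timezone-aware datetime to a 'YYYY-MM-DD HH:MM UTC' string."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so a UTC conversion would be a guess.
        raise ValueError("naive datetime: attach a timezone before converting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


def format_duration(start: datetime, end: datetime) -> str:
    """Format the elapsed time between two aware datetimes as e.g. '2h 15m'."""
    total_minutes = int((end - start).total_seconds() // 60)
    if total_minutes < 0:
        raise ValueError("end must not be earlier than start")
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


# Illustrative values only: a made-up local time in a UTC-5 zone.
local_zone = timezone(timedelta(hours=-5))
detected = datetime(2025, 11, 21, 9, 30, tzinfo=local_zone)
resolved = detected + timedelta(hours=2, minutes=15)

print(to_utc_string(detected))              # 2025-11-21 14:30 UTC
print(format_duration(detected, resolved))  # 2h 15m
```

Rejecting naive datetimes is a deliberate choice here: it forces the author to state which timezone a log line or chat message came from before it enters the timeline.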