Post-Mortems¶

Incident analysis and lessons learned from operational incidents.

What Goes Here¶

Post-mortems document: - Production incidents and outages - Security incidents - Data loss or corruption events - Significant operational failures - Near-miss events worth documenting

Post-Mortem Format¶

Each post-mortem should follow this structure:

---
title: "Database Outage - November 21, 2025"
date: 2025-11-21
severity: critical
duration: 2h 15m
---

# Database Outage - November 21, 2025

## Summary
[Brief overview of what happened]

## Timeline
[Chronological sequence of events]

## Impact
[Who/what was affected and how]

## Root Cause
[What actually caused the incident]

## Resolution
[How it was fixed]

## Lessons Learned
[What we learned]

## Action Items
[Concrete follow-up tasks with owners]

Naming Convention¶

Use date prefix: YYYY-MM-DD-
Use kebab-case for the incident description
Examples:
2025-11-21-db-outage.md
2025-10-15-api-timeout-spike.md
2025-09-03-deployment-rollback.md

Post-Mortem Culture¶

Blameless - Focus on systems and processes, not individuals
Honest - Document what actually happened, not what we wish had happened
Actionable - Every post-mortem should produce concrete action items
Timely - Write within 72 hours while details are fresh

Rules¶

Never edit published post-mortems - They are historical records
Include exact dates and times in UTC
Link to related monitoring/logs where possible
Track action items separately (don't just list them and forget)