Post-Mortems¶
Incident analysis and lessons learned from operational incidents.
What Goes Here¶
Post-mortems document: - Production incidents and outages - Security incidents - Data loss or corruption events - Significant operational failures - Near-miss events worth documenting
Post-Mortem Format¶
Each post-mortem should follow this structure:
---
title: "Database Outage - November 21, 2025"
date: 2025-11-21
severity: critical
duration: 2h 15m
---
# Database Outage - November 21, 2025
## Summary
[Brief overview of what happened]
## Timeline
[Chronological sequence of events]
## Impact
[Who/what was affected and how]
## Root Cause
[What actually caused the incident]
## Resolution
[How it was fixed]
## Lessons Learned
[What we learned]
## Action Items
[Concrete follow-up tasks with owners]
Naming Convention¶
- Use date prefix:
YYYY-MM-DD- - Use kebab-case for the incident description
- Examples:
2025-11-21-db-outage.md2025-10-15-api-timeout-spike.md2025-09-03-deployment-rollback.md
Post-Mortem Culture¶
- Blameless - Focus on systems and processes, not individuals
- Honest - Document what actually happened, not what we wish had happened
- Actionable - Every post-mortem should produce concrete action items
- Timely - Write within 72 hours while details are fresh
Rules¶
- Never edit published post-mortems - They are historical records
- Include exact dates and times in UTC
- Link to related monitoring/logs where possible
- Track action items separately (don't just list them and forget)