
Post-Mortem: [INCIDENT TITLE]

Metadata

  • Incident ID: INC-XXXX
  • Severity: P1/P2/P3
  • Date: YYYY-MM-DD
  • Duration: X hours Y minutes
  • Start Time: HH:MM UTC
  • End Time: HH:MM UTC
  • Authors: @engineer1, @engineer2
  • Status: Draft/Final
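
The Duration field can be derived from Start Time and End Time rather than typed by hand. The following is a minimal sketch, not part of the template itself: the function name `incident_duration` is hypothetical, and it assumes both times are in UTC, that the incident lasted under 24 hours, and that an end time earlier than the start time means the incident crossed midnight.

```python
from datetime import datetime, timedelta


def incident_duration(date: str, start: str, end: str) -> str:
    """Format the Duration field from the Date, Start Time and End Time fields.

    Assumptions (not stated by the template): times are UTC, the incident
    lasted less than 24 hours, and end < start means it crossed midnight.
    """
    start_dt = datetime.strptime(f"{date} {start}", "%Y-%m-%d %H:%M")
    end_dt = datetime.strptime(f"{date} {end}", "%Y-%m-%d %H:%M")
    if end_dt < start_dt:
        # The incident ended on the following calendar day.
        end_dt += timedelta(days=1)
    total_minutes = int((end_dt - start_dt).total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours {minutes} minutes"


if __name__ == "__main__":
    print(incident_duration("2024-01-01", "10:00", "12:30"))
```

For incidents longer than a day, record full start and end dates instead of relying on the midnight assumption.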

Executive Summary

[1-2 sentences: what happened, customer impact, duration]

Impact

  • Customers affected: X / Y (Z%)
  • Requests failed: X
  • Revenue impact: $X
  • Error budget consumed: X% of 30d budget
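
The percentages above can be computed as in the sketch below. The template does not say how the error budget is defined, so this assumes a request-based SLO: the 30-day budget is the number of requests allowed to fail, `(1 - SLO target) * total requests in the window`. The function names and the `slo_target` parameter are illustrative.

```python
def percent_affected(affected: int, total: int) -> float:
    """Z in 'Customers affected: X / Y (Z%)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * affected / total


def error_budget_consumed(
    failed_requests: int, total_requests_30d: int, slo_target: float
) -> float:
    """Percentage of the 30-day error budget consumed by this incident.

    Assumes a request-based SLO (e.g. slo_target=0.999 for 99.9%), where the
    budget is the count of requests allowed to fail within the window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests_30d <= 0:
        raise ValueError("total_requests_30d must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests_30d
    return 100.0 * failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(f"{percent_affected(50, 200):.1f}% of customers affected")
    print(f"{error_budget_consumed(500, 1_000_000, 0.999):.1f}% of budget consumed")
```

A result above 100% means the incident alone exhausted the budget for the window. A time-based SLO would use allowed downtime minutes in place of allowed failed requests.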

Timeline (UTC)

| Time  | Event                 | Action            |
|-------|-----------------------|-------------------|
| HH:MM | Alert fired           | On-call paged     |
| HH:MM | Root cause identified | [What was found]  |
| HH:MM | Mitigation applied    | [What was done]   |
| HH:MM | Service restored      | [Confirmation]    |
| HH:MM | All-clear             | Incident closed   |

Root Cause

[5 Whys analysis]

  1. Why did the incident occur?
  2. Why was that condition present?
  3. Why was that not caught?
  4. Why was there no automated prevention?
  5. Why was this not in our risk model?

What Went Well

  • [Detection was fast / alert was clear / etc.]

What Went Wrong

  • [Response was slow / runbook was missing / etc.]

Action Items

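| # | Action    | Owner | Priority | Due Date   | Type       |
|---|-----------|-------|----------|------------|------------|
| 1 | [Fix]     | @eng  | P1       | YYYY-MM-DD | Remediate  |
| 2 | [Prevent] | @eng  | P2       | YYYY-MM-DD | Automate   |
| 3 | [Detect]  | @eng  | P2       | YYYY-MM-DD | Monitoring |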

Lessons Learned

  • [Key takeaway 1]
  • [Key takeaway 2]

Appendices

  • Grafana dashboard screenshots
  • Alert screenshots
  • Log excerpts