# AegisLM Incident Command Protocol

---

## Overview

This document defines the incident command protocol for AegisLM operations, establishing clear roles and procedures during incident response.

---

## Incident Command Structure

### Command Roles

| Role | Responsibility | Authority |
|------|---------------|-----------|
| Incident Commander (IC) | Overall response coordination | Full incident authority |
| Operations Lead | Technical response | Deploy fixes |
| Communications Lead | Stakeholder updates | Public communications |
| Liaison | External coordination | Partner communications |
| Safety Officer | Safety of response team | Stop unsafe actions |

---

## Incident Phases

### 1. Detection & Assessment

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           DETECTION & ASSESSMENT                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. ALERT RECEIVED                                                          │
│     ├── Automated alert (monitoring)                                        │
│     ├── Manual report (user/staff)                                          │
│     └── Security alert (SIEM)                                               │
│                                                                             │
│  2. INITIAL ASSESSMENT                                                      │
│     ├── Confirm incident validity                                           │
│     ├── Determine scope and severity                                        │
│     └── Identify affected systems                                           │
│                                                                             │
│  3. INCIDENT DECLARATION                                                    │
│     ├── Declare incident (if confirmed)                                     │
│     ├── Activate incident response                                          │
│     └── Notify incident commander                                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### 2. Response & Containment

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           RESPONSE & CONTAINMENT                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. CONTAINMENT                                                             │
│     ├── Isolate affected systems                                            │
│     ├── Block malicious activity                                            │
│     └── Preserve evidence                                                   │
│                                                                             │
│  2. ERADICATION                                                             │
│     ├── Remove threat                                                       │
│     ├── Patch vulnerabilities                                               │
│     └── Reset compromised credentials                                       │
│                                                                             │
│  3. RECOVERY                                                                │
│     ├── Restore services                                                    │
│     ├── Verify system integrity                                             │
│     └── Resume operations                                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### 3. Post-Incident

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                POST-INCIDENT                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. LESSONS LEARNED                                                         │
│     ├── What happened                                                       │
│     ├── How we responded                                                    │
│     └── What we can improve                                                 │
│                                                                             │
│  2. DOCUMENTATION                                                           │
│     ├── Timeline of events                                                  │
│     ├── Actions taken                                                       │
│     └── Evidence collected                                                  │
│                                                                             │
│  3. PROCESS IMPROVEMENT                                                     │
│     ├── Update runbooks                                                     │
│     ├── Enhance detection                                                   │
│     └── Improve response                                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Severity Levels

| Severity | Definition | Examples | Response Time |
|----------|------------|----------|---------------|
| **SEV1 - Critical** | Complete service loss, data breach | Full outage, exfiltration | 15 min |
| **SEV2 - High** | Major feature broken | API down, certification failure | 1 hour |
| **SEV3 - Medium** | Feature degraded | Slow response, partial outage | 4 hours |
| **SEV4 - Low** | Minor issue | UI bug, documentation error | 24 hours |

---

## Communication Protocol

### Internal Communication

| Stage | Channel | Audience | Timing |
|-------|---------|----------|--------|
| Detection | PagerDuty | On-call | Immediate |
| Declaration | Slack #incidents | Response team | 15 min |
| Updates | Slack #incidents | All hands | Hourly |
| Resolution | Slack #incidents | All hands | On resolution |

### External Communication

| Stage | Channel | Audience | Approval |
|-------|---------|----------|----------|
| Initial | Status page | Public | IC only |
| Updates | Status page | Public | IC + Comms |
| Post-Incident | Blog/Report | Public | Advisory Board |

---

## Runbook Integration

### Common Incident Runbooks

| Incident Type | Runbook Location | Status |
|---------------|------------------|--------|
| API Outage | runbooks/api-outage.md | ✓ Complete |
| Database Failure | runbooks/db-failure.md | ✓ Complete |
| Security Breach | runbooks/security-breach.md | ✓ Complete |
| Certification Error | runbooks/cert-error.md | ✓ Complete |
| Data Loss | runbooks/data-loss.md | ✓ Complete |

---

## Version Information

| Item | Version | Date |
|------|---------|------|
| Incident Command Protocol | 1.0 | January 15, 2025 |

---

*This protocol is maintained by the Operations team and reviewed quarterly.*
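
For teams that want to encode the Severity Levels table in tooling, the following is a minimal Python sketch of how the response-time targets might be looked up for an incident. The names `Severity`, `RESPONSE_TARGETS`, and `Incident` are hypothetical and are not part of any existing AegisLM code; only the four severity levels and their response times are taken from this protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    """Severity levels as named in the Severity Levels table."""
    SEV1 = "Critical"
    SEV2 = "High"
    SEV3 = "Medium"
    SEV4 = "Low"


# Response-time targets copied from the Severity Levels table.
RESPONSE_TARGETS = {
    Severity.SEV1: timedelta(minutes=15),
    Severity.SEV2: timedelta(hours=1),
    Severity.SEV3: timedelta(hours=4),
    Severity.SEV4: timedelta(hours=24),
}


@dataclass
class Incident:
    """Hypothetical incident record, used for illustration only."""
    title: str
    severity: Severity
    detected_at: datetime

    def response_deadline(self) -> datetime:
        # Deadline = detection time + the target for this severity.
        return self.detected_at + RESPONSE_TARGETS[self.severity]

    def is_overdue(self, now: datetime) -> bool:
        # True once the response-time target has been exceeded.
        return now > self.response_deadline()


# Example: a SEV2 incident detected at 09:00 should be responded to by 10:00.
incident = Incident("API down", Severity.SEV2, datetime(2025, 1, 15, 9, 0))
print(incident.response_deadline())  # 2025-01-15 10:00:00
```

Keeping the targets in a single mapping means a change to the table above requires only one corresponding change in code.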