AegisLM Incident Command Protocol


Overview

This document defines the incident command protocol for AegisLM operations, establishing clear roles and procedures during incident response.


Incident Command Structure

Command Roles

| Role | Responsibility | Authority |
|------|----------------|-----------|
| Incident Commander (IC) | Overall response coordination | Full incident authority |
| Operations Lead | Technical response | Deploy fixes |
| Communications Lead | Stakeholder updates | Public communications |
| Liaison | External coordination | Partner communications |
| Safety Officer | Safety of response team | Stop unsafe actions |

Incident Phases

1. Detection & Assessment

┌──────────────────────────────────────────────────────────────────────────────┐
│                            DETECTION & ASSESSMENT                            │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. ALERT RECEIVED                                                           │
│     ├── Automated alert (monitoring)                                         │
│     ├── Manual report (user/staff)                                           │
│     └── Security alert (SIEM)                                                │
│                                                                              │
│  2. INITIAL ASSESSMENT                                                       │
│     ├── Confirm incident validity                                            │
│     ├── Determine scope and severity                                         │
│     └── Identify affected systems                                            │
│                                                                              │
│  3. INCIDENT DECLARATION                                                     │
│     ├── Declare incident (if confirmed)                                      │
│     ├── Activate incident response                                           │
│     └── Notify incident commander                                            │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

2. Response & Containment

┌──────────────────────────────────────────────────────────────────────────────┐
│                            RESPONSE & CONTAINMENT                            │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. CONTAINMENT                                                              │
│     ├── Isolate affected systems                                             │
│     ├── Block malicious activity                                             │
│     └── Preserve evidence                                                    │
│                                                                              │
│  2. ERADICATION                                                              │
│     ├── Remove threat                                                        │
│     ├── Patch vulnerabilities                                                │
│     └── Reset compromised credentials                                        │
│                                                                              │
│  3. RECOVERY                                                                 │
│     ├── Restore services                                                     │
│     ├── Verify system integrity                                              │
│     └── Resume operations                                                    │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

3. Post-Incident

┌──────────────────────────────────────────────────────────────────────────────┐
│                                POST-INCIDENT                                 │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. LESSONS LEARNED                                                          │
│     ├── What happened                                                        │
│     ├── How we responded                                                     │
│     └── What we can improve                                                  │
│                                                                              │
│  2. DOCUMENTATION                                                            │
│     ├── Timeline of events                                                   │
│     ├── Actions taken                                                        │
│     └── Evidence collected                                                   │
│                                                                              │
│  3. PROCESS IMPROVEMENT                                                      │
│     ├── Update runbooks                                                      │
│     ├── Enhance detection                                                    │
│     └── Improve response                                                     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
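
The three phases and their sub-steps can be read as one ordered sequence. The following is a minimal sketch of that ordering, assuming a strictly linear progression (real incidents may overlap phases or loop back); `Phase`, `STEPS`, and `next_step` are illustrative names and not part of any AegisLM tooling.

```python
from enum import Enum


class Phase(Enum):
    DETECTION_ASSESSMENT = 1
    RESPONSE_CONTAINMENT = 2
    POST_INCIDENT = 3


# Sub-steps of each phase, in the order shown in the diagrams above.
STEPS = {
    Phase.DETECTION_ASSESSMENT: [
        "alert received", "initial assessment", "incident declaration",
    ],
    Phase.RESPONSE_CONTAINMENT: ["containment", "eradication", "recovery"],
    Phase.POST_INCIDENT: [
        "lessons learned", "documentation", "process improvement",
    ],
}


def next_step(phase, step):
    """Return the (phase, step) that follows, or None after the final step.

    Assumption: steps are completed strictly in order, one at a time.
    """
    steps = STEPS[phase]
    i = steps.index(step)  # raises ValueError for an unknown step
    if i + 1 < len(steps):
        return phase, steps[i + 1]
    if phase is Phase.POST_INCIDENT:
        return None
    following = Phase(phase.value + 1)
    return following, STEPS[following][0]
```

For example, `next_step(Phase.DETECTION_ASSESSMENT, "incident declaration")` moves the incident into the containment step of the next phase.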

Severity Levels

| Severity | Definition | Examples | Response Time |
|----------|------------|----------|---------------|
| SEV1 - Critical | Complete service loss, data breach | Full outage, exfiltration | 15 min |
| SEV2 - High | Major feature broken | API down, certification failure | 1 hour |
| SEV3 - Medium | Feature degraded | Slow response, partial outage | 4 hours |
| SEV4 - Low | Minor issue | UI bug, documentation error | 24 hours |
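
The response-time targets can be turned into a concrete deadline. This is a minimal sketch, assuming the clock starts when the incident is declared (the table does not state which event starts it); `RESPONSE_TARGETS` and `response_deadline` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Response-time targets copied from the Severity Levels table.
RESPONSE_TARGETS = {
    "SEV1": timedelta(minutes=15),
    "SEV2": timedelta(hours=1),
    "SEV3": timedelta(hours=4),
    "SEV4": timedelta(hours=24),
}


def response_deadline(severity, declared_at):
    """Return the time by which the response should have started.

    Assumption: the target is measured from the declaration time.
    """
    try:
        return declared_at + RESPONSE_TARGETS[severity.upper()]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None


declared = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
print(response_deadline("SEV1", declared))  # 2025-01-15 09:15:00+00:00
```

Keeping the targets in a single mapping means a change to the table needs only one matching change in code.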

Communication Protocol

Internal Communication

| Stage | Channel | Audience | Timing |
|-------|---------|----------|--------|
| Detection | PagerDuty | On-call | Immediate |
| Declaration | Slack #incidents | Response team | 15 min |
| Updates | Slack #incidents | All hands | Hourly |
| Resolution | Slack #incidents | All hands | On resolution |
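
The hourly update cadence can be planned ahead of time. The sketch below assumes the hourly interval is counted from the declaration time and that the resolution notice replaces any update due at that exact moment; neither detail is specified by the table, and `hourly_update_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def hourly_update_times(declared_at, resolved_at):
    """List the update times strictly between declaration and resolution.

    Assumption: the first update is due one hour after declaration.
    """
    times = []
    due = declared_at + timedelta(hours=1)
    while due < resolved_at:
        times.append(due)
        due += timedelta(hours=1)
    return times


start = datetime(2025, 1, 15, 9, 0)
end = datetime(2025, 1, 15, 11, 30)
for due in hourly_update_times(start, end):
    print(due.strftime("%H:%M"))  # prints 10:00, then 11:00
```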

External Communication

| Stage | Channel | Audience | Approval |
|-------|---------|----------|----------|
| Initial | Status page | Public | IC only |
| Updates | Status page | Public | IC + Comms |
| Post-Incident | Blog/Report | Public | Advisory Board |
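
The approval column can be enforced as a simple gate before anything is published. This is a sketch under stated assumptions: "IC only" is read as IC sign-off being both required and sufficient, "IC + Comms" as requiring both, and approvers are represented by plain strings. `REQUIRED_APPROVERS` and `missing_approvals` are illustrative names.

```python
# Approvers required per external-communication stage, from the table above.
REQUIRED_APPROVERS = {
    "initial": {"IC"},
    "updates": {"IC", "Comms"},
    "post-incident": {"Advisory Board"},
}


def missing_approvals(stage, approved_by):
    """Return the approvers still needed before publishing at this stage."""
    try:
        required = REQUIRED_APPROVERS[stage.lower()]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None
    return required - set(approved_by)


print(missing_approvals("Updates", {"IC"}))  # {'Comms'}
```

An empty result means the message has every sign-off the table asks for.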

Runbook Integration

Common Incident Runbooks

| Incident Type | Runbook Location | Status |
|---------------|------------------|--------|
| API Outage | runbooks/api-outage.md | ✓ Complete |
| Database Failure | runbooks/db-failure.md | ✓ Complete |
| Security Breach | runbooks/security-breach.md | ✓ Complete |
| Certification Error | runbooks/cert-error.md | ✓ Complete |
| Data Loss | runbooks/data-loss.md | ✓ Complete |
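
Responders often need to go from an incident type to its runbook quickly. The following sketch shows one way to do that lookup, assuming incident types are matched case-insensitively and ignoring extra whitespace; `RUNBOOKS` and `runbook_for` are hypothetical names, and the paths are taken directly from the table.

```python
# Incident type -> runbook path, copied from the table above.
RUNBOOKS = {
    "api outage": "runbooks/api-outage.md",
    "database failure": "runbooks/db-failure.md",
    "security breach": "runbooks/security-breach.md",
    "certification error": "runbooks/cert-error.md",
    "data loss": "runbooks/data-loss.md",
}


def runbook_for(incident_type):
    """Return the runbook path for an incident type.

    Raises LookupError when no runbook is listed, so the caller can
    escalate instead of silently continuing without one.
    """
    key = " ".join(incident_type.lower().split())
    try:
        return RUNBOOKS[key]
    except KeyError:
        raise LookupError(f"no runbook listed for {incident_type!r}") from None


print(runbook_for("API Outage"))  # runbooks/api-outage.md
```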

Version Information

| Item | Version | Date |
|------|---------|------|
| Incident Command Protocol | 1.0 | January 15, 2025 |

This protocol is maintained by the Operations team and reviewed quarterly.