File size: 2,294 Bytes
d02897f | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | ---
title: OpenOps Incident Commander
emoji: π¨
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# OpenOps: AI Incident Commander Environment
Production incident management environment where AI agents learn to handle real-world outages.
## Overview
OpenOps simulates production incidents requiring an AI Incident Commander to:
- Investigate alerts and logs
- Identify root causes
- Execute mitigation actions (restart/rollback/scale)
- Communicate with teams and users
- Resolve incidents quickly to minimize revenue loss
## Environment Specification
### Observation Space
```python
{
"active_alerts": List[str],
"service_status": Dict[str, str],
"recent_logs": Dict[str, List[str]],
"metrics_summary": Dict[str, float],
"customer_complaints": int,
"time_elapsed": int,
"revenue_loss": float,
"teams_notified": bool,
"status_page_updated": bool,
"user_communication_sent": bool
}
```
### Action Space (21 actions)
- 0: read_alerts
- 1-4: inspect_logs_{service}
- 5-8: check_metrics_{service}
- 9-12: restart_{service}
- 13-14: rollback_{service}
- 15-16: scale_{service}
- 17-19: Communication actions
- 20: resolve_incident
### Three Tasks
**Task 1 (Easy): Simple API Crash**
- API service down due to OOM
- Solution: Inspect logs β Restart API
**Task 2 (Medium): Bad Deployment**
- Database deployment broke queries
- Solution: Inspect logs β Rollback deployment β Notify team
**Task 3 (Hard): Cascading Failure**
- Database overload β API timeouts β Customer impact
- Solution: Inspect both services β Scale database β Restart API β Communicate
## Installation
```bash
pip install -r server/requirements.txt
```
## Usage
### Run Locally
```bash
cd server
uvicorn app:app --reload
```
### Run Inference
```bash
export OPENAI_API_KEY="your-key"
python inference.py
```
## Grading
Each task scored 0.0-1.0 based on:
- Investigation quality
- Correct mitigation actions
- Communication
- Successful resolution
## Deployment
Deploy to HuggingFace Spaces:
```bash
openenv push
```
## π Sample Output

|