---
title: ARIA DevOps Incident Response
emoji: 🚨
colorFrom: blue
colorTo: red
sdk: docker
pinned: true
license: apache-2.0
tags:
- openenv
- reinforcement-learning
- devops
- incident-response
- rl-environment
- multi-agent
- llm-agent
- grpo
- curriculum-learning
- huggingface
- pytorch
- meta
short_description: "OpenEnv RL for incident response. 7 tasks, Llama-3.1-8B"
---
# ARIA: DevOps Incident Response
### *The first OpenEnv RL environment for production incident response*
[Open in Colab](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
[Live Space](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
[Trained Model](https://huggingface.co/Arijit-07/aria-devops-llama8b)
[License](LICENSE)
> **ARIA**: Adaptive Reward & Incident Architecture
> Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026
---
## Quick Links for Judges
| Resource | Link |
|---|---|
| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |
---
## Run a Complete Episode Right Now
```bash
# 1. Start an easy incident
curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "easy", "seed": 42}'
# 2. Read logs on the failing service (reward: +0.15)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "read_logs", "service": "payment-service"}'
# 3. Diagnose (reward: +0.30)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'
# 4. Fix it (reward: +0.40)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "restart_service", "service": "payment-service"}'
# 5. Validate all 7 tasks pass
curl https://arijit-07-devops-incident-response.hf.space/validate
```
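The same episode can be driven from Python with only the standard library. This is a minimal sketch: the endpoint paths and payloads are copied from the curl commands above, while the helper names `build_request`, `post`, and `run_episode` are illustrative, and the response is returned as plain decoded JSON because its schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://arijit-07-devops-incident-response.hf.space"

# (path, payload) pairs mirroring steps 1-4 of the curl walkthrough.
EPISODE = [
    ("/reset", {"task_id": "easy", "seed": 42}),
    ("/step", {"action_type": "read_logs", "service": "payment-service"}),
    ("/step", {"action_type": "diagnose",
               "root_cause": "memory leak in payment-service"}),
    ("/step", {"action_type": "restart_service", "service": "payment-service"}),
]


def build_request(path, payload):
    """Build a JSON POST request for one environment endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path, payload, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode():
    """Replay the walkthrough and return the raw responses (needs network access)."""
    return [post(path, payload) for path, payload in EPISODE]
```

Calling `run_episode()` sends the four requests in order. A sleeping Space may need a moment to wake up, so a generous timeout is a sensible default.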
---
## The Problem
Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**
SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents: to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
**ARIA fills that gap.**
---
## The 7 Tasks
Seven fixed tasks, plus an eighth `generated` task that is built procedurally from the seed. The Random and Strong LLM columns give typical final scores.
| Task | Max Steps | Random | Strong LLM | Scenario |
|---|---|---|---|---|
| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption: all services green |
| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
| `database` | 20 | 0.01 | 0.45–0.65 | Missing index → full table scans |
| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
| `generated` | 20 | 0.01 | variable | Procedural, seed-deterministic |
---
## Reward Function
```
Final Score = Σ(step_rewards)
            + efficiency_bonus     # (1 - steps/max_steps) × 0.05
            + diagnosis_precision  # +0.03 if ≥50% keyword overlap
            - noop_penalty         # (noops - 3) × 0.02
```
Clamped to **(0.001, 0.999)** for GRPO stability.
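As a rough illustration, the formula and clamp can be written as a single Python function. This is a sketch, not the grader's code: the name `final_score` and its parameters are hypothetical, and it assumes the no-op penalty only applies beyond three no-ops (floored at zero) and that keyword overlap is passed in as a fraction between 0 and 1.

```python
def final_score(step_rewards, steps, max_steps, noops, keyword_overlap):
    """Combine per-step rewards with the bonus and penalty terms, then clamp.

    Assumptions (not confirmed by the source): the no-op penalty is floored
    at zero, and keyword_overlap is a fraction in [0, 1].
    """
    efficiency_bonus = (1 - steps / max_steps) * 0.05
    diagnosis_precision = 0.03 if keyword_overlap >= 0.5 else 0.0
    noop_penalty = max(0, noops - 3) * 0.02
    raw = sum(step_rewards) + efficiency_bonus + diagnosis_precision - noop_penalty
    # Keep the score strictly inside (0, 1), matching the documented clamp.
    return min(0.999, max(0.001, raw))
```

For example, with the step rewards quoted in the quick-start episode (0.15, 0.30, 0.40), 3 of 15 steps used, no no-ops, and a fully matching diagnosis, this sketch gives about 0.92.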
| Action | Reward | Penalty Triggers |
|---|---|---|
| `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
| `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
| `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
| `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
| `alert_oncall` (required) | +0.15 | |
**Semantic matching:** diagnoses are scored by keyword overlap rather than exact string match, so LLMs that paraphrase aren't penalized.
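One simple way to compute such an overlap is the fraction of expected keywords that appear in the diagnosis text. The sketch below assumes that definition; the function name `keyword_overlap`, the tokenization, and the example keyword list are illustrative, and the actual grader may differ.

```python
import re


def keyword_overlap(diagnosis, expected_keywords):
    """Return the fraction of expected keywords found in the diagnosis text."""
    # Lowercase and split on anything that is not a letter or digit,
    # so "payment-service" yields both "payment" and "service".
    tokens = set(re.findall(r"[a-z0-9]+", diagnosis.lower()))
    expected = {keyword.lower() for keyword in expected_keywords}
    if not expected:
        return 0.0
    return len(expected & tokens) / len(expected)
```

Under this definition a paraphrase such as "the payment pod ran out of memory" still matches two of the three hypothetical keywords `memory`, `leak`, `payment`, which clears a 50% threshold.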
---
## ARIA Features
### Curriculum Engine
The engine keeps a rolling average score per task over the last 5 episodes. It promotes the agent when the average exceeds 0.75 and scaffolds with hints when the average drops below 0.30, so agents always train at the edge of their capability.
```bash
GET /curriculum/status
GET /curriculum/next
POST /curriculum/record # {"task_id": "easy", "score": 0.85}
```
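The promotion and scaffolding rule can be sketched with a fixed-size window per task. This is an illustration under the thresholds stated above, not the code in `curriculum/engine.py`: the class name `CurriculumTracker` and the status labels are hypothetical.

```python
from collections import defaultdict, deque

WINDOW = 5             # episodes in the rolling average
PROMOTE_ABOVE = 0.75   # average above this: promote
SCAFFOLD_BELOW = 0.30  # average below this: add hints


class CurriculumTracker:
    """Track recent scores per task and suggest what to do next."""

    def __init__(self):
        # deque(maxlen=...) drops the oldest score automatically.
        self._scores = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, task_id, score):
        self._scores[task_id].append(score)

    def status(self, task_id):
        scores = self._scores[task_id]
        if not scores:
            return "unseen"
        average = sum(scores) / len(scores)
        if average > PROMOTE_ABOVE:
            return "promote"
        if average < SCAFFOLD_BELOW:
            return "scaffold"
        return "practice"
```

Because only the last five scores count, one early failure stops affecting the decision once five newer episodes have been recorded.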
### Incident Generator
Seeds 0–99,999 → unique reproducible incidents. 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.
```bash
GET /generate/preview?seed=1337
POST /reset # {"task_id": "generated", "seed": 1337}
```
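Seed-deterministic generation usually comes down to deriving every random choice from a generator seeded with the incident seed. The sketch below shows that idea with the counts listed above. The failure-mode, service, and severity labels are placeholders (the source gives only the counts), and `generate_incident` is a hypothetical name, not the project's generator.

```python
import random

# Placeholder labels: the source states 6 failure modes, 8 services and
# 3 severities but does not name them here.
FAILURE_MODES = ["oom", "cpu_saturation", "disk_full",
                 "bad_deploy", "network_partition", "dependency_timeout"]
SERVICES = [f"service-{i}" for i in range(8)]
SEVERITIES = ["low", "medium", "high"]


def generate_incident(seed):
    """Build an incident description that depends only on the seed."""
    if not 0 <= seed <= 99_999:
        raise ValueError("seed must be in the range 0-99,999")
    rng = random.Random(seed)  # private generator, no global state involved
    return {
        "seed": seed,
        "failure_mode": rng.choice(FAILURE_MODES),
        "service": rng.choice(SERVICES),
        "severity": rng.choice(SEVERITIES),
        "noise_alerts": rng.randint(0, 3),  # inclusive on both ends
    }
```

Calling `generate_incident(1337)` twice returns the same dictionary. Note that these four axes alone give only 576 combinations, so a real generator presumably varies further details per seed.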
### Dual-Agent Mode
Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.
```bash
POST /multi-agent/reset # {"task_id": "easy", "seed": 42}
POST /multi-agent/step/a/{id} # {"finding": "order-service OOM"}
POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
```
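Split observability can be pictured as filtering one full observation into two partial views. The following sketch assumes the full observation is a dictionary with `logs`, `alerts`, `metrics`, and `dependencies` keys and that shared findings are delivered to the Responder. Those key names and the function `split_observation` are assumptions for illustration, not the session code in `multi_agent/session.py`.

```python
OBSERVER_KEYS = ("logs", "alerts")             # Agent A (Observer)
RESPONDER_KEYS = ("metrics", "dependencies")   # Agent B (Responder)


def split_observation(full_obs, shared_findings):
    """Return (observer_view, responder_view) from one full observation."""
    observer_view = {key: full_obs[key] for key in OBSERVER_KEYS}
    responder_view = {key: full_obs[key] for key in RESPONDER_KEYS}
    # Findings shared by the Observer are the only log-derived
    # information the Responder receives in this sketch.
    responder_view["shared_findings"] = list(shared_findings)
    return observer_view, responder_view
```

In this picture the Responder learns about log evidence only through findings such as "order-service OOM", which is why neither agent can resolve the incident alone.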
---
## Training Results
**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)
| Task | Baseline | Fine-tuned | **Improvement** |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |

**Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL
**Key fix:** Group completions are scored on fresh environment snapshots, which prevents reward gate exhaustion during GRPO group generation.
[Open the training notebook in Colab](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
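The effect of scoring on fresh snapshots can be shown with a toy environment whose reward gate pays out only once. This is a sketch of the general idea, assuming the environment state can be deep-copied. `ToyEnv` and `score_group` are stand-ins, not the training notebook's code.

```python
import copy


class ToyEnv:
    """Stand-in environment: each reward gate pays out only once."""

    def __init__(self):
        self.claimed = set()

    def step(self, action):
        if action == "read_logs" and action not in self.claimed:
            self.claimed.add(action)
            return 0.15
        return 0.0


def score_group(env, completions):
    """Score every completion against its own copy of the current state."""
    rewards = []
    for action in completions:
        snapshot = copy.deepcopy(env)  # fresh snapshot per completion
        rewards.append(snapshot.step(action))
    return rewards
```

With a shared environment, the first `read_logs` in a group would claim the gate and identical later completions would score 0.0, distorting the group-relative advantage. With snapshots, identical completions receive identical rewards and the original environment is left untouched.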
---
## API Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Liveness check |
| GET | `/about` | Full machine-readable description |
| GET | `/tasks` | All 8 tasks |
| POST | `/reset` | Start episode |
| POST | `/step` | Take action |
| GET | `/state` | Full state + ground truth |
| GET | `/validate` | Self-test all 7 tasks |
| GET | `/metrics` | Aggregate statistics |
| GET | `/leaderboard` | Top 10 episodes |
| WS | `/ws` | WebSocket real-time |
| GET | `/curriculum/status` | Per-task mastery |
| GET | `/curriculum/next` | Recommended task |
| POST | `/curriculum/record` | Feed training results |
| GET | `/generate/preview` | Preview procedural incident |
| POST | `/multi-agent/reset` | Start dual-agent session |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
| GET | `/live` | Live NOC dashboard (real-time) |
| GET | `/challenge` | Human vs Agent challenge |
| GET | `/progress` | Score progression visualization |
| GET | `/replays` | Episode replay list |
| GET | `/replay/{id}` | Full episode replay |
| GET | `/replay/{id}/html` | Replay HTML viewer |
| GET | `/docs` | Swagger UI |
---
## Benchmark Comparison
| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
|---|---|---|---|---|---|
| SWE-bench | Code repair | β | β | β | β |
| WebArena | Web navigation | β | β | β | β |
| AgentBench | General tools | β | β | β | β |
| **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |
---
## Setup
```bash
docker build -t aria-devops-incident .
docker run -p 7860:7860 aria-devops-incident
# Or local
pip install -r requirements.txt
uvicorn api:app --host 0.0.0.0 --port 7860
```
---
## Structure
```
├── api.py / server/app.py   # FastAPI, all endpoints
├── env.py                   # Environment dispatcher
├── models.py                # Pydantic models
├── tasks/                   # 7 tasks + generated
├── curriculum/engine.py     # Adaptive difficulty
├── generator/               # Procedural incidents
├── multi_agent/session.py   # Dual-agent mode
├── graders/grader.py        # Deterministic grader
├── demo_llm.py              # Live terminal demo
├── train_grpo.ipynb         # Training notebook
├── BLOG.md                  # Project story
└── openenv.yaml             # OpenEnv manifest
```
Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals, Bangalore, April 2026*