---
title: ARIA DevOps Incident Response
emoji: 🚨
colorFrom: blue
colorTo: red
sdk: docker
pinned: true
license: apache-2.0
tags:
  - openenv
  - reinforcement-learning
  - devops
  - incident-response
  - rl-environment
  - multi-agent
  - llm-agent
  - grpo
  - curriculum-learning
  - huggingface
  - pytorch
  - meta
short_description: "OpenEnv RL for incident response. 7 tasks, Llama-3.1-8B"
---

# ARIA: DevOps Incident Response
### *The first OpenEnv RL environment for production incident response*

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
[![HF Space](https://img.shields.io/badge/🤗-Live%20Environment-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
[![Trained Model](https://img.shields.io/badge/🤗-Llama--3.1--8B%20Fine--tuned-blue)](https://huggingface.co/Arijit-07/aria-devops-llama8b)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)

> **ARIA**: Adaptive Reward & Incident Architecture
> Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026

---

## 🔗 Quick Links for Judges

| Resource | Link |
|---|---|
| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |

---

## ⚡ Run a Complete Episode Right Now

```bash
# 1. Start an easy incident
curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy", "seed": 42}'

# 2. Read logs on the failing service (reward: +0.15)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "read_logs", "service": "payment-service"}'

# 3. Diagnose (reward: +0.30)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'

# 4. Fix it (reward: +0.40)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "restart_service", "service": "payment-service"}'

# 5. Validate all 7 tasks pass
curl https://arijit-07-devops-incident-response.hf.space/validate
```
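
The same walkthrough can be driven from Python. The sketch below is a minimal example, assuming `/reset` and `/step` accept exactly the JSON bodies shown in the curl commands above. The helper names `http_post_json` and `run_easy_episode` are illustrative and not part of the project, and nothing is assumed about the response schema beyond it being JSON.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://arijit-07-devops-incident-response.hf.space"

# The three actions from the curl walkthrough above, in order.
EASY_ACTIONS = [
    {"action_type": "read_logs", "service": "payment-service"},
    {"action_type": "diagnose", "root_cause": "memory leak in payment-service"},
    {"action_type": "restart_service", "service": "payment-service"},
]


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_easy_episode(post: Callable[[str, dict], dict] = http_post_json) -> list:
    """Reset the easy task, replay the three actions, and return the raw responses."""
    responses = [post(f"{BASE_URL}/reset", {"task_id": "easy", "seed": 42})]
    for action in EASY_ACTIONS:
        responses.append(post(f"{BASE_URL}/step", action))
    return responses
```

Calling `run_easy_episode()` with no arguments talks to the live Space; passing a stub as `post` keeps the flow testable offline.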

---

## 🎯 The Problem

Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**

SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents: to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.

**ARIA fills that gap.**

---

## 🎬 The 7 Tasks

| Task | Max Steps | Random Agent Score | Strong LLM Score | Scenario |
|---|---|---|---|---|
| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption: all services green |
| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
| `database` | 20 | 0.01 | 0.45–0.65 | Missing index: full table scans |
| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
| `generated` | 20 | 0.01 | variable | Procedural, seed-deterministic |

---

## πŸ† Reward Function

```
Final Score = Σ(step_rewards)
            + efficiency_bonus     # (1 - steps/max_steps) × 0.05
            + diagnosis_precision  # +0.03 if ≥50% keyword overlap
            - noop_penalty         # (noops - 3) × 0.02
```

Clamped to **(0.001, 0.999)** for GRPO stability.
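
As a rough illustration, the formula and clamp can be written as a single function. This is a sketch under stated assumptions, not the project's grader: the function name and signature are hypothetical, and it assumes the noop penalty only applies once more than three noops have been taken (the formula above does not say what happens below that).

```python
def final_score(step_rewards, steps, max_steps, keyword_overlap, noops):
    """Combine per-step rewards with the bonuses and penalty, then clamp.

    keyword_overlap is the fraction (0.0 to 1.0) of diagnosis keywords matched.
    """
    efficiency_bonus = (1 - steps / max_steps) * 0.05
    diagnosis_precision = 0.03 if keyword_overlap >= 0.5 else 0.0
    # Assumption: no penalty (and no bonus) for the first three noops.
    noop_penalty = max(0, noops - 3) * 0.02
    raw = sum(step_rewards) + efficiency_bonus + diagnosis_precision - noop_penalty
    # Keep the score strictly inside (0, 1), as described above.
    return min(0.999, max(0.001, raw))
```

For example, step rewards summing to 0.85 over 3 of 15 steps with a matching diagnosis and no noops give 0.85 + 0.04 + 0.03 = 0.92 under this sketch.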

| Action | Reward | Penalty Triggers |
|---|---|---|
| `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
| `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
| `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
| `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
| `alert_oncall` (required) | +0.15 | |

**Semantic matching:** diagnoses are scored by keyword overlap rather than exact string match, so LLMs that paraphrase aren't penalized.
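
One way to picture keyword-overlap matching is the sketch below. It is an assumption about how such a matcher could work, not the grader's actual logic: the tokenizer, the function name `keyword_overlap`, and the choice to measure overlap against the ground-truth keywords are all illustrative.

```python
import re


def _keywords(text: str) -> set:
    """Lowercase the text and split it into word-like tokens, keeping hyphens."""
    return set(re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower()))


def keyword_overlap(predicted: str, ground_truth: str) -> float:
    """Fraction of ground-truth keywords that also appear in the prediction."""
    truth = _keywords(ground_truth)
    if not truth:
        return 0.0
    return len(truth & _keywords(predicted)) / len(truth)
```

With this definition, a paraphrase such as "payment-service has a memory leak" still matches three of the four keywords in "memory leak in payment-service", which clears a 50% threshold.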

---

## 🌟 ARIA Features

### Curriculum Engine
Rolling average per task (last 5 episodes). Promotes when avg > 0.75. Scaffolds with hints when avg < 0.30. Agents always train at the edge of their capability.

```bash
GET /curriculum/status
GET /curriculum/next
POST /curriculum/record  # {"task_id": "easy", "score": 0.85}
```
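
The promotion and scaffolding rule can be sketched with a fixed-size window per task. This is a minimal illustration, assuming the thresholds stated above; the class name `CurriculumTracker`, the `"stay"` outcome, and the choice to act on a partially filled window are assumptions, not the project's `curriculum/engine.py`.

```python
from collections import defaultdict, deque


class CurriculumTracker:
    WINDOW = 5             # rolling window: last 5 episodes per task
    PROMOTE_ABOVE = 0.75   # promote when the rolling average exceeds this
    SCAFFOLD_BELOW = 0.30  # add hints when the rolling average drops below this

    def __init__(self):
        self._scores = defaultdict(lambda: deque(maxlen=self.WINDOW))

    def record(self, task_id: str, score: float) -> None:
        """Store a score; the deque silently drops the oldest beyond WINDOW."""
        self._scores[task_id].append(score)

    def rolling_average(self, task_id: str):
        scores = self._scores.get(task_id)
        if not scores:
            return None
        return sum(scores) / len(scores)

    def decision(self, task_id: str) -> str:
        """Return 'promote', 'scaffold', or 'stay' for the given task."""
        avg = self.rolling_average(task_id)
        if avg is None:
            return "stay"  # assumption: no data means no change
        if avg > self.PROMOTE_ABOVE:
            return "promote"
        if avg < self.SCAFFOLD_BELOW:
            return "scaffold"
        return "stay"
```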

### Incident Generator
Seeds 0–99,999 → unique reproducible incidents. 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.

```bash
GET /generate/preview?seed=1337
POST /reset  # {"task_id": "generated", "seed": 1337}
```
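
Seed determinism of this kind is typically achieved by drawing every random choice from a generator seeded with the incident seed. The sketch below illustrates the idea only: the failure-mode and service names are placeholders, the severity labels are hypothetical, and `generate_incident` is not the project's generator.

```python
import random

# Placeholder catalogues sized to match the counts above; real names are not assumed.
FAILURE_MODES = [f"failure_mode_{i}" for i in range(6)]
SERVICES = [f"service_{i}" for i in range(8)]
SEVERITIES = ["low", "medium", "high"]  # hypothetical labels


def generate_incident(seed: int) -> dict:
    """Derive an incident from a seed; the same seed always yields the same incident."""
    if not 0 <= seed <= 99_999:
        raise ValueError("seed must be in the range 0 to 99,999")
    rng = random.Random(seed)  # private generator, independent of global random state
    return {
        "seed": seed,
        "failure_mode": rng.choice(FAILURE_MODES),
        "service": rng.choice(SERVICES),
        "severity": rng.choice(SEVERITIES),
        "noise_alerts": rng.randint(0, 3),  # inclusive on both ends
    }
```

Using a private `random.Random(seed)` rather than the module-level functions keeps generation reproducible even when other code consumes random numbers in between.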

### Dual-Agent Mode
Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.

```bash
POST /multi-agent/reset    # {"task_id": "easy", "seed": 42}
POST /multi-agent/step/a/{id}  # {"finding": "order-service OOM"}
POST /multi-agent/step/b/{id}  # {"action_type": "restart_service", ...}
```
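
The split-observability idea can be illustrated by filtering one full observation into two partial views. This is a sketch under assumptions: the field names (`logs`, `alerts`, `metrics`, `dependencies`, `shared_findings`) and the function `split_observation` are hypothetical, and it assumes both agents can read the findings shared so far.

```python
OBSERVER_KEYS = ("logs", "alerts")            # Agent A's view
RESPONDER_KEYS = ("metrics", "dependencies")  # Agent B's view


def split_observation(full_observation: dict, shared_findings: list):
    """Return (observer_view, responder_view) built from one full observation."""

    def view(keys):
        partial = {k: full_observation[k] for k in keys if k in full_observation}
        # Copy the list so neither agent can mutate the other's view.
        partial["shared_findings"] = list(shared_findings)
        return partial

    return view(OBSERVER_KEYS), view(RESPONDER_KEYS)
```

Because the responder never receives `logs` or `alerts` directly, anything it learns about them has to arrive through a shared finding.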

---

## 🧠 Training Results

**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)

| Task | Baseline | Fine-tuned | **Improvement** |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |

![Training Curve](https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png)

**Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL

**Key fix:** Group completions are scored on fresh environment snapshots, which prevents reward gate exhaustion during GRPO group generation.
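
The toy example below shows why snapshots matter. It assumes a "reward gate" means a reward that can be collected only once per episode; `OneShotRewardEnv` and both scoring functions are hypothetical stand-ins, not the training notebook's code.

```python
import copy


class OneShotRewardEnv:
    """Toy environment: the diagnose reward is paid out only once per episode."""

    def __init__(self):
        self.diagnosis_rewarded = False

    def step(self, action: str) -> float:
        if action == "diagnose" and not self.diagnosis_rewarded:
            self.diagnosis_rewarded = True  # the gate closes after the first payout
            return 0.30
        return 0.0


def score_group_shared(env, completions):
    """Problematic: every completion steps the same env, so the gate closes early."""
    return [env.step(action) for action in completions]


def score_group_on_snapshots(env, completions):
    """Each completion is scored on its own deep copy of the pre-step state."""
    return [copy.deepcopy(env).step(action) for action in completions]
```

With four identical `diagnose` completions, the shared version rewards only the first one, while the snapshot version rewards all four equally. Since GRPO compares rewards within a group, the shared version would make identical completions look better or worse purely because of their order.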

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)

---

## 📡 API Reference

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Liveness check |
| GET | `/about` | Full machine-readable description |
| GET | `/tasks` | All 8 tasks |
| POST | `/reset` | Start episode |
| POST | `/step` | Take action |
| GET | `/state` | Full state + ground truth |
| GET | `/validate` | Self-test all 7 tasks |
| GET | `/metrics` | Aggregate statistics |
| GET | `/leaderboard` | Top 10 episodes |
| WS | `/ws` | WebSocket real-time |
| GET | `/curriculum/status` | Per-task mastery |
| GET | `/curriculum/next` | Recommended task |
| POST | `/curriculum/record` | Feed training results |
| GET | `/generate/preview` | Preview procedural incident |
| POST | `/multi-agent/reset` | Start dual-agent session |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
| GET | `/live` | Live NOC dashboard (real-time) |
| GET | `/challenge` | Human vs Agent challenge |
| GET | `/progress` | Score progression visualization |
| GET | `/replays` | Episode replay list |
| GET | `/replay/{id}` | Full episode replay |
| GET | `/replay/{id}/html` | Replay HTML viewer |
| GET | `/docs` | Swagger UI |

---

## 📊 Benchmark Comparison

| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
|---|---|---|---|---|---|
| SWE-bench | Code repair | ✗ | ✗ | ✗ | ✗ |
| WebArena | Web navigation | ✓ | ✗ | ✗ | ✗ |
| AgentBench | General tools | ✗ | ✗ | ✗ | ✗ |
| **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |

---

## 🚀 Setup

```bash
docker build -t aria-devops-incident .
docker run -p 7860:7860 aria-devops-incident

# Or local
pip install -r requirements.txt
uvicorn api:app --host 0.0.0.0 --port 7860
```

---

## πŸ“ Structure

```
├── api.py / server/app.py    # FastAPI: all endpoints
├── env.py                    # Environment dispatcher
├── models.py                 # Pydantic models
├── tasks/                    # 7 tasks + generated
├── curriculum/engine.py      # Adaptive difficulty
├── generator/                # Procedural incidents
├── multi_agent/session.py    # Dual-agent mode
├── graders/grader.py         # Deterministic grader
├── demo_llm.py               # Live terminal demo
├── train_grpo.ipynb          # Training notebook
├── BLOG.md                   # Project story
└── openenv.yaml              # OpenEnv manifest
```

Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals, Bangalore, April 2026*