Complete README overhaul: 7 tasks, training notebook, WebSocket, metrics documented
- README.md +66 -4
- openenv.yaml +1 -1
README.md
CHANGED
|
@@ -14,6 +14,21 @@ sdk: docker
|
|
| 14 |
# DevOps Incident Response - OpenEnv
|
| 15 |
|
| 16 |
[](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
|
| 17 |
|
| 18 |
An OpenEnv-compliant reinforcement learning environment where AI agents learn
|
| 19 |
to diagnose and remediate production software incidents across a simulated
|
|
@@ -24,11 +39,14 @@ rollbacks, restarts, and on-call escalations. The reward function gives dense
|
|
| 24 |
partial credit for information gathering, correct diagnosis, and precise
|
| 25 |
remediation, while penalising collateral damage and blind actions.
|
| 26 |
|
| 27 |
-
**
|
| 28 |
- **Easy** - single service OOM crash-loop (which service varies by seed)
|
| 29 |
- **Medium** - cascading failure from bad deployment with a red-herring alert
|
| 30 |
- **Hard** - silent data corruption with no error-rate alerts, only business metric anomalies
|
| 31 |
- **Bonus** - two simultaneous independent failures, both must be fixed
|
| 32 |
|
| 33 |
---
|
| 34 |
|
|
@@ -81,7 +99,7 @@ graph TD
|
|
| 81 |
|
| 82 |
### What Makes This Hard
|
| 83 |
|
| 84 |
-
The
|
| 85 |
reasoning strategies:
|
| 86 |
|
| 87 |
- **Easy**: Direct signal reading: logs clearly show OOM, fix is obvious
|
|
@@ -134,6 +152,9 @@ and exact metric values.
|
|
| 134 |
| `alert_oncall` | `reason` (str) | Page the on-call engineering team |
|
| 135 |
| `acknowledge` | `service` (alert id) | Acknowledge an active alert |
|
| 136 |
| `noop` | - | Take no action |
|
| 137 |
|
| 138 |
---
|
| 139 |
|
|
@@ -226,6 +247,27 @@ alert_oncall for disk cleanup AND rollback/restart ml-inference.
|
|
| 226 |
|
| 227 |
---
|
| 228 |
|
| 229 |
## Reward Function Design
|
| 230 |
|
| 231 |
```
|
|
@@ -308,12 +350,29 @@ Run with `meta-llama/Llama-3.3-70B-Instruct`, seed=42, temperature=0.1:
|
|
| 308 |
| medium | 0.6800 | β | 9 |
|
| 309 |
| hard | 0.3500 | β | 25 |
|
| 310 |
| bonus | 0.3800 | β | 25 |
|
| 311 |
| **average** | **0.6025** | β | β |
|
| 312 |
|
| 313 |
*Scores vary with model and temperature. Run with seed=42 for reproducibility.*
|
| 314 |
|
| 315 |
---
|
| 316 |
|
| 317 |
## API Reference
|
| 318 |
|
| 319 |
| Endpoint | Method | Body | Description |
|
|
@@ -322,8 +381,11 @@ Run with `meta-llama/Llama-3.3-70B-Instruct`, seed=42, temperature=0.1:
|
|
| 322 |
| `/reset` | POST | `{"task_id": "easy", "seed": 42}` | Start new episode |
|
| 323 |
| `/step` | POST | `Action` JSON | Take one action |
|
| 324 |
| `/state` | GET | - | Full state + ground truth + analytics |
|
| 325 |
-
| `/tasks` | GET | - | List all
|
| 326 |
| `/validate` | GET | - | Self-validation report for all tasks |
|
| 327 |
|
| 328 |
---
|
| 329 |
|
|
@@ -334,7 +396,7 @@ openenv validate .
|
|
| 334 |
```
|
| 335 |
|
| 336 |
All endpoints comply with the OpenEnv spec. `openenv.yaml` contains full
|
| 337 |
-
metadata including
|
| 338 |
expected score ranges, and Docker configuration.
|
| 339 |
|
| 340 |
---
|
|
|
|
| 14 |
# DevOps Incident Response - OpenEnv
|
| 15 |
|
| 16 |
[](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
|
| 17 |
+
[](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
|
| 18 |
+
[](LICENSE)
|
| 19 |
+
|
| 20 |
+
## Quick Start
|
| 21 |
+
|
| 22 |
+
```python
|
| 23 |
+
# Install first (in a shell): pip install git+https://github.com/Twilight-13/devops-incident-response.git
|
| 24 |
+
|
| 25 |
+
from devops_incident_response import DevOpsIncidentEnv, Action, ActionType
|
| 26 |
+
|
| 27 |
+
with DevOpsIncidentEnv(base_url="https://arijit-07-devops-incident-response.hf.space").sync() as env:
|
| 28 |
+
    obs = env.reset(task_id="easy")
|
| 29 |
+
    result = env.step(Action(action_type=ActionType.READ_LOGS, service="payment-service"))
|
| 30 |
+
    print(f"Reward: {result.reward}")
|
| 31 |
+
```
|
| 32 |
|
| 33 |
An OpenEnv-compliant reinforcement learning environment where AI agents learn
|
| 34 |
to diagnose and remediate production software incidents across a simulated
|
|
|
|
| 39 |
partial credit for information gathering, correct diagnosis, and precise
|
| 40 |
remediation, while penalising collateral damage and blind actions.
|
| 41 |
|
| 42 |
+
**Seven tasks of escalating difficulty:**
|
| 43 |
- **Easy** - single service OOM crash-loop (which service varies by seed)
|
| 44 |
- **Medium** - cascading failure from bad deployment with a red-herring alert
|
| 45 |
- **Hard** - silent data corruption with no error-rate alerts, only business metric anomalies
|
| 46 |
- **Bonus** - two simultaneous independent failures, both must be fixed
|
| 47 |
+
- **Security** - botnet DDoS attack requiring IP blocking
|
| 48 |
+
- **Database** - missing schema index causing DB degradation
|
| 49 |
+
- **Failover** - partial region failure requiring precise multi-region failover
|
| 50 |
|
| 51 |
---
|
| 52 |
|
|
|
|
| 99 |
|
| 100 |
### What Makes This Hard
|
| 101 |
|
| 102 |
+
The seven tasks are designed to require qualitatively different
|
| 103 |
reasoning strategies:
|
| 104 |
|
| 105 |
- **Easy**: Direct signal reading: logs clearly show OOM, fix is obvious
|
|
|
|
| 152 |
| `alert_oncall` | `reason` (str) | Page the on-call engineering team |
|
| 153 |
| `acknowledge` | `service` (alert id) | Acknowledge an active alert |
|
| 154 |
| `noop` | - | Take no action |
|
| 155 |
+
| `block_ip_range` | `service`, `ip_range` | Block a CIDR IP range (DDoS mitigation) |
|
| 156 |
+
| `create_index` | `table`, `column` | Create a missing database index |
|
| 157 |
+
| `failover` | `service`, `target_region` | Fail over a service to another region |
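
The three new actions can be illustrated as JSON payloads. This is a minimal sketch, not the project's client code: the parameter names come from the table above and `action_type` from the Quick Start, while the service, table, and column values (`api-gateway`, `orders`, `customer_id`) and the `/8` CIDR are hypothetical placeholders.

```python
import ipaddress
import json


def block_ip_range(service: str, ip_range: str) -> dict:
    """Build a block_ip_range action; raises ValueError if ip_range is not valid CIDR."""
    network = ipaddress.ip_network(ip_range, strict=True)
    return {"action_type": "block_ip_range", "service": service, "ip_range": str(network)}


def create_index(table: str, column: str) -> dict:
    """Build a create_index action for a missing database index."""
    return {"action_type": "create_index", "table": table, "column": column}


def failover(service: str, target_region: str) -> dict:
    """Build a failover action that moves one service to another region."""
    return {"action_type": "failover", "service": service, "target_region": target_region}


# Hypothetical example values; the real service and table names come from the observation.
payloads = [
    block_ip_range("api-gateway", "185.0.0.0/8"),
    create_index("orders", "customer_id"),
    failover("api-gateway", "us-west-2"),
]
for payload in payloads:
    print(json.dumps(payload))
```

Validating the CIDR string locally avoids spending an episode step on a malformed `ip_range`.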
|
| 158 |
|
| 159 |
---
|
| 160 |
|
|
|
|
| 247 |
|
| 248 |
---
|
| 249 |
|
| 250 |
+
### Task 5 - Security Incident Response (DDoS Attack)
|
| 251 |
+
**Max steps:** 20 | **Expected strong LLM score:** 0.40–0.60
|
| 252 |
+
|
| 253 |
+
A botnet is targeting the login endpoint with 12,000 req/s from the 185.x.x.x IP range. Standard rate limiting is ineffective because the attack is distributed. The agent must identify the attack pattern in access logs, diagnose the DDoS, block the IP range, and alert the security team. Neither restart nor rollback helps; wrong actions are penalized.
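
One way an agent might surface the attack pattern is to group source addresses by their first octet. The sketch below assumes the access logs have already been reduced to a list of IPv4 strings and that the 185.x.x.x range maps to the CIDR `185.0.0.0/8`; the helper name, the sample addresses, and the 50% threshold are all hypothetical.

```python
from collections import Counter


def dominant_slash8(source_ips, min_share=0.5):
    """Return the /8 CIDR that carries at least min_share of requests, else None."""
    if not source_ips:
        return None
    # Count requests per first octet, e.g. "185" for 185.12.4.9.
    counts = Counter(ip.split(".")[0] for ip in source_ips)
    first_octet, hits = counts.most_common(1)[0]
    if hits / len(source_ips) >= min_share:
        return f"{first_octet}.0.0.0/8"
    return None


# Hypothetical sample of source addresses pulled from access logs.
sample = ["185.12.4.9", "185.77.1.3", "185.200.8.8", "10.0.0.5", "185.3.3.3"]
print(dominant_slash8(sample))
```

The returned string could then be passed as the `ip_range` parameter of `block_ip_range`.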
|
| 254 |
+
|
| 255 |
+
---
|
| 256 |
+
|
| 257 |
+
### Task 6 - Database Performance Degradation
|
| 258 |
+
**Max steps:** 20 | **Expected strong LLM score:** 0.45–0.65
|
| 259 |
+
|
| 260 |
+
A schema migration added a column without an index, so every service that reads the table degrades. The agent must read the postgres slow query logs, identify the sequential table scan, and either create the missing index or roll back the migration. Restarting services does nothing.
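
Spotting the sequential scan can be sketched as a simple pattern match. The log format below is an assumption (the environment's real slow query log lines may differ); the only fixed point is the `Seq Scan on <table>` phrasing that PostgreSQL query plans use, and the table names are hypothetical.

```python
import re
from collections import Counter

# PostgreSQL query plans label full-table reads as "Seq Scan on <table>".
SEQ_SCAN = re.compile(r"Seq Scan on (\w+)")


def seq_scan_counts(log_lines):
    """Count sequential scans per table across slow-query log lines."""
    counts = Counter()
    for line in log_lines:
        counts.update(SEQ_SCAN.findall(line))
    return counts


# Hypothetical log excerpt; durations and row counts are made up.
sample_log = [
    "duration: 4210.1 ms  plan: Seq Scan on orders  (rows=1200000)",
    "duration: 3988.7 ms  plan: Seq Scan on orders  (rows=1200000)",
    "duration: 2.3 ms  plan: Index Scan using users_pkey on users",
]
worst_table, hits = seq_scan_counts(sample_log).most_common(1)[0]
print(worst_table, hits)
```

The table with the most sequential scans is the natural candidate for the `table` parameter of `create_index`.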
|
| 261 |
+
|
| 262 |
+
---
|
| 263 |
+
|
| 264 |
+
### Task 7 - Multi-Region Failover (Partial)
|
| 265 |
+
**Max steps:** 25 | **Expected strong LLM score:** 0.35–0.55
|
| 266 |
+
|
| 267 |
+
A network partition affects us-east-1. Four services support automatic failover to us-west-2 and should be switched. Two services (payment-service, postgres-primary) must NOT be failed over β payment due to PCI compliance, postgres due to replication lag causing data loss. Incorrectly failing over the wrong services incurs a heavy -0.25 penalty.
|
| 268 |
+
|
| 269 |
+
---
|
| 270 |
+
|
| 271 |
## Reward Function Design
|
| 272 |
|
| 273 |
```
|
|
|
|
| 350 |
| medium | 0.6800 | β | 9 |
|
| 351 |
| hard | 0.3500 | β | 25 |
|
| 352 |
| bonus | 0.3800 | β | 25 |
|
| 353 |
+
| security | 0.00 | run inference.py to reproduce | 20 |
|
| 354 |
+
| database | 0.00 | run inference.py to reproduce | 20 |
|
| 355 |
+
| failover | 0.00 | run inference.py to reproduce | 25 |
|
| 356 |
| **average** | **0.6025** | β | β |
|
| 357 |
|
| 358 |
*Scores vary with model and temperature. Run with seed=42 for reproducibility.*
|
| 359 |
|
| 360 |
---
|
| 361 |
|
| 362 |
+
## RL Training Integration
|
| 363 |
+
|
| 364 |
+
This environment is designed for GRPO and other policy gradient methods.
|
| 365 |
+
See the training notebook for a full example:
|
| 366 |
+
|
| 367 |
+
```bash
|
| 368 |
+
git clone https://github.com/Twilight-13/devops-incident-response
|
| 369 |
+
jupyter notebook train_grpo.ipynb
|
| 370 |
+
```
|
| 371 |
+
|
| 372 |
+
Compatible with: TRL, SkyRL, ART, Oumi, Axolotl.
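
To make the GRPO connection concrete, the sketch below shows the group-relative advantage step in isolation: several episodes are rolled out for the same task, and each episode's reward is normalised against the group mean and standard deviation. This is an illustration with made-up rewards, not code from the training notebook; real trainers differ in details such as the choice of sample versus population standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, libraries vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical episode rewards for one task and one group of rollouts.
rewards = [0.2, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Episodes that score above the group average receive positive advantages and are reinforced; a group with identical rewards yields all-zero advantages and therefore no gradient signal.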
|
| 373 |
+
|
| 374 |
+
---
|
| 375 |
+
|
| 376 |
## API Reference
|
| 377 |
|
| 378 |
| Endpoint | Method | Body | Description |
|
|
|
|
| 381 |
| `/reset` | POST | `{"task_id": "easy", "seed": 42}` | Start new episode |
|
| 382 |
| `/step` | POST | `Action` JSON | Take one action |
|
| 383 |
| `/state` | GET | - | Full state + ground truth + analytics |
|
| 384 |
+
| `/tasks` | GET | - | List all 7 tasks |
|
| 385 |
| `/validate` | GET | - | Self-validation report for all tasks |
|
| 386 |
+
| `/ws` | WebSocket | - | Real-time agent-environment communication |
|
| 387 |
+
| `/metrics` | GET | - | Aggregate episode statistics |
|
| 388 |
+
| `/leaderboard` | GET | - | Top scoring episodes |
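
The HTTP endpoints can also be called without the client library. The sketch below only builds the requests, using the `/reset` body from the table and the Space URL from the Quick Start; the exact `/step` schema is an assumption (an `action_type` field plus the action's parameters), and nothing is sent until `urllib.request.urlopen` is called.

```python
import json
from urllib.request import Request

BASE_URL = "https://arijit-07-devops-incident-response.hf.space"
JSON_HEADERS = {"Content-Type": "application/json"}


def reset_request(task_id, seed=42):
    """Build the POST /reset request that starts a new episode."""
    body = json.dumps({"task_id": task_id, "seed": seed}).encode("utf-8")
    return Request(f"{BASE_URL}/reset", data=body, headers=JSON_HEADERS, method="POST")


def step_request(action):
    """Build the POST /step request; the action dict layout is an assumption."""
    body = json.dumps(action).encode("utf-8")
    return Request(f"{BASE_URL}/step", data=body, headers=JSON_HEADERS, method="POST")


req = reset_request("easy")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```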
|
| 389 |
|
| 390 |
---
|
| 391 |
|
|
|
|
| 396 |
```
|
| 397 |
|
| 398 |
All endpoints comply with the OpenEnv spec. `openenv.yaml` contains full
|
| 399 |
+
metadata including 7 task definitions, action/observation space descriptions,
|
| 400 |
expected score ranges, and Docker configuration.
|
| 401 |
|
| 402 |
---
|
openenv.yaml
CHANGED
|
@@ -4,7 +4,7 @@ description: >
|
|
| 4 |
A reinforcement learning environment where AI agents learn to diagnose and
|
| 5 |
remediate production software incidents. Agents read logs, metrics, and
|
| 6 |
alerts across a simulated microservices architecture, then take remediation
|
| 7 |
-
actions such as rollbacks, restarts, and on-call escalations.
|
| 8 |
of escalating difficulty, from a clear memory leak to silent data
|
| 9 |
corruption with no error-rate alerts, provide a meaningful difficulty
|
| 10 |
progression for benchmarking agent reasoning quality.
|
|
|
|
| 4 |
A reinforcement learning environment where AI agents learn to diagnose and
|
| 5 |
remediate production software incidents. Agents read logs, metrics, and
|
| 6 |
alerts across a simulated microservices architecture, then take remediation
|
| 7 |
+
actions such as rollbacks, restarts, and on-call escalations. Seven tasks
|
| 8 |
of escalating difficulty, from a clear memory leak to silent data
|
| 9 |
corruption with no error-rate alerts, provide a meaningful difficulty
|
| 10 |
progression for benchmarking agent reasoning quality.
|