Arijit-07 committed on
Commit
5e9ab6b
·
1 Parent(s): c5da483

Complete README overhaul: 7 tasks, training notebook, WebSocket, metrics documented

Files changed (2)
  1. README.md +66 -4
  2. openenv.yaml +1 -1
README.md CHANGED
@@ -14,6 +14,21 @@ sdk: docker
14
  # DevOps Incident Response - OpenEnv
15
 
16
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
17
 
18
  An OpenEnv-compliant reinforcement learning environment where AI agents learn
19
  to diagnose and remediate production software incidents across a simulated
@@ -24,11 +39,14 @@ rollbacks, restarts, and on-call escalations. The reward function gives dense
24
  partial credit for information gathering, correct diagnosis, and precise
25
  remediation, while penalising collateral damage and blind actions.
26
 
27
- **Four tasks of escalating difficulty:**
28
  - **Easy**: single service OOM crash-loop (which service varies by seed)
29
  - **Medium**: cascading failure from bad deployment with a red-herring alert
30
  - **Hard**: silent data corruption with no error-rate alerts, only business metric anomalies
31
  - **Bonus**: two simultaneous independent failures, both must be fixed
32
 
33
  ---
34
 
@@ -81,7 +99,7 @@ graph TD
81
 
82
  ### What Makes This Hard
83
 
84
- The four tasks are designed to require qualitatively different
85
  reasoning strategies:
86
 
87
  - **Easy**: Direct signal reading - logs clearly show OOM, fix is obvious
@@ -134,6 +152,9 @@ and exact metric values.
134
  | `alert_oncall` | `reason` (str) | Page the on-call engineering team |
135
  | `acknowledge` | `service` (alert id) | Acknowledge an active alert |
136
  | `noop` | - | Take no action |
137
 
138
  ---
139
 
@@ -226,6 +247,27 @@ alert_oncall for disk cleanup AND rollback/restart ml-inference.
226
 
227
  ---
228
 
229
  ## Reward Function Design
230
 
231
  ```
@@ -308,12 +350,29 @@ Run with `meta-llama/Llama-3.3-70B-Instruct`, seed=42, temperature=0.1:
308
  | medium | 0.6800 | ✓ | 9 |
309
  | hard | 0.3500 | ✗ | 25 |
310
  | bonus | 0.3800 | ✗ | 25 |
311
  | **average** | **0.6025** | - | - |
312
 
313
  *Scores vary with model and temperature. Run with seed=42 for reproducibility.*
314
 
315
  ---
316
 
317
  ## API Reference
318
 
319
  | Endpoint | Method | Body | Description |
@@ -322,8 +381,11 @@ Run with `meta-llama/Llama-3.3-70B-Instruct`, seed=42, temperature=0.1:
322
  | `/reset` | POST | `{"task_id": "easy", "seed": 42}` | Start new episode |
323
  | `/step` | POST | `Action` JSON | Take one action |
324
  | `/state` | GET | - | Full state + ground truth + analytics |
325
- | `/tasks` | GET | - | List all 4 tasks |
326
  | `/validate` | GET | - | Self-validation report for all tasks |
327
 
328
  ---
329
 
@@ -334,7 +396,7 @@ openenv validate .
334
  ```
335
 
336
  All endpoints comply with the OpenEnv spec. `openenv.yaml` contains full
337
- metadata including 4 task definitions, action/observation space descriptions,
338
  expected score ranges, and Docker configuration.
339
 
340
  ---
 
14
  # DevOps Incident Response - OpenEnv
15
 
16
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
17
+ [![HF Space](https://img.shields.io/badge/HuggingFace-Space-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
18
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
19
+
20
+ ## Quick Start
21
+
22
+ ```python
23
+ # Install first (shell): pip install git+https://github.com/Twilight-13/devops-incident-response.git
24
+
25
+ from devops_incident_response import DevOpsIncidentEnv, Action, ActionType
26
+
27
+ with DevOpsIncidentEnv(base_url="https://arijit-07-devops-incident-response.hf.space").sync() as env:
28
+     obs = env.reset(task_id="easy")
29
+     result = env.step(Action(action_type=ActionType.READ_LOGS, service="payment-service"))
30
+     print(f"Reward: {result.reward}")
31
+ ```
32
 
33
  An OpenEnv-compliant reinforcement learning environment where AI agents learn
34
  to diagnose and remediate production software incidents across a simulated
 
39
  partial credit for information gathering, correct diagnosis, and precise
40
  remediation, while penalising collateral damage and blind actions.
41
 
42
+ **Seven tasks of escalating difficulty:**
43
  - **Easy**: single service OOM crash-loop (which service varies by seed)
44
  - **Medium**: cascading failure from bad deployment with a red-herring alert
45
  - **Hard**: silent data corruption with no error-rate alerts, only business metric anomalies
46
  - **Bonus**: two simultaneous independent failures, both must be fixed
47
+ - **Security**: botnet DDoS attack requiring IP blocking
48
+ - **Database**: missing schema index causing DB degradation
49
+ - **Failover**: partial region failure requiring precise multi-region failover
50
 
51
  ---
52
 
 
99
 
100
  ### What Makes This Hard
101
 
102
+ The seven tasks are designed to require qualitatively different
103
  reasoning strategies:
104
 
105
  - **Easy**: Direct signal reading - logs clearly show OOM, fix is obvious
 
152
  | `alert_oncall` | `reason` (str) | Page the on-call engineering team |
153
  | `acknowledge` | `service` (alert id) | Acknowledge an active alert |
154
  | `noop` | - | Take no action |
155
+ | `block_ip_range` | `service`, `ip_range` | Block a CIDR IP range (DDoS mitigation) |
156
+ | `create_index` | `table`, `column` | Create a missing database index |
157
+ | `failover` | `service`, `target_region` | Fail over a service to another region |
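
The three new actions could be serialized for `/step` roughly as follows. This is a minimal sketch: the parameter names come from the table above, but the flat JSON layout with an `action_type` key is an assumption inferred from the Quick Start's `Action(action_type=..., service=...)` call, and the sample values (`api-gateway`, `orders`, `customer_id`, the `/8` CIDR) are hypothetical.

```python
import json

# Required parameters per new action type, taken from the table above.
REQUIRED_PARAMS = {
    "block_ip_range": ("service", "ip_range"),
    "create_index": ("table", "column"),
    "failover": ("service", "target_region"),
}


def build_action(action_type: str, **params: str) -> str:
    """Return a JSON action body after checking the required parameters.

    The flat {"action_type": ..., <param>: ...} layout is an assumption,
    not the environment's confirmed schema.
    """
    required = REQUIRED_PARAMS[action_type]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"{action_type} is missing parameters: {missing}")
    return json.dumps({"action_type": action_type, **params}, sort_keys=True)


# Hypothetical example values, for illustration only.
print(build_action("block_ip_range", service="api-gateway", ip_range="185.0.0.0/8"))
print(build_action("create_index", table="orders", column="customer_id"))
print(build_action("failover", service="api-gateway", target_region="us-west-2"))
```

Validating parameters on the client side keeps malformed actions from consuming one of the episode's limited steps.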
158
 
159
  ---
160
 
 
247
 
248
  ---
249
 
250
+ ### Task 5 - Security Incident Response (DDoS Attack)
251
+ **Max steps:** 20 | **Expected strong LLM score:** 0.40–0.60
252
+
253
+ A botnet is targeting the login endpoint with 12,000 req/s from the 185.x.x.x IP range. Standard rate limiting is ineffective because the attack is distributed. The agent must identify the attack pattern in the access logs, diagnose the DDoS, block the IP range, and alert the security team. Neither restart nor rollback helps, and wrong actions are penalized.
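
The "identify the attack pattern in access logs" step can be illustrated with a small sketch. It assumes each access-log line starts with the client IPv4 address, which is a hypothetical format, and it maps the 185.x.x.x range to a `/8` CIDR. The function name and threshold are illustrative and not part of the environment.

```python
from collections import Counter


def dominant_first_octet(log_lines, threshold=0.5):
    """Return the /8 CIDR that covers most requests, or None if none dominates.

    Assumes every non-empty line begins with the client IPv4 address
    (hypothetical log format).
    """
    octets = Counter(
        line.split()[0].split(".")[0] for line in log_lines if line.strip()
    )
    if not octets:
        return None
    octet, count = octets.most_common(1)[0]
    if count / sum(octets.values()) < threshold:
        return None
    return f"{octet}.0.0.0/8"


# Made-up sample lines, for illustration only.
sample = [
    "185.12.4.9 POST /login 401",
    "185.77.1.3 POST /login 401",
    "185.201.9.8 POST /login 401",
    "10.0.3.7 GET /health 200",
]
print(dominant_first_octet(sample))
```

The returned range is what an agent would pass as `ip_range` to `block_ip_range`.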
254
+
255
+ ---
256
+
257
+ ### Task 6 - Database Performance Degradation
258
+ **Max steps:** 20 | **Expected strong LLM score:** 0.45–0.65
259
+
260
+ A schema migration added a column without an index, so every service that reads that table degrades. The agent must read the postgres slow query logs, identify the sequential table scan, and either create the missing index or roll back the migration. Restarting services does nothing.
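
Spotting the sequential scan might look like the sketch below. `Seq Scan on <table>` is the wording PostgreSQL uses in query plans, but the surrounding log-line layout and the table names are hypothetical, and the environment's actual log format may differ.

```python
import re
from collections import Counter

# "Seq Scan on <table>" is how PostgreSQL query plans name a sequential scan.
SEQ_SCAN = re.compile(r"Seq Scan on (\w+)")


def tables_with_seq_scans(log_lines):
    """Count sequential scans per table in plan-style log lines."""
    hits = Counter()
    for line in log_lines:
        match = SEQ_SCAN.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Made-up slow-query lines, for illustration only.
sample_log = [
    "duration: 4200 ms  plan: Seq Scan on orders  (rows=1200000)",
    "duration: 3900 ms  plan: Seq Scan on orders  (rows=1200000)",
    "duration: 12 ms  plan: Index Scan using users_pkey on users",
]
for table, count in tables_with_seq_scans(sample_log).most_common():
    print(f"{table}: {count} sequential scans")
```

The table with the most sequential scans is the natural candidate for the `table` parameter of `create_index`.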
261
+
262
+ ---
263
+
264
+ ### Task 7 - Multi-Region Failover (Partial)
265
+ **Max steps:** 25 | **Expected strong LLM score:** 0.35–0.55
266
+
267
+ A network partition affects us-east-1. Four services support automatic failover to us-west-2 and should be switched. Two services (payment-service, postgres-primary) must NOT be failed over: payment-service because of PCI compliance, postgres-primary because replication lag would cause data loss. Failing over either of them incurs a heavy -0.25 penalty.
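
The selective failover rule can be sketched as a filter over the service list. The two protected service names and the -0.25 penalty come from the description above. The other service names are hypothetical, and applying the penalty once per wrongly moved service is an assumption about how the environment scores it.

```python
# Services that must stay in us-east-1, per the task description.
DO_NOT_FAIL_OVER = {"payment-service", "postgres-primary"}
WRONG_FAILOVER_PENALTY = -0.25  # stated penalty; per-service application is assumed


def plan_failover(services, target_region="us-west-2"):
    """Return one failover action for every service that is safe to move."""
    return [
        {"action_type": "failover", "service": name, "target_region": target_region}
        for name in services
        if name not in DO_NOT_FAIL_OVER
    ]


def penalty_for(actions):
    """Sum the penalty for actions that touch a protected service."""
    return sum(
        WRONG_FAILOVER_PENALTY
        for action in actions
        if action["service"] in DO_NOT_FAIL_OVER
    )


# Hypothetical service names, apart from the two protected ones.
services = ["api-gateway", "auth-service", "payment-service", "postgres-primary"]
plan = plan_failover(services)
print([action["service"] for action in plan])
print(penalty_for(plan))
```

Keeping the exclusion set explicit makes the plan auditable before any action is sent.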
268
+
269
+ ---
270
+
271
  ## Reward Function Design
272
 
273
  ```
 
350
  | medium | 0.6800 | ✓ | 9 |
351
  | hard | 0.3500 | ✗ | 25 |
352
  | bonus | 0.3800 | ✗ | 25 |
353
+ | security | 0.00 | run inference.py to reproduce | 20 |
354
+ | database | 0.00 | run inference.py to reproduce | 20 |
355
+ | failover | 0.00 | run inference.py to reproduce | 25 |
356
  | **average** | **0.6025** | - | - |
357
 
358
  *Scores vary with model and temperature. Run with seed=42 for reproducibility.*
359
 
360
  ---
361
 
362
+ ## RL Training Integration
363
+
364
+ This environment is designed for GRPO and other policy gradient methods.
365
+ See the training notebook for a full example:
366
+
367
+ ```bash
368
+ git clone https://github.com/Twilight-13/devops-incident-response && cd devops-incident-response
369
+ jupyter notebook train_grpo.ipynb
370
+ ```
371
+
372
+ Compatible with: TRL, SkyRL, ART, Oumi, Axolotl.
373
+
374
+ ---
375
+
376
  ## API Reference
377
 
378
  | Endpoint | Method | Body | Description |
 
381
  | `/reset` | POST | `{"task_id": "easy", "seed": 42}` | Start new episode |
382
  | `/step` | POST | `Action` JSON | Take one action |
383
  | `/state` | GET | - | Full state + ground truth + analytics |
384
+ | `/tasks` | GET | - | List all 7 tasks |
385
  | `/validate` | GET | - | Self-validation report for all tasks |
386
+ | `/ws` | WebSocket | - | Real-time agent-environment communication |
387
+ | `/metrics` | GET | - | Aggregate episode statistics |
388
+ | `/leaderboard` | GET | - | Top scoring episodes |
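
The HTTP endpoints above could be called with the standard library alone, roughly as sketched below. The `/reset` body is the one shown in the table. `BASE_URL` is an assumption (point it at wherever the server runs), and the response shape is not documented in this section, so the sketch only decodes it as JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: adjust to your deployment


def make_request(path, body=None):
    """Build a GET request when body is None, otherwise a JSON POST."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    method = "GET" if body is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


reset_req = make_request("/reset", {"task_id": "easy", "seed": 42})
tasks_req = make_request("/tasks")

# With a server running, uncomment to start an episode and list the tasks:
# print(send(reset_req))
# print(send(tasks_req))
```

Separating request construction from sending makes the payloads easy to inspect without a live server.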
389
 
390
  ---
391
 
 
396
  ```
397
 
398
  All endpoints comply with the OpenEnv spec. `openenv.yaml` contains full
399
+ metadata including 7 task definitions, action/observation space descriptions,
400
  expected score ranges, and Docker configuration.
401
 
402
  ---
openenv.yaml CHANGED
@@ -4,7 +4,7 @@ description: >
4
  A reinforcement learning environment where AI agents learn to diagnose and
5
  remediate production software incidents. Agents read logs, metrics, and
6
  alerts across a simulated microservices architecture, then take remediation
7
- actions such as rollbacks, restarts, and on-call escalations. Four tasks
8
  of escalating difficulty, from a clear memory leak to silent data
9
  corruption with no error-rate alerts, provide a meaningful difficulty
10
  progression for benchmarking agent reasoning quality.
 
4
  A reinforcement learning environment where AI agents learn to diagnose and
5
  remediate production software incidents. Agents read logs, metrics, and
6
  alerts across a simulated microservices architecture, then take remediation
7
+ actions such as rollbacks, restarts, and on-call escalations. Seven tasks
8
  of escalating difficulty, from a clear memory leak to silent data
9
  corruption with no error-rate alerts, provide a meaningful difficulty
10
  progression for benchmarking agent reasoning quality.