Arijit-07's picture
final: submission cleanup β€” remove junk files, update README endpoints, clean .gitignore
230f8d5
---
title: ARIA DevOps Incident Response
emoji: 🚨
colorFrom: blue
colorTo: red
sdk: docker
pinned: true
license: apache-2.0
tags:
- openenv
- reinforcement-learning
- devops
- incident-response
- rl-environment
- multi-agent
- llm-agent
- grpo
- curriculum-learning
- huggingface
- pytorch
- meta
short_description: "OpenEnv RL for incident response. 7 tasks, Llama-3.1-8B"
---
# ARIA β€” DevOps Incident Response
### *The first OpenEnv RL environment for production incident response*
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
[![HF Space](https://img.shields.io/badge/πŸ€—-Live%20Environment-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
[![Trained Model](https://img.shields.io/badge/πŸ€—-Llama--3.1--8B%20Fine--tuned-blue)](https://huggingface.co/Arijit-07/aria-devops-llama8b)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
> **ARIA** β€” Adaptive Reward & Incident Architecture
> Built for the Meta Γ— PyTorch Γ— HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026
---
## πŸ”— Quick Links for Judges
| Resource | Link |
|---|---|
| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |
---
## ⚑ Run a Complete Episode Right Now
```bash
# 1. Start an easy incident
curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "easy", "seed": 42}'
# 2. Read logs on the failing service (reward: +0.15)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "read_logs", "service": "payment-service"}'
# 3. Diagnose (reward: +0.30)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'
# 4. Fix it (reward: +0.40)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action_type": "restart_service", "service": "payment-service"}'
# 5. Validate all 7 tasks pass
curl https://arijit-07-devops-incident-response.hf.space/validate
```
---
## 🎯 The Problem
Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**
SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents β€” to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
**ARIA fills that gap.**
---
## 🎬 The 7 Tasks
| Task | Max Steps | Random | Strong LLM | Scenario |
|---|---|---|---|---|
| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption β€” all services green |
| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
| `database` | 20 | 0.01 | 0.45–0.65 | Missing index β€” full table scans |
| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
| `generated` | 20 | 0.01 | variable | Procedural β€” seed-deterministic |
---
## πŸ† Reward Function
```
Final Score = Ξ£(step_rewards)
+ efficiency_bonus # (1 - steps/max_steps) Γ— 0.05
+ diagnosis_precision # +0.03 if β‰₯50% keyword overlap
- noop_penalty # (noops - 3) Γ— 0.02
```
Clamped to **(0.001, 0.999)** for GRPO stability.
| Action | Reward | Penalty Triggers |
|---|---|---|
| `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
| `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
| `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
| `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
| `alert_oncall` (required) | +0.15 | |
**Semantic matching:** keyword overlap not exact string β€” LLMs that paraphrase aren't penalized.
---
## 🌟 ARIA Features
### Curriculum Engine
Rolling average per task (last 5 episodes). Promotes when avg > 0.75. Scaffolds with hints when avg < 0.30. Agents always train at the edge of their capability.
```bash
GET /curriculum/status
GET /curriculum/next
POST /curriculum/record # {"task_id": "easy", "score": 0.85}
```
### Incident Generator
Seeds 0–99,999 β†’ unique reproducible incidents. 6 failure modes Γ— 8 services Γ— 3 severities Γ— 0–3 noise alerts.
```bash
GET /generate/preview?seed=1337
POST /reset # {"task_id": "generated", "seed": 1337}
```
### Dual-Agent Mode
Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.
```bash
POST /multi-agent/reset # {"task_id": "easy", "seed": 42}
POST /multi-agent/step/a/{id} # {"finding": "order-service OOM"}
POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
```
---
## 🧠 Training Results
**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)
| Task | Baseline | Fine-tuned | **Improvement** |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |
![Training Curve](https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png)
**Setup:** GRPO Β· Llama-3.1-8B Β· LoRA rank=32 Β· 160 episodes Β· NVIDIA L4 Β· 162 minutes Β· Unsloth + HuggingFace TRL
**Key fix:** Group completions scored on fresh environment snapshots β€” prevents reward gate exhaustion during GRPO group generation.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
---
## πŸ“‘ API Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Liveness check |
| GET | `/about` | Full machine-readable description |
| GET | `/tasks` | All 8 tasks |
| POST | `/reset` | Start episode |
| POST | `/step` | Take action |
| GET | `/state` | Full state + ground truth |
| GET | `/validate` | Self-test all 7 tasks |
| GET | `/metrics` | Aggregate statistics |
| GET | `/leaderboard` | Top 10 episodes |
| WS | `/ws` | WebSocket real-time |
| GET | `/curriculum/status` | Per-task mastery |
| GET | `/curriculum/next` | Recommended task |
| POST | `/curriculum/record` | Feed training results |
| GET | `/generate/preview` | Preview procedural incident |
| POST | `/multi-agent/reset` | Start dual-agent session |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
| GET | `/live` | Live NOC dashboard (real-time) |
| GET | `/challenge` | Human vs Agent challenge |
| GET | `/progress` | Score progression visualization |
| GET | `/replays` | Episode replay list |
| GET | `/replay/{id}` | Full episode replay |
| GET | `/replay/{id}/html` | Replay HTML viewer |
| GET | `/docs` | Swagger UI |
---
## πŸ“Š Benchmark Comparison
| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
|---|---|---|---|---|---|
| SWE-bench | Code repair | βœ— | βœ— | βœ— | βœ— |
| WebArena | Web navigation | βœ“ | βœ— | βœ— | βœ— |
| AgentBench | General tools | βœ— | βœ— | βœ— | βœ— |
| **ARIA** | **Incident response** | **βœ“** | **βœ“** | **βœ“** | **βœ“** |
---
## πŸš€ Setup
```bash
docker build -t aria-devops-incident .
docker run -p 7860:7860 aria-devops-incident
# Or local
pip install -r requirements.txt
uvicorn api:app --host 0.0.0.0 --port 7860
```
---
## πŸ“ Structure
```
β”œβ”€β”€ api.py / server/app.py # FastAPI β€” all endpoints
β”œβ”€β”€ env.py # Environment dispatcher
β”œβ”€β”€ models.py # Pydantic models
β”œβ”€β”€ tasks/ # 7 tasks + generated
β”œβ”€β”€ curriculum/engine.py # Adaptive difficulty
β”œβ”€β”€ generator/ # Procedural incidents
β”œβ”€β”€ multi_agent/session.py # Dual-agent mode
β”œβ”€β”€ graders/grader.py # Deterministic grader
β”œβ”€β”€ demo_llm.py # Live terminal demo
β”œβ”€β”€ train_grpo.ipynb # Training notebook
β”œβ”€β”€ BLOG.md # Project story
└── openenv.yaml # OpenEnv manifest
```
Apache 2.0 Β· *Built solo for the Meta Γ— PyTorch Γ— HuggingFace OpenEnv Hackathon Finals β€” Bangalore, April 2026*