---
title: ARIA DevOps Incident Response
emoji: 🚨
colorFrom: blue
colorTo: red
sdk: docker
pinned: true
license: apache-2.0
tags:
  - openenv
  - reinforcement-learning
  - devops
  - incident-response
  - rl-environment
  - multi-agent
  - llm-agent
  - grpo
  - curriculum-learning
  - huggingface
  - pytorch
  - meta
short_description: "OpenEnv RL for incident response. 7 tasks, Llama-3.1-8B"
---

# ARIA: DevOps Incident Response

### *The first OpenEnv RL environment for production incident response*

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
[![HF Space](https://img.shields.io/badge/🤗-Live%20Environment-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
[![Trained Model](https://img.shields.io/badge/🤗-Llama--3.1--8B%20Fine--tuned-blue)](https://huggingface.co/Arijit-07/aria-devops-llama8b)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)

> **ARIA**: Adaptive Reward & Incident Architecture
>
> Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026

---

## 🔗 Quick Links for Judges

| Resource | Link |
|---|---|
| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |

---

## ⚡ Run a Complete Episode Right Now

```bash
# 1. Start an easy incident
curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy", "seed": 42}'

# 2. Read logs on the failing service (reward: +0.15)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "read_logs", "service": "payment-service"}'

# 3. Diagnose (reward: +0.30)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'

# 4. Fix it (reward: +0.40)
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "restart_service", "service": "payment-service"}'

# 5. Validate all 7 tasks pass
curl https://arijit-07-devops-incident-response.hf.space/validate
```

---

## 🎯 The Problem

Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**

SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents: to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
**ARIA fills that gap.**

---

## 🎬 The 7 Tasks

| Task | Max Steps | Random Score | Strong LLM Score | Scenario |
|---|---|---|---|---|
| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption: all services green |
| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
| `database` | 20 | 0.01 | 0.45–0.65 | Missing index causing full table scans |
| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
| `generated` | 20 | 0.01 | variable | Procedural, seed-deterministic |

`generated` is the procedural entry that sits alongside the seven hand-authored tasks.

---

## 🏆 Reward Function

```
Final Score = Σ(step_rewards)
            + efficiency_bonus      # (1 - steps/max_steps) × 0.05
            + diagnosis_precision   # +0.03 if ≥50% keyword overlap
            - noop_penalty          # (noops - 3) × 0.02
```

Clamped to **(0.001, 0.999)** for GRPO stability.

| Action | Reward |
|---|---|
| `read_logs` correct | +0.15 |
| `diagnose` full match | +0.35 |
| `restart_service` correct | +0.45 |
| `block_ip_range` | +0.40 |
| `alert_oncall` (required) | +0.15 |

| Penalty Trigger | Penalty |
|---|---|
| Restart healthy service | **-0.15** |
| Fix without diagnosing | **-0.10** |
| Wrong failover (payment) | **-0.25** |
| Excessive noops | **-0.04 each** |

**Semantic matching:** diagnoses are scored by keyword overlap, not exact string match, so LLMs that paraphrase aren't penalized.

---

## 🌟 ARIA Features

### Curriculum Engine

Tracks a rolling average score per task over the last 5 episodes. Promotes when the average exceeds 0.75 and scaffolds with hints when it falls below 0.30, so agents always train at the edge of their capability.

```bash
GET  /curriculum/status
GET  /curriculum/next
POST /curriculum/record   # {"task_id": "easy", "score": 0.85}
```

### Incident Generator

Seeds 0–99,999 map to unique, reproducible incidents: 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.
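The seed-to-incident mapping described above can be sketched roughly as follows. This is a minimal illustration, not ARIA's actual generator: the failure-mode, service, and severity names are hypothetical placeholders (only the counts 6, 8, 3, and 0–3 come from the description), and `generate_incident` and `Incident` are assumed names.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder names; only the counts (6, 8, 3, 0-3) come from the README.
FAILURE_MODES = [f"failure_mode_{i}" for i in range(6)]
SERVICES = [f"service_{i}" for i in range(8)]
SEVERITIES = ["low", "medium", "high"]
MAX_NOISE_ALERTS = 3


@dataclass(frozen=True)
class Incident:
    seed: int
    failure_mode: str
    service: str
    severity: str
    noise_alerts: int


def generate_incident(seed: int) -> Incident:
    """Map a seed to an incident using a private, seeded RNG."""
    if not 0 <= seed <= 99_999:
        raise ValueError("seed must be in 0..99999")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return Incident(
        seed=seed,
        failure_mode=rng.choice(FAILURE_MODES),
        service=rng.choice(SERVICES),
        severity=rng.choice(SEVERITIES),
        noise_alerts=rng.randint(0, MAX_NOISE_ALERTS),
    )


if __name__ == "__main__":
    first = generate_incident(1337)
    second = generate_incident(1337)
    print(first)
    print(first == second)  # same seed, same incident
```

In this sketch the four fields alone give 6 × 8 × 3 × 4 = 576 combinations, so it illustrates reproducibility rather than uniqueness across all 100,000 seeds.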
```bash
GET  /generate/preview?seed=1337
POST /reset   # {"task_id": "generated", "seed": 1337}
```

### Dual-Agent Mode

Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.

```bash
POST /multi-agent/reset          # {"task_id": "easy", "seed": 42}
POST /multi-agent/step/a/{id}    # {"finding": "order-service OOM"}
POST /multi-agent/step/b/{id}    # {"action_type": "restart_service", ...}
```

---

## 🧠 Training Results

**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)

| Task | Baseline | Fine-tuned | **Improvement** |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |

![Training Curve](https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png)

**Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL

**Key fix:** Each completion in a GRPO group is scored on a fresh environment snapshot, which prevents reward gate exhaustion during group generation.
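One way to picture this fix, as a toy sketch rather than the code in `train_grpo.ipynb`: `ToyEnv`, its one-time reward gate, and `score_group` are all hypothetical, and "reward gate" is assumed here to mean a reward the environment pays out only once per episode. Scoring every completion against the same live environment lets the first completion consume the gate, while scoring each against a deep copy of the snapshot gives every completion the same starting state.

```python
import copy


class ToyEnv:
    """Hypothetical stand-in for an environment with a one-time reward gate."""

    def __init__(self) -> None:
        self.logs_read = False  # the gate: reading logs pays out only once

    def step(self, action: str) -> float:
        if action == "read_logs" and not self.logs_read:
            self.logs_read = True
            return 0.15
        return 0.0


def score_group(snapshot: ToyEnv, completions: list[str], fresh: bool) -> list[float]:
    """Score each completion, optionally on its own copy of the snapshot."""
    rewards = []
    for action in completions:
        env = copy.deepcopy(snapshot) if fresh else snapshot
        rewards.append(env.step(action))
    return rewards


if __name__ == "__main__":
    group = ["read_logs", "read_logs", "noop", "read_logs"]
    print(score_group(ToyEnv(), group, fresh=False))  # shared env: gate is used up
    print(score_group(ToyEnv(), group, fresh=True))   # fresh copy per completion
```

With a shared environment, identical completions receive different rewards, which distorts the group-relative comparison GRPO depends on. With fresh copies, equal actions earn equal rewards.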
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)

---

## 📡 API Reference

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Liveness check |
| GET | `/about` | Full machine-readable description |
| GET | `/tasks` | All 8 tasks |
| POST | `/reset` | Start episode |
| POST | `/step` | Take action |
| GET | `/state` | Full state + ground truth |
| GET | `/validate` | Self-test all 7 tasks |
| GET | `/metrics` | Aggregate statistics |
| GET | `/leaderboard` | Top 10 episodes |
| WS | `/ws` | WebSocket real-time |
| GET | `/curriculum/status` | Per-task mastery |
| GET | `/curriculum/next` | Recommended task |
| POST | `/curriculum/record` | Feed training results |
| GET | `/generate/preview` | Preview procedural incident |
| POST | `/multi-agent/reset` | Start dual-agent session |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
| GET | `/live` | Live NOC dashboard (real-time) |
| GET | `/challenge` | Human vs Agent challenge |
| GET | `/progress` | Score progression visualization |
| GET | `/replays` | Episode replay list |
| GET | `/replay/{id}` | Full episode replay |
| GET | `/replay/{id}/html` | Replay HTML viewer |
| GET | `/docs` | Swagger UI |

---

## 📊 Benchmark Comparison

| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
|---|---|---|---|---|---|
| SWE-bench | Code repair | ✗ | ✗ | ✗ | ✗ |
| WebArena | Web navigation | ✓ | ✗ | ✗ | ✗ |
| AgentBench | General tools | ✗ | ✗ | ✗ | ✗ |
| **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |

---

## 🚀 Setup

```bash
docker build -t aria-devops-incident .
docker run -p 7860:7860 aria-devops-incident

# Or run locally
pip install -r requirements.txt
uvicorn api:app --host 0.0.0.0 --port 7860
```

---

## 📁 Structure

```
├── api.py / server/app.py     # FastAPI, all endpoints
├── env.py                     # Environment dispatcher
├── models.py                  # Pydantic models
├── tasks/                     # 7 tasks + generated
├── curriculum/engine.py       # Adaptive difficulty
├── generator/                 # Procedural incidents
├── multi_agent/session.py     # Dual-agent mode
├── graders/grader.py          # Deterministic grader
├── demo_llm.py                # Live terminal demo
├── train_grpo.ipynb           # Training notebook
├── BLOG.md                    # Project story
└── openenv.yaml               # OpenEnv manifest
```

Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals, Bangalore, April 2026*