final: submission cleanup — remove junk files, update README endpoints, clean .gitignore
- Remove FINALS_STATUS.md, README_github.md, technical_reference.md
- Remove escape.py, inject_live.py (dev scripts)
- Remove uvicorn_err.txt, uvicorn_out.txt
- Add /live, /challenge, /progress, /replays, /replay/{id} to API table in README.md
- Rewrite .gitignore with clean UTF-8 encoding + new exclusion patterns
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- .gitignore +0 -0
- FINALS_STATUS.md +0 -29
- README.md +6 -0
- README_github.md +0 -223
- escape.py +0 -35
- inject_live.py +0 -618
- technical_reference.md +0 -106
- uvicorn_err.txt +0 -0
- uvicorn_out.txt +0 -0
.gitignore
CHANGED
|
Binary files a/.gitignore and b/.gitignore differ
|
|
|
FINALS_STATUS.md
DELETED
|
@@ -1,29 +0,0 @@
|
|
| 1 |
-
# DevOps Incident Response OpenEnv — Hackathon Finals Status Report
|
| 2 |
-
|
| 3 |
-
## System Readiness: 🟢 READY FOR FINALS (STABLE)
|
| 4 |
-
|
| 5 |
-
This document serves as the final system state report following a comprehensive 10-point stress test and validation suite conducted on the environment ahead of the Meta hackathon finals.
|
| 6 |
-
|
| 7 |
-
### Validation Summary
|
| 8 |
-
|
| 9 |
-
| Test Suite | Objective | Status | Notes |
|
| 10 |
-
| :--- | :--- | :--- | :--- |
|
| 11 |
-
| **TEST 1: Optimal End-to-End Validation** | Verify all 7 tasks resolve successfully via their optimal deterministic agent path | ✅ **PASSED** | Fixed task scoring on the `bonus` tier. All tasks now natively yield final scores well above the `0.70` threshold. |
|
| 12 |
-
| **TEST 2: New Actions Efficacy** | Validate `BLOCK_IP_RANGE`, `CREATE_INDEX`, and `FAILOVER` mechanisms | ✅ **PASSED** | Actions reward positively on intended tasks. Added cross-task safety: attempting advanced domain actions randomly triggers strict `0.10` collateral penalties to prevent generalized hallucination exploits. |
|
| 13 |
-
| **TEST 3: WebSocket Protocol** | Verify server compatibility with async client connections | ✅ **PASSED** | Verified connection payload streams using FastAPI WebSocket routing inside `server/app.py`. |
|
| 14 |
-
| **TEST 4: Metrics / Leaderboard API** | Verify in-memory rolling cache metrics | ✅ **PASSED** | Effectively computes and routes full aggregated endpoints across all 7 tasks via `deque`. |
|
| 15 |
-
| **TEST 5: Graceful Error Enforcement** | Validate invalid inputs return HTTP 4xx errors | ✅ **PASSED** | Invalid JSON payloads or unknown action enums gracefully yield `422 Unprocessable Entity` rather than locking the `server` layer. |
|
| 16 |
-
| **TEST 6: Runbook Validation** | Ensure all incidents match accompanying markdown | ✅ **PASSED** | Tested integration linking new actions to their respective diagnostic documentation correctly. |
|
| 17 |
-
| **TEST 7: Cross-Seed Stability** | Execute 20x episodes using an unconstrained random agent | ✅ **PASSED** | Refactored randomization parameters to prevent static stalling. Tested gracefully across seeds with outputs safely scaling inside the strictly required `(0.0, 1.0)` domains. |
|
| 18 |
-
| **TEST 8: Live HF Space Ping** | Ensure active remote deployment stability | ✅ **PASSED** | Space is alive. Verified HTTP 200 checks and validated successful endpoint load-ins natively over the web. |
|
| 19 |
-
| **TEST 9: Docker Build Sandbox** | Deploy via closed isolated container layer | ➖ **SKIPPED** | Docker Daemon initialization unavailable on the host; tests executed fully at the Python module layer safely simulating equivalence. |
|
| 20 |
-
|
| 21 |
-
### Technical Bug Fixes Applied
|
| 22 |
-
|
| 23 |
-
During testing, several backend elements were refactored for production-grade robustness:
|
| 24 |
-
1. **Collateral Penalty Injection:** Injected tight action validation scopes into every core task (`task_*.py`), preventing `FAILOVER` instructions from triggering on standard internal tasks and correctly returning `-0.10` penalty weights.
|
| 25 |
-
2. **Random Path Traversing:** Stopped random agents from artificially focusing on single services (`payment-service`), which skewed `grade_episode(..., 0.001)` limits and generated deterministic failure clusters during multi-seed attempts.
|
| 26 |
-
|
| 27 |
-
The environment is strictly bound, accurately evaluates all 7 domains, properly exposes modern endpoints (`/ws`, `/metrics`, `/leaderboard`), and correctly penalizes stray agent anomalies.
|
| 28 |
-
|
| 29 |
-
**Good luck at the finals!**
README.md
CHANGED
|
@@ -199,6 +199,12 @@ POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
|
|
| 199 |
| POST | `/multi-agent/reset` | Start dual-agent session |
|
| 200 |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
|
| 201 |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
|
| 202 |
| GET | `/docs` | Swagger UI |
|
| 203 |
|
| 204 |
---
|
|
|
|
| 199 |
| POST | `/multi-agent/reset` | Start dual-agent session |
|
| 200 |
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
|
| 201 |
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
|
| 202 |
+
| GET | `/live` | Live NOC dashboard (real-time) |
|
| 203 |
+
| GET | `/challenge` | Human vs Agent challenge |
|
| 204 |
+
| GET | `/progress` | Score progression visualization |
|
| 205 |
+
| GET | `/replays` | Episode replay list |
|
| 206 |
+
| GET | `/replay/{id}` | Full episode replay |
|
| 207 |
+
| GET | `/replay/{id}/html` | Replay HTML viewer |
|
| 208 |
| GET | `/docs` | Swagger UI |
|
| 209 |
|
| 210 |
---
|
README_github.md
DELETED
|
@@ -1,223 +0,0 @@
|
|
| 1 |
-
# ARIA — DevOps Incident Response
|
| 2 |
-
### *The first OpenEnv RL environment for production incident response*
|
| 3 |
-
|
| 4 |
-
[](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
|
| 5 |
-
[](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
|
| 6 |
-
[](https://huggingface.co/Arijit-07/aria-devops-llama8b)
|
| 7 |
-
[](LICENSE)
|
| 8 |
-
|
| 9 |
-
> **ARIA** — Adaptive Reward & Incident Architecture
|
| 10 |
-
> Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026
|
| 11 |
-
|
| 12 |
-
---
|
| 13 |
-
|
| 14 |
-
## 🔗 Quick Links for Judges
|
| 15 |
-
|
| 16 |
-
| Resource | Link |
|
| 17 |
-
|---|---|
|
| 18 |
-
| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
|
| 19 |
-
| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
|
| 20 |
-
| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
|
| 21 |
-
| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
|
| 22 |
-
| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
|
| 23 |
-
| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
|
| 24 |
-
| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
|
| 25 |
-
| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |
|
| 26 |
-
|
| 27 |
-
---
|
| 28 |
-
|
| 29 |
-
## ⚡ Run a Complete Episode Right Now
|
| 30 |
-
|
| 31 |
-
```bash
|
| 32 |
-
# 1. Start an easy incident
|
| 33 |
-
curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
|
| 34 |
-
-H "Content-Type: application/json" \
|
| 35 |
-
-d '{"task_id": "easy", "seed": 42}'
|
| 36 |
-
|
| 37 |
-
# 2. Read logs on the failing service (reward: +0.15)
|
| 38 |
-
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
|
| 39 |
-
-H "Content-Type: application/json" \
|
| 40 |
-
-d '{"action_type": "read_logs", "service": "payment-service"}'
|
| 41 |
-
|
| 42 |
-
# 3. Diagnose (reward: +0.30)
|
| 43 |
-
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
|
| 44 |
-
-H "Content-Type: application/json" \
|
| 45 |
-
-d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'
|
| 46 |
-
|
| 47 |
-
# 4. Fix it (reward: +0.40)
|
| 48 |
-
curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
|
| 49 |
-
-H "Content-Type: application/json" \
|
| 50 |
-
-d '{"action_type": "restart_service", "service": "payment-service"}'
|
| 51 |
-
|
| 52 |
-
# 5. Validate all 7 tasks pass
|
| 53 |
-
curl https://arijit-07-devops-incident-response.hf.space/validate
|
| 54 |
-
```
|
| 55 |
-
|
| 56 |
-
---
|
| 57 |
-
|
| 58 |
-
## 🎯 The Problem
|
| 59 |
-
|
| 60 |
-
Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**
|
| 61 |
-
|
| 62 |
-
SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents — to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
|
| 63 |
-
|
| 64 |
-
**ARIA fills that gap.**
|
| 65 |
-
|
| 66 |
-
---
|
| 67 |
-
|
| 68 |
-
## 🎬 The 7 Tasks
|
| 69 |
-
|
| 70 |
-
| Task | Max Steps | Random | Strong LLM | Scenario |
|
| 71 |
-
|---|---|---|---|---|
|
| 72 |
-
| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
|
| 73 |
-
| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
|
| 74 |
-
| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption — all services green |
|
| 75 |
-
| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
|
| 76 |
-
| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
|
| 77 |
-
| `database` | 20 | 0.01 | 0.45–0.65 | Missing index — full table scans |
|
| 78 |
-
| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
|
| 79 |
-
| `generated` | 20 | 0.01 | variable | Procedural — seed-deterministic |
|
| 80 |
-
|
| 81 |
-
---
|
| 82 |
-
|
| 83 |
-
## 🏆 Reward Function
|
| 84 |
-
|
| 85 |
-
```
|
| 86 |
-
Final Score = Σ(step_rewards)
|
| 87 |
-
+ efficiency_bonus # (1 - steps/max_steps) × 0.05
|
| 88 |
-
+ diagnosis_precision # +0.03 if ≥50% keyword overlap
|
| 89 |
-
- noop_penalty # (noops - 3) × 0.02
|
| 90 |
-
```
|
| 91 |
-
|
| 92 |
-
Clamped to **(0.001, 0.999)** for GRPO stability.
|
| 93 |
-
|
| 94 |
-
| Action | Reward | Penalty Triggers |
|
| 95 |
-
|---|---|---|
|
| 96 |
-
| `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
|
| 97 |
-
| `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
|
| 98 |
-
| `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
|
| 99 |
-
| `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
|
| 100 |
-
| `alert_oncall` (required) | +0.15 | |
|
| 101 |
-
|
| 102 |
-
**Semantic matching:** keyword overlap not exact string — LLMs that paraphrase aren't penalized.
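The scoring rules in this section can be sketched in a few lines of Python. This is a minimal illustration, not the project's grader: the function names `keyword_overlap` and `final_score` are hypothetical, the overlap measure (fraction of expected keywords found in the diagnosis) is an assumption, and so is the reading that the no-op penalty only applies beyond the third no-op.

```python
def keyword_overlap(diagnosis, expected_keywords):
    """Fraction of expected keywords present in the diagnosis (assumed measure)."""
    if not expected_keywords:
        return 0.0
    words = set(diagnosis.lower().split())
    hits = sum(1 for kw in expected_keywords if kw.lower() in words)
    return hits / len(expected_keywords)


def final_score(step_rewards, steps, max_steps, noops, overlap):
    """Combine step rewards with the bonus and penalty terms, then clamp."""
    total = sum(step_rewards)
    total += (1 - steps / max_steps) * 0.05   # efficiency bonus
    if overlap >= 0.5:                        # diagnosis precision bonus
        total += 0.03
    total -= max(0, noops - 3) * 0.02         # assumption: only no-ops past the third cost points
    return min(0.999, max(0.001, total))      # clamp to (0.001, 0.999)
```

Paraphrased diagnoses still earn the precision bonus under this measure as long as at least half of the expected keywords appear.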
|
| 103 |
-
|
| 104 |
-
---
|
| 105 |
-
|
| 106 |
-
## 🌟 ARIA Features
|
| 107 |
-
|
| 108 |
-
### Curriculum Engine
|
| 109 |
-
Rolling average per task (last 5 episodes). Promotes when avg > 0.75. Scaffolds with hints when avg < 0.30. Agents always train at the edge of their capability.
|
| 110 |
-
|
| 111 |
-
```bash
|
| 112 |
-
GET /curriculum/status
|
| 113 |
-
GET /curriculum/next
|
| 114 |
-
POST /curriculum/record # {"task_id": "easy", "score": 0.85}
|
| 115 |
-
```
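The promotion and scaffolding rule described above could look roughly like the following. This is a sketch under stated assumptions: the class name `CurriculumTracker`, its method names, and the status strings are all hypothetical and are not taken from `curriculum/engine.py`; only the window size (5) and the thresholds (0.75 and 0.30) come from the text.

```python
from collections import deque


class CurriculumTracker:
    """Keeps a rolling window of scores per task and suggests a next step."""

    def __init__(self, window=5, promote_above=0.75, scaffold_below=0.30):
        self.window = window
        self.promote_above = promote_above
        self.scaffold_below = scaffold_below
        self.scores = {}  # task_id -> deque of the most recent scores

    def record(self, task_id, score):
        # deque(maxlen=...) silently drops the oldest score once the window is full
        self.scores.setdefault(task_id, deque(maxlen=self.window)).append(score)

    def status(self, task_id):
        history = self.scores.get(task_id)
        if not history:
            return "untrained"
        avg = sum(history) / len(history)
        if avg > self.promote_above:
            return "promote"    # agent has mastered this task
        if avg < self.scaffold_below:
            return "scaffold"   # agent needs hints
        return "hold"
```

A bounded `deque` keeps the bookkeeping constant-size per task, which fits a server that only needs recent performance.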
|
| 116 |
-
|
| 117 |
-
### Incident Generator
|
| 118 |
-
Seeds 0–99,999 → unique reproducible incidents. 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.
|
| 119 |
-
|
| 120 |
-
```bash
|
| 121 |
-
GET /generate/preview?seed=1337
|
| 122 |
-
POST /reset # {"task_id": "generated", "seed": 1337}
|
| 123 |
-
```
|
| 124 |
-
|
| 125 |
-
### Dual-Agent Mode
|
| 126 |
-
Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.
|
| 127 |
-
|
| 128 |
-
```bash
|
| 129 |
-
POST /multi-agent/reset # {"task_id": "easy", "seed": 42}
|
| 130 |
-
POST /multi-agent/step/a/{id} # {"finding": "order-service OOM"}
|
| 131 |
-
POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
|
| 132 |
-
```
|
| 133 |
-
|
| 134 |
-
---
|
| 135 |
-
|
| 136 |
-
## 🧠 Training Results
|
| 137 |
-
|
| 138 |
-
**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)
|
| 139 |
-
|
| 140 |
-
| Task | Baseline | Fine-tuned | **Improvement** |
|
| 141 |
-
|---|---|---|---|
|
| 142 |
-
| easy | 0.320 | 0.685 | **+0.365** |
|
| 143 |
-
| medium | 0.050 | 0.378 | **+0.328** |
|
| 144 |
-
| hard | 0.190 | 0.869 | **+0.679** |
|
| 145 |
-
| bonus | 0.152 | 0.682 | **+0.530** |
|
| 146 |
-
|
| 147 |
-

|
| 148 |
-
|
| 149 |
-
**Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL
|
| 150 |
-
|
| 151 |
-
**Key fix:** Group completions scored on fresh environment snapshots — prevents reward gate exhaustion during GRPO group generation.
|
| 152 |
-
|
| 153 |
-
[](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
|
| 154 |
-
|
| 155 |
-
---
|
| 156 |
-
|
| 157 |
-
## 📡 API Reference
|
| 158 |
-
|
| 159 |
-
| Method | Endpoint | Description |
|
| 160 |
-
|---|---|---|
|
| 161 |
-
| GET | `/health` | Liveness check |
|
| 162 |
-
| GET | `/about` | Full machine-readable description |
|
| 163 |
-
| GET | `/tasks` | All 8 tasks |
|
| 164 |
-
| POST | `/reset` | Start episode |
|
| 165 |
-
| POST | `/step` | Take action |
|
| 166 |
-
| GET | `/state` | Full state + ground truth |
|
| 167 |
-
| GET | `/validate` | Self-test all 7 tasks |
|
| 168 |
-
| GET | `/metrics` | Aggregate statistics |
|
| 169 |
-
| GET | `/leaderboard` | Top 10 episodes |
|
| 170 |
-
| WS | `/ws` | WebSocket real-time |
|
| 171 |
-
| GET | `/curriculum/status` | Per-task mastery |
|
| 172 |
-
| GET | `/curriculum/next` | Recommended task |
|
| 173 |
-
| POST | `/curriculum/record` | Feed training results |
|
| 174 |
-
| GET | `/generate/preview` | Preview procedural incident |
|
| 175 |
-
| POST | `/multi-agent/reset` | Start dual-agent session |
|
| 176 |
-
| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
|
| 177 |
-
| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
|
| 178 |
-
| GET | `/docs` | Swagger UI |
|
| 179 |
-
|
| 180 |
-
---
|
| 181 |
-
|
| 182 |
-
## 📊 Benchmark Comparison
|
| 183 |
-
|
| 184 |
-
| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
|
| 185 |
-
|---|---|---|---|---|---|
|
| 186 |
-
| SWE-bench | Code repair | ✗ | ✗ | ✗ | ✗ |
|
| 187 |
-
| WebArena | Web navigation | ✓ | ✗ | ✗ | ✗ |
|
| 188 |
-
| AgentBench | General tools | ✗ | ✗ | ✗ | ✗ |
|
| 189 |
-
| **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |
|
| 190 |
-
|
| 191 |
-
---
|
| 192 |
-
|
| 193 |
-
## 🚀 Setup
|
| 194 |
-
|
| 195 |
-
```bash
|
| 196 |
-
docker build -t aria-devops-incident .
|
| 197 |
-
docker run -p 7860:7860 aria-devops-incident
|
| 198 |
-
|
| 199 |
-
# Or local
|
| 200 |
-
pip install -r requirements.txt
|
| 201 |
-
uvicorn api:app --host 0.0.0.0 --port 7860
|
| 202 |
-
```
|
| 203 |
-
|
| 204 |
-
---
|
| 205 |
-
|
| 206 |
-
## 📁 Structure
|
| 207 |
-
|
| 208 |
-
```
|
| 209 |
-
├── api.py / server/app.py # FastAPI — all endpoints
|
| 210 |
-
├── env.py # Environment dispatcher
|
| 211 |
-
├── models.py # Pydantic models
|
| 212 |
-
├── tasks/ # 7 tasks + generated
|
| 213 |
-
├── curriculum/engine.py # Adaptive difficulty
|
| 214 |
-
├── generator/ # Procedural incidents
|
| 215 |
-
├── multi_agent/session.py # Dual-agent mode
|
| 216 |
-
├── graders/grader.py # Deterministic grader
|
| 217 |
-
├── demo_llm.py # Live terminal demo
|
| 218 |
-
├── train_grpo.ipynb # Training notebook
|
| 219 |
-
├── BLOG.md # Project story
|
| 220 |
-
└── openenv.yaml # OpenEnv manifest
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals — Bangalore, April 2026*
escape.py
DELETED
|
@@ -1,35 +0,0 @@
|
|
| 1 |
-
import sys
|
| 2 |
-
import re
|
| 3 |
-
|
| 4 |
-
with open("ui_test.py", "r", encoding="utf-8") as f:
|
| 5 |
-
    ui_html = f.read()
|
| 6 |
-
|
| 7 |
-
# Remove the first line (`html_content = """`) and the last line (`"""`)
|
| 8 |
-
ui_html = ui_html.replace('html_content = """', '')
|
| 9 |
-
ui_html = ui_html[:-3] # remove last """
|
| 10 |
-
|
| 11 |
-
# Escape curly braces
|
| 12 |
-
ui_html = ui_html.replace('{', '{{').replace('}', '}}')
|
| 13 |
-
|
| 14 |
-
with open("server/app.py", "r", encoding="utf-8") as f:
|
| 15 |
-
    app_content = f.read()
|
| 16 |
-
|
| 17 |
-
# The function to replace is def dashboard() ... to </html>"""
|
| 18 |
-
# Let's find def dashboard():
|
| 19 |
-
start_idx = app_content.find("def dashboard():")
|
| 20 |
-
end_idx = app_content.find("</html>\"\"\"", start_idx) + len("</html>\"\"\"")
|
| 21 |
-
|
| 22 |
-
if start_idx == -1 or end_idx == -1:
|
| 23 |
-
    print("Could not find dashboard function in server/app.py")
|
| 24 |
-
    sys.exit(1)
|
| 25 |
-
|
| 26 |
-
new_dashboard = f'''def dashboard():
|
| 27 |
-
    html = f"""{ui_html}"""
|
| 28 |
-
    return html'''
|
| 29 |
-
|
| 30 |
-
new_content = app_content[:start_idx] + new_dashboard + app_content[end_idx:]
|
| 31 |
-
|
| 32 |
-
with open("server/app.py", "w", encoding="utf-8") as f:
|
| 33 |
-
    f.write(new_content)
|
| 34 |
-
|
| 35 |
-
print("Successfully replaced dashboard endpoint!")
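One detail of the deleted script is worth noting: `end_idx` is computed as `find(...) + len(...)`, so it can never equal `-1`, and a missing `</html>"""` marker would slip past the guard. The sketch below shows one way the same splice could be written with each `find` result checked before the offset is added. The function name `replace_dashboard` is hypothetical, and returning `None` on failure is an illustrative choice rather than the script's behavior.

```python
def replace_dashboard(app_content, ui_html):
    """Splice escaped HTML into the dashboard() function of a source string.

    Returns the new source, or None if either marker is missing.
    """
    start_marker = "def dashboard():"
    end_marker = '</html>"""'

    start_idx = app_content.find(start_marker)
    if start_idx == -1:
        return None

    # Check the raw find() result before adding the marker length
    end_pos = app_content.find(end_marker, start_idx)
    if end_pos == -1:
        return None
    end_idx = end_pos + len(end_marker)

    # Double the braces so the HTML survives being embedded in an f-string
    escaped = ui_html.replace("{", "{{").replace("}", "}}")
    new_dashboard = (
        'def dashboard():\n'
        '    html = f"""' + escaped + '"""\n'
        '    return html'
    )
    # Anything after the end marker in the original source is kept unchanged
    return app_content[:start_idx] + new_dashboard + app_content[end_idx:]
```

Working on strings instead of files keeps the splice easy to test before anything is written back to `server/app.py`.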
inject_live.py
DELETED
|
@@ -1,618 +0,0 @@
|
|
| 1 |
-
import sys
|
| 2 |
-
|
| 3 |
-
html_content = r'''
|
| 4 |
-
@app.get("/live", response_class=HTMLResponse)
|
| 5 |
-
async def live_dashboard():
|
| 6 |
-
    html = f"""<!DOCTYPE html>
|
| 7 |
-
<html lang="en">
|
| 8 |
-
<head>
|
| 9 |
-
<meta charset="UTF-8">
|
| 10 |
-
<title>ARIA NOC LIVE</title>
|
| 11 |
-
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Share+Tech+Mono&display=swap" rel="stylesheet">
|
| 12 |
-
<style>
|
| 13 |
-
:root {{
|
| 14 |
-
--void: #000000;
|
| 15 |
-
--bg: #060914;
|
| 16 |
-
--surface: #0a0f1e;
|
| 17 |
-
--surface2: #0d1628;
|
| 18 |
-
--border: #1a2744;
|
| 19 |
-
--border-bright: #2a4080;
|
| 20 |
-
--blue: #4d9fff;
|
| 21 |
-
--blue-dim: #1a3a6e;
|
| 22 |
-
--cyan: #00d4ff;
|
| 23 |
-
--green: #00ff88;
|
| 24 |
-
--green-dim: #003a1e;
|
| 25 |
-
--yellow: #ffaa00;
|
| 26 |
-
--yellow-dim: #3a2800;
|
| 27 |
-
--red: #ff3355;
|
| 28 |
-
--red-dim: #3a0011;
|
| 29 |
-
--purple: #9d4edd;
|
| 30 |
-
--text: #c8d8f0;
|
| 31 |
-
--text-dim: #4a6080;
|
| 32 |
-
--text-mono: #8ab4d4;
|
| 33 |
-
}}
|
| 34 |
-
|
| 35 |
-
* {{ box-sizing: border-box; margin: 0; padding: 0; }}
|
| 36 |
-
|
| 37 |
-
body {{
|
| 38 |
-
background-color: var(--bg);
|
| 39 |
-
color: var(--text);
|
| 40 |
-
font-family: 'Inter', sans-serif;
|
| 41 |
-
overflow: hidden;
|
| 42 |
-
height: 100vh;
|
| 43 |
-
display: grid;
|
| 44 |
-
grid-template-rows: 48px 1fr 56px;
|
| 45 |
-
grid-template-columns: 28% 44% 28%;
|
| 46 |
-
grid-template-areas:
|
| 47 |
-
"top top top"
|
| 48 |
-
"left center right"
|
| 49 |
-
"bottom bottom bottom";
|
| 50 |
-
}}
|
| 51 |
-
|
| 52 |
-
.scanlines {{
|
| 53 |
-
position: fixed;
|
| 54 |
-
top: 0; left: 0; width: 100%; height: 100%;
|
| 55 |
-
pointer-events: none;
|
| 56 |
-
z-index: 9999;
|
| 57 |
-
background: repeating-linear-gradient(
|
| 58 |
-
0deg,
|
| 59 |
-
transparent,
|
| 60 |
-
transparent 2px,
|
| 61 |
-
rgba(0,0,0,0.03) 2px,
|
| 62 |
-
rgba(0,0,0,0.03) 4px
|
| 63 |
-
);
|
| 64 |
-
}}
|
| 65 |
-
|
| 66 |
-
.mono {{ font-family: 'Share Tech Mono', monospace; }}
|
| 67 |
-
.uppercase {{ text-transform: uppercase; }}
|
| 68 |
-
|
| 69 |
-
#top-bar {{
|
| 70 |
-
grid-area: top;
|
| 71 |
-
background: var(--void);
|
| 72 |
-
border-bottom: 1px solid var(--border);
|
| 73 |
-
display: flex;
|
| 74 |
-
justify-content: space-between;
|
| 75 |
-
align-items: center;
|
| 76 |
-
padding: 0 16px;
|
| 77 |
-
}}
|
| 78 |
-
|
| 79 |
-
.top-left, .top-center, .top-right {{ display: flex; align-items: center; gap: 12px; }}
|
| 80 |
-
|
| 81 |
-
.logo {{ font-size: 18px; color: var(--blue); font-weight: bold; }}
|
| 82 |
-
.logo-sub {{ font-size: 10px; color: var(--text-dim); }}
|
| 83 |
-
.separator {{ width: 1px; height: 24px; background: var(--border); }}
|
| 84 |
-
|
| 85 |
-
.status-dot {{ width: 8px; height: 8px; border-radius: 50%; }}
|
| 86 |
-
.dot-green {{ background: var(--red); animation: livePulse 1.5s infinite; }}
|
| 87 |
-
.dot-grey {{ background: var(--text-dim); }}
|
| 88 |
-
|
| 89 |
-
@keyframes livePulse {{
|
| 90 |
-
0% {{ opacity: 0; }}
|
| 91 |
-
50% {{ opacity: 1; }}
|
| 92 |
-
100% {{ opacity: 0; }}
|
| 93 |
-
}}
|
| 94 |
-
|
| 95 |
-
.control-label {{ font-size: 9px; color: var(--text-dim); }}
|
| 96 |
-
.terminal-input {{
|
| 97 |
-
background: var(--surface);
|
| 98 |
-
border: 1px solid var(--border-bright);
|
| 99 |
-
color: var(--blue);
|
| 100 |
-
font-family: 'Share Tech Mono', monospace;
|
| 101 |
-
padding: 4px 8px;
|
| 102 |
-
outline: none;
|
| 103 |
-
}}
|
| 104 |
-
.btn-deploy {{
|
| 105 |
-
background: var(--blue-dim);
|
| 106 |
-
border: 1px solid var(--blue);
|
| 107 |
-
color: var(--blue);
|
| 108 |
-
font-family: 'Share Tech Mono', monospace;
|
| 109 |
-
font-size: 11px;
|
| 110 |
-
padding: 6px 16px;
|
| 111 |
-
cursor: pointer;
|
| 112 |
-
transition: 0.2s;
|
| 113 |
-
}}
|
| 114 |
-
.btn-deploy:hover {{ background: var(--blue); color: var(--void); }}
|
| 115 |
-
|
| 116 |
-
.step-counter {{ font-size: 16px; color: var(--cyan); }}
|
| 117 |
-
.score-display-small {{ font-size: 20px; font-weight: bold; }}
|
| 118 |
-
.clock {{ font-size: 11px; color: var(--text-dim); }}
|
| 119 |
-
|
| 120 |
-
.panel {{
|
| 121 |
-
padding: 16px;
|
| 122 |
-
display: flex;
|
| 123 |
-
flex-direction: column;
|
| 124 |
-
gap: 12px;
|
| 125 |
-
overflow: hidden;
|
| 126 |
-
}}
|
| 127 |
-
#left-panel {{ grid-area: left; border-right: 1px solid var(--border); }}
|
| 128 |
-
#center-panel {{ grid-area: center; border-right: 1px solid var(--border); }}
|
| 129 |
-
#right-panel {{ grid-area: right; border-color: var(--purple); }}
|
| 130 |
-
|
| 131 |
-
.panel-header {{
|
| 132 |
-
display: flex; align-items: center; gap: 8px; font-size: 9px; color: var(--text-dim); margin-bottom: 8px;
|
| 133 |
-
}}
|
| 134 |
-
.pill {{ background: var(--surface2); padding: 2px 6px; border-radius: 10px; color: var(--text); }}
|
| 135 |
-
|
| 136 |
-
#service-list {{ display: flex; flex-direction: column; gap: 8px; overflow-y: auto; flex: 1; }}
|
| 137 |
-
.service-item {{
|
| 138 |
-
height: 52px; padding: 0 12px; display: flex; justify-content: space-between; align-items: center; flex-shrink: 0; transition: border-color 0.3s, background 0.3s;
|
| 139 |
-
}}
|
| 140 |
-
.svc-name {{ font-size: 12px; color: var(--text); }}
|
| 141 |
-
.svc-status {{ font-size: 9px; margin-top: 4px; }}
|
| 142 |
-
|
| 143 |
-
.svc-stats {{ text-align: right; }}
|
| 144 |
-
.svc-stat-line {{ font-size: 11px; }}
|
| 145 |
-
|
| 146 |
-
@keyframes statusFlash {{
|
| 147 |
-
0% {{ border-color: var(--text); }}
|
| 148 |
-
100% {{ border-color: inherit; }}
|
| 149 |
-
}}
|
| 150 |
-
@keyframes criticalFlash {{
|
| 151 |
-
0%, 50%, 100% {{ border-color: var(--border); }}
|
| 152 |
-
25%, 75% {{ border-color: var(--red); }}
|
| 153 |
-
}}
|
| 154 |
-
.flash-critical {{ animation: criticalFlash 0.5s ease-in-out; border-color: var(--red) !important; }}
|
| 155 |
-
|
| 156 |
-
@keyframes resolveFlash {{
|
| 157 |
-
0%, 50%, 100% {{ border-color: var(--border); }}
|
| 158 |
-
25%, 75% {{ border-color: var(--green); }}
|
| 159 |
-
}}
|
| 160 |
-
.flash-resolve {{ animation: resolveFlash 2s ease-in-out; border-color: var(--green) !important; }}
|
| 161 |
-
|
| 162 |
-
@keyframes pulseScore {{
|
| 163 |
-
0% {{ transform: scale(1); }}
|
| 164 |
-
50% {{ transform: scale(1.1); }}
|
| 165 |
-
100% {{ transform: scale(1); }}
|
| 166 |
-
}}
|
| 167 |
-
.pulse-score {{ animation: pulseScore 2s ease-in-out; }}
|
| 168 |
-
|
| 169 |
-
@keyframes slideInRight {{
|
| 170 |
-
from {{ transform: translateX(20px); opacity: 0; }}
|
| 171 |
-
to {{ transform: translateX(0); opacity: 1; }}
|
| 172 |
-
}}
|
| 173 |
-
@keyframes fadeIn {{
|
| 174 |
-
from {{ opacity: 0; }}
|
| 175 |
-
to {{ opacity: 1; }}
|
| 176 |
-
}}
|
| 177 |
-
|
| 178 |
-
.center-top {{ flex: 1; display: flex; flex-direction: column; overflow: hidden; }}
|
| 179 |
-
.center-bottom {{ height: 200px; display: flex; flex-direction: column; justify-content: flex-end; }}
|
| 180 |
-
|
| 181 |
-
#alerts-list {{ display: flex; flex-direction: column; gap: 8px; flex: 1; }}
|
| 182 |
-
.alert-strip {{
|
| 183 |
-
height: 36px; display: flex; align-items: center; gap: 8px; padding-right: 12px; animation: slideInRight 0.3s ease-out;
|
| 184 |
-
}}
|
| 185 |
-
.alert-badge {{
|
| 186 |
-
height: 100%; padding: 0 8px; display: flex; align-items: center; font-size: 9px; font-weight: bold; color: #000;
|
| 187 |
-
}}
|
| 188 |
-
.alert-text {{ font-size: 11px; color: var(--text); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }}
|
| 189 |
-
.no-alerts {{ text-align: center; color: var(--text-dim); margin-top: 40px; animation: livePulse 3s infinite; }}
|
| 190 |
-
|
| 191 |
-
.giant-score {{ font-size: 48px; font-weight: bold; text-align: center; margin-bottom: 12px; text-shadow: 0 0 20px currentColor; }}
|
| 192 |
-
.progress-container {{ width: 100%; height: 8px; background: var(--surface); margin-bottom: 8px; }}
|
| 193 |
-
.progress-fill {{ height: 100%; background: linear-gradient(90deg, var(--blue), var(--green)); transition: width 0.5s ease; width: 0%; }}
|
| 194 |
-
.score-stats {{ display: flex; justify-content: space-between; font-size: 10px; color: var(--text-dim); margin-bottom: 16px; }}
|
| 195 |
-
|
| 196 |
-
.sparkline {{ display: flex; align-items: flex-end; gap: 4px; height: 40px; margin-top: auto; }}
|
| 197 |
-
.spark-bar {{ width: 16px; background: var(--green); animation: slideInRight 0.2s ease-out; position: relative; }}
|
| 198 |
-
.spark-label {{ position: absolute; bottom: -14px; left: 50%; transform: translateX(-50%); font-size: 8px; color: var(--text-dim); }}
|
| 199 |
-
|
| 200 |
-
#agent-log {{
|
| 201 |
-
flex: 1; overflow-y: auto; display: flex; flex-direction: column; gap: 4px;
|
| 202 |
-
}}
|
| 203 |
-
.log-entry {{ animation: fadeIn 0.2s ease-out; font-size: 11px; line-height: 1.4; }}
|
| 204 |
-
.log-time {{ color: var(--text-dim); margin-right: 8px; }}
|
| 205 |
-
.log-action {{ color: var(--purple); }}
|
| 206 |
-
.log-reward {{ padding-left: 48px; }}
|
| 207 |
-
.log-evidence {{ color: var(--text-dim); font-style: italic; padding-left: 48px; }}
|
| 208 |
-
.log-diagnose {{ color: var(--yellow); }}
|
| 209 |
-
.log-fix {{ color: var(--cyan); }}
|
| 210 |
-
.log-episode-start {{ color: var(--cyan); text-align: center; margin: 8px 0; }}
|
| 211 |
-
.log-episode-end-ok {{ color: var(--green); text-align: center; margin: 8px 0; }}
|
| 212 |
-
.log-episode-end-fail {{ color: var(--red); text-align: center; margin: 8px 0; }}
|
| 213 |
-
|
| 214 |
-
#bottom-bar {{
|
| 215 |
-
grid-area: bottom; background: var(--void); border-top: 1px solid var(--border); display: flex; justify-content: space-between; align-items: center; padding: 0 16px;
|
| 216 |
-
}}
|
| 217 |
-
.ws-status {{ display: flex; align-items: center; gap: 8px; font-size: 11px; }}
|
| 218 |
-
.tip-text {{ font-size: 11px; color: var(--text-dim); font-style: italic; transition: opacity 0.5s; }}
|
| 219 |
-
.footer-right {{ font-size: 10px; color: var(--text-dim); }}
|
| 220 |
-
|
| 221 |
-
::-webkit-scrollbar {{ width: 4px; }}
|
| 222 |
-
::-webkit-scrollbar-track {{ background: transparent; }}
|
| 223 |
-
::-webkit-scrollbar-thumb {{ background: var(--border-bright); }}
|
| 224 |
-
</style>
|
| 225 |
-
</head>
|
| 226 |
-
<body>
|
| 227 |
-
<div class="scanlines"></div>
|
| 228 |
-
|
| 229 |
-
<div id="top-bar">
|
| 230 |
-
<div class="top-left">
|
| 231 |
-
<div class="logo mono">▣ ARIA</div>
|
| 232 |
-
<div class="logo-sub uppercase">Incident Response System</div>
|
| 233 |
-
<div class="separator"></div>
|
| 234 |
-
<div class="status-dot dot-grey" id="live-dot"></div>
|
| 235 |
-
<div class="logo-sub mono" id="live-text" style="color: var(--text)">OFFLINE</div>
|
| 236 |
-
</div>
|
| 237 |
-
|
| 238 |
-
<div class="top-center">
|
| 239 |
-
<div class="control-label uppercase">Active Scenario</div>
|
| 240 |
-
<select class="terminal-input" id="task-select">
|
| 241 |
-
<option value="easy">EASY</option>
|
| 242 |
-
<option value="medium">MEDIUM</option>
|
| 243 |
-
<option value="hard">HARD</option>
|
| 244 |
-
<option value="bonus">BONUS</option>
|
| 245 |
-
<option value="security">SECURITY</option>
|
| 246 |
-
<option value="database">DATABASE</option>
|
| 247 |
-
<option value="failover">FAILOVER</option>
|
| 248 |
-
<option value="generated">GENERATED</option>
|
| 249 |
-
</select>
|
| 250 |
-
<div class="control-label uppercase">Seed:</div>
|
| 251 |
-
<input type="number" class="terminal-input" id="seed-input" value="42" style="width: 70px;">
|
| 252 |
-
<button class="btn-deploy" onclick="deployIncident()">▶ DEPLOY INCIDENT</button>
|
| 253 |
-
</div>
|
| 254 |
-
|
| 255 |
-
<div class="top-right">
|
| 256 |
-
<div class="step-counter mono" id="top-step">00 / 15</div>
|
| 257 |
-
<div class="separator"></div>
|
| 258 |
-
<div class="score-display-small mono" id="top-score">0.000</div>
|
| 259 |
-
<div class="separator"></div>
|
| 260 |
-
<div class="clock mono" id="clock">00:00:00</div>
|
| 261 |
-
</div>
|
| 262 |
-
</div>
|
| 263 |
-
|
| 264 |
-
<div id="left-panel" class="panel">
|
| 265 |
-
<div class="panel-header uppercase">
|
| 266 |
-
◈ Infrastructure Status <span class="pill mono" id="svc-count">0</span>
|
| 267 |
-
</div>
|
| 268 |
-
<div id="service-list"></div>
|
| 269 |
-
</div>
|
| 270 |
-
|
| 271 |
-
<div id="center-panel" class="panel">
|
| 272 |
-
<div class="center-top">
|
| 273 |
-
<div class="panel-header uppercase">
|
| 274 |
-
◈ Active Alerts <span class="pill mono" id="alert-count" style="background:var(--surface2)">0</span>
|
| 275 |
-
</div>
|
| 276 |
-
<div id="alerts-list">
|
| 277 |
-
<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>
|
| 278 |
-
</div>
|
| 279 |
-
</div>
|
| 280 |
-
|
| 281 |
-
<div class="center-bottom">
|
| 282 |
-
<div class="panel-header uppercase">◈ Episode Metrics</div>
|
| 283 |
-
<div class="giant-score mono" id="giant-score" style="color: var(--text-dim)">0.000</div>
|
| 284 |
-
<div class="progress-container"><div class="progress-fill" id="score-bar"></div></div>
|
| 285 |
-
<div class="score-stats mono uppercase">
|
| 286 |
-
<span id="stat-step">STEP 0/15</span>
|
| 287 |
-
<span id="stat-task">TASK: --</span>
|
| 288 |
-
<span id="stat-seed">SEED: --</span>
|
| 289 |
-
</div>
|
| 290 |
-
<div class="sparkline" id="sparkline"></div>
|
| 291 |
-
</div>
|
| 292 |
-
</div>
|
| 293 |
-
|
| 294 |
-
<div id="right-panel" class="panel">
|
| 295 |
-
<div class="panel-header uppercase" style="color: var(--purple)">◈ Agent Reasoning</div>
|
| 296 |
-
<div id="agent-log" class="mono"></div>
|
| 297 |
-
</div>
|
| 298 |
-
|
| 299 |
-
<div id="bottom-bar">
|
| 300 |
-
<div class="ws-status mono">
|
| 301 |
-
<div class="status-dot dot-grey" id="btm-dot"></div>
|
| 302 |
-
<span id="btm-text" style="color: var(--text-dim)">○ WS DISCONNECTED</span>
|
| 303 |
-
</div>
|
| 304 |
-
<div class="tip-text" id="tip-text">ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty</div>
|
| 305 |
-
<div class="footer-right mono">ARIA v2.0 · OpenEnv Compliant 🤗 Arijit-07</div>
|
| 306 |
-
</div>
|
| 307 |
-
|
| 308 |
-
<script>
|
| 309 |
-
const TIPS = [
|
| 310 |
-
"ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty",
|
| 311 |
-
"ⓘ Collateral damage: restarting healthy services costs -0.15",
|
| 312 |
-
"ⓘ 7 tasks · 14 actions · Dense reward shaping · Semantic diagnosis matching",
|
| 313 |
-
"ⓘ Curriculum Engine adapts difficulty to agent performance",
|
| 314 |
-
"ⓘ Dual-Agent Mode: Observer sees logs, Responder sees metrics",
|
| 315 |
-
"ⓘ Grader clamped to (0.001, 0.999) for GRPO advantage stability",
|
| 316 |
-
"ⓘ Hard task: all services green — signal buried in business metrics"
|
| 317 |
-
];
|
| 318 |
-
let tipIdx = 0;
|
| 319 |
-
setInterval(() => {{
|
| 320 |
-
const el = document.getElementById('tip-text');
|
| 321 |
-
el.style.opacity = 0;
|
| 322 |
-
setTimeout(() => {{
|
| 323 |
-
tipIdx = (tipIdx + 1) % TIPS.length;
|
| 324 |
-
el.textContent = TIPS[tipIdx];
|
| 325 |
-
el.style.opacity = 1;
|
| 326 |
-
}}, 500);
|
| 327 |
-
}}, 15000);
|
| 328 |
-
|
| 329 |
-
setInterval(() => {{
|
| 330 |
-
const now = new Date();
|
| 331 |
-
document.getElementById('clock').textContent = now.toTimeString().split(' ')[0];
|
| 332 |
-
}}, 1000);
|
| 333 |
-
|
| 334 |
-
let ws = null;
|
| 335 |
-
let currentTask = 'easy';
|
| 336 |
-
let currentSeed = 42;
|
| 337 |
-
let stepCount = 0;
|
| 338 |
-
let totalScore = 0;
|
| 339 |
-
let isRunning = false;
|
| 340 |
-
let rewardHistory = [];
|
| 341 |
-
|
| 342 |
-
function getScoreColor(sc) {{
|
| 343 |
-
if(sc < 0.3) return 'var(--red)';
|
| 344 |
-
if(sc < 0.6) return 'var(--yellow)';
|
| 345 |
-
return 'var(--green)';
|
| 346 |
-
}}
|
| 347 |
-
|
| 348 |
-
function updateScoreDisplay() {{
|
| 349 |
-
const sc = Math.max(0, totalScore);
|
| 350 |
-
const col = getScoreColor(sc);
|
| 351 |
-
|
| 352 |
-
const ts = document.getElementById('top-score');
|
| 353 |
-
ts.textContent = sc.toFixed(3);
|
| 354 |
-
ts.style.color = col;
|
| 355 |
-
|
| 356 |
-
const gs = document.getElementById('giant-score');
|
| 357 |
-
gs.textContent = sc.toFixed(3);
|
| 358 |
-
gs.style.color = col;
|
| 359 |
-
|
| 360 |
-
document.getElementById('score-bar').style.width = Math.min(100, sc * 100) + '%';
|
| 361 |
-
}}
|
| 362 |
-
|
| 363 |
-
function updateStepCounter() {{
|
| 364 |
-
document.getElementById('top-step').textContent = `${{stepCount.toString().padStart(2,'0')}} / 15`;
|
| 365 |
-
document.getElementById('stat-step').textContent = `STEP ${{stepCount}}/15`;
|
| 366 |
-
}}
|
| 367 |
-
|
| 368 |
-
function addLog(type, arg1, arg2) {{
|
| 369 |
-
const logEl = document.getElementById('agent-log');
|
| 370 |
-
const div = document.createElement('div');
|
| 371 |
-
div.className = 'log-entry';
|
| 372 |
-
|
| 373 |
-
const timeStr = new Date().toTimeString().split(' ')[0];
|
| 374 |
-
const timeSpan = `<span class="log-time">[${{timeStr}}]</span>`;
|
| 375 |
-
|
| 376 |
-
if (type === 'SYSTEM') {{
|
| 377 |
-
div.innerHTML = `${{timeSpan}} <span style="color:var(--text-dim)">${{arg1}}</span>`;
|
| 378 |
-
}} else if (type === 'EPISODE_START') {{
|
| 379 |
-
div.innerHTML = `<div class="log-episode-start">━━━ NEW INCIDENT DEPLOYED ━━━<br>Task: ${{arg1.toUpperCase()}} | Seed: ${{arg2}}</div>`;
|
| 380 |
-
}} else if (type === 'ACTION') {{
|
| 381 |
-
div.innerHTML = `${{timeSpan}} <span class="log-action">→ ${{arg1.action_type}} ${{arg1.service || ''}}</span>`;
|
| 382 |
-
}} else if (type === 'REWARD') {{
|
| 383 |
-
let col = arg1 > 0 ? 'var(--green)' : (arg1 === 0 ? 'var(--red)' : 'var(--text-dim)');
|
| 384 |
-
div.innerHTML = `<div class="log-reward" style="color:${{col}}">✦ ${{arg1 > 0 ? '+' : ''}}${{arg1.toFixed(3)}} reward</div>`;
|
| 385 |
-
}} else if (type === 'EVIDENCE') {{
|
| 386 |
-
let txt = (arg1 || '').substring(0, 60);
|
| 387 |
-
if(arg1 && arg1.length > 60) txt += '...';
|
| 388 |
-
div.innerHTML = `<div class="log-evidence">↳ ${{txt}}</div>`;
|
| 389 |
-
}} else if (type === 'DIAGNOSE') {{
|
| 390 |
-
div.innerHTML = `${{timeSpan}} <span class="log-diagnose">⊕ DIAGNOSIS: ${{arg1}}</span>`;
|
| 391 |
-
}} else if (type === 'FIX') {{
|
| 392 |
-
div.innerHTML = `${{timeSpan}} <span class="log-fix">⚡ FIX APPLIED: ${{arg1}} → ${{arg2}}</span>`;
|
| 393 |
-
}} else if (type === 'EPISODE_END') {{
|
| 394 |
-
if (arg1 >= 0.7) {{
|
| 395 |
-
div.innerHTML = `<div class="log-episode-end-ok">━━━ ✓ INCIDENT RESOLVED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
|
| 396 |
-
document.getElementById('center-panel').classList.add('flash-resolve');
|
| 397 |
-
document.getElementById('giant-score').classList.add('pulse-score');
|
| 398 |
-
setTimeout(()=>{{
|
| 399 |
-
document.getElementById('center-panel').classList.remove('flash-resolve');
|
| 400 |
-
document.getElementById('giant-score').classList.remove('pulse-score');
|
| 401 |
-
}}, 2000);
|
| 402 |
-
}} else {{
|
| 403 |
-
div.innerHTML = `<div class="log-episode-end-fail">━━━ ✗ INCIDENT ESCALATED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
|
| 404 |
-
}}
|
| 405 |
-
}}
|
| 406 |
-
|
| 407 |
-
logEl.appendChild(div);
|
| 408 |
-
if(logEl.children.length > 200) logEl.removeChild(logEl.firstChild);
|
| 409 |
-
logEl.scrollTop = logEl.scrollHeight;
|
| 410 |
-
}}
|
| 411 |
-
|
| 412 |
-
function updateSparkline() {{
|
| 413 |
-
const sp = document.getElementById('sparkline');
|
| 414 |
-
sp.innerHTML = '';
|
| 415 |
-
const start = Math.max(0, rewardHistory.length - 12);
|
| 416 |
-
const recent = rewardHistory.slice(start);
|
| 417 |
-
|
| 418 |
-
recent.forEach((r, i) => {{
|
| 419 |
-
const h = Math.max(2, Math.min(40, (r / 0.5) * 40));
|
| 420 |
-
const col = r > 0 ? 'var(--green)' : 'var(--red)';
|
| 421 |
-
sp.innerHTML += `<div class="spark-bar" style="height:${{h}}px; background:${{col}}"><div class="spark-label">${{start + i + 1}}</div></div>`;
|
| 422 |
-
}});
|
| 423 |
-
}}
|
| 424 |
-
|
| 425 |
-
function startEpisode(task, seed) {{
|
| 426 |
-
stepCount = 0;
|
| 427 |
-
totalScore = 0;
|
| 428 |
-
rewardHistory = [];
|
| 429 |
-
isRunning = true;
|
| 430 |
-
currentTask = task;
|
| 431 |
-
currentSeed = seed;
|
| 432 |
-
|
| 433 |
-
document.getElementById('stat-task').textContent = `TASK: ${{task.toUpperCase()}}`;
|
| 434 |
-
document.getElementById('stat-seed').textContent = `SEED: ${{seed}}`;
|
| 435 |
-
updateStepCounter();
|
| 436 |
-
updateScoreDisplay();
|
| 437 |
-
updateSparkline();
|
| 438 |
-
document.getElementById('alerts-list').innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
|
| 439 |
-
document.getElementById('alert-count').textContent = '0';
|
| 440 |
-
document.getElementById('alert-count').style.background = 'var(--surface2)';
|
| 441 |
-
|
| 442 |
-
addLog('EPISODE_START', task, seed);
|
| 443 |
-
if(ws && ws.readyState === WebSocket.OPEN) {{
|
| 444 |
-
ws.send(JSON.stringify({{command: "reset", task_id: task, seed: seed}}));
|
| 445 |
-
}}
|
| 446 |
-
}}
|
| 447 |
-
|
| 448 |
-
function deployIncident() {{
|
| 449 |
-
const task = document.getElementById('task-select').value;
|
| 450 |
-
const seed = parseInt(document.getElementById('seed-input').value) || 42;
|
| 451 |
-
if(ws && ws.readyState === WebSocket.OPEN) {{
|
| 452 |
-
startEpisode(task, seed);
|
| 453 |
-
}} else {{
|
| 454 |
-
connectWS();
|
| 455 |
-
}}
|
| 456 |
-
}}
|
| 457 |
-
|
| 458 |
-
function connectWS() {{
|
| 459 |
-
if(ws) ws.close();
|
| 460 |
-
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 461 |
-
const wsUrl = `${{protocol}}//${{window.location.host}}/ws`;
|
| 462 |
-
|
| 463 |
-
ws = new WebSocket(wsUrl);
|
| 464 |
-
|
| 465 |
-
ws.onopen = () => {{
|
| 466 |
-
document.getElementById('live-dot').className = 'status-dot dot-green';
|
| 467 |
-
document.getElementById('live-text').textContent = 'LIVE';
|
| 468 |
-
document.getElementById('live-text').style.color = 'var(--red)';
|
| 469 |
-
document.getElementById('btm-dot').className = 'status-dot dot-green';
|
| 470 |
-
document.getElementById('btm-text').textContent = '◉ WS CONNECTED';
|
| 471 |
-
document.getElementById('btm-text').style.color = 'var(--green)';
|
| 472 |
-
addLog('SYSTEM', 'WebSocket connected');
|
| 473 |
-
startEpisode(currentTask, currentSeed);
|
| 474 |
-
}};
|
| 475 |
-
|
| 476 |
-
ws.onclose = () => {{
|
| 477 |
-
document.getElementById('live-dot').className = 'status-dot dot-grey';
|
| 478 |
-
document.getElementById('live-text').textContent = 'OFFLINE';
|
| 479 |
-
document.getElementById('live-text').style.color = 'var(--text)';
|
| 480 |
-
document.getElementById('btm-dot').className = 'status-dot dot-grey';
|
| 481 |
-
document.getElementById('btm-text').textContent = '○ WS DISCONNECTED';
|
| 482 |
-
document.getElementById('btm-text').style.color = 'var(--text-dim)';
|
| 483 |
-
addLog('SYSTEM', 'Disconnected — reconnecting in 3s...');
|
| 484 |
-
setTimeout(connectWS, 3000);
|
| 485 |
-
}};
|
| 486 |
-
|
| 487 |
-
ws.onmessage = (event) => {{
|
| 488 |
-
let data;
|
| 489 |
-
try {{ data = JSON.parse(event.data); }} catch(e) {{ return; }}
|
| 490 |
-
|
| 491 |
-
if(data.services) {{
|
| 492 |
-
const svcs = Object.entries(data.services).map(([name, s]) => ({{name, ...s}}));
|
| 493 |
-
svcs.sort((a, b) => {{
|
| 494 |
-
const val = (st) => st === 'down' ? 0 : (st === 'degraded' ? 1 : 2);
|
| 495 |
-
return val(a.status) - val(b.status);
|
| 496 |
-
}});
|
| 497 |
-
|
| 498 |
-
const list = document.getElementById('service-list');
|
| 499 |
-
list.innerHTML = '';
|
| 500 |
-
document.getElementById('svc-count').textContent = svcs.length;
|
| 501 |
-
|
| 502 |
-
svcs.forEach(s => {{
|
| 503 |
-
let bcol = 'var(--border)', bgcol = 'var(--surface)', tcol = 'var(--text-dim)', stxt = '○ UNKNOWN';
|
| 504 |
-
if(s.status === 'down') {{ bcol = 'var(--red)'; bgcol = 'var(--red-dim)'; tcol = 'var(--red)'; stxt = '● DOWN'; }}
|
| 505 |
-
else if(s.status === 'degraded') {{ bcol = 'var(--yellow)'; bgcol = 'var(--yellow-dim)'; tcol = 'var(--yellow)'; stxt = '◐ DEGRADED'; }}
|
| 506 |
-
else if(s.status === 'healthy') {{ bcol = 'var(--green)'; bgcol = 'var(--green-dim)'; tcol = 'var(--green)'; stxt = '○ HEALTHY'; }}
|
| 507 |
-
|
| 508 |
-
let errRate = (s.error_rate * 100).toFixed(1);
|
| 509 |
-
let memUtil = (s.memory_utilization * 100).toFixed(1);
|
| 510 |
-
let errCol = s.error_rate > 0.3 ? 'var(--red)' : (s.error_rate > 0.1 ? 'var(--yellow)' : 'var(--green)');
|
| 511 |
-
let memCol = s.memory_utilization > 0.9 ? 'var(--red)' : (s.memory_utilization > 0.7 ? 'var(--yellow)' : 'var(--green)');
|
| 512 |
-
|
| 513 |
-
list.innerHTML += `
|
| 514 |
-
<div class="service-item mono" style="border-left: 3px solid ${{bcol}}; background: ${{bgcol}}">
|
| 515 |
-
<div>
|
| 516 |
-
<div class="svc-name">${{s.name}}</div>
|
| 517 |
-
<div class="svc-status" style="color:${{tcol}}">${{stxt}}</div>
|
| 518 |
-
</div>
|
| 519 |
-
<div class="svc-stats">
|
| 520 |
-
<div class="svc-stat-line" style="color:${{errCol}}">ERR ${{errRate}}%</div>
|
| 521 |
-
<div class="svc-stat-line" style="color:${{memCol}}">MEM ${{memUtil}}%</div>
|
| 522 |
-
</div>
|
| 523 |
-
</div>
|
| 524 |
-
`;
|
| 525 |
-
}});
|
| 526 |
-
}}
|
| 527 |
-
|
| 528 |
-
if(data.active_alerts) {{
|
| 529 |
-
const alist = document.getElementById('alerts-list');
|
| 530 |
-
alist.innerHTML = '';
|
| 531 |
-
document.getElementById('alert-count').textContent = data.active_alerts.length;
|
| 532 |
-
document.getElementById('alert-count').style.background = data.active_alerts.length > 0 ? 'var(--red)' : 'var(--surface2)';
|
| 533 |
-
|
| 534 |
-
if(data.active_alerts.length === 0) {{
|
| 535 |
-
alist.innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
|
| 536 |
-
}} else {{
|
| 537 |
-
let critFound = false;
|
| 538 |
-
data.active_alerts.slice(0, 5).forEach(a => {{
|
| 539 |
-
let bg = 'var(--surface)', border = 'var(--border)', txtCol = '#000';
|
| 540 |
-
if(a.severity === 'CRITICAL') {{ border = 'var(--red)'; bg = 'var(--red)'; critFound = true; }}
|
| 541 |
-
else if(a.severity === 'HIGH') {{ border = '#ff6600'; bg = '#ff6600'; }}
|
| 542 |
-
else if(a.severity === 'WARNING') {{ border = 'var(--yellow)'; bg = 'var(--yellow)'; }}
|
| 543 |
-
else {{ border = 'var(--blue)'; bg = 'var(--blue)'; }}
|
| 544 |
-
|
| 545 |
-
alist.innerHTML += `
|
| 546 |
-
<div class="alert-strip mono" style="border-left: 3px solid ${{border}}; background: ${{bg}}20">
|
| 547 |
-
<div class="alert-badge" style="background:${{bg}}; color:${{txtCol}}">${{a.severity}}</div>
|
| 548 |
-
<div class="alert-text">[${{a.service}}] ${{a.message}}</div>
|
| 549 |
-
</div>
|
| 550 |
-
`;
|
| 551 |
-
}});
|
| 552 |
-
if(data.active_alerts.length > 5) {{
|
| 553 |
-
alist.innerHTML += `<div class="mono" style="font-size:9px; color:var(--text-dim); text-align:center">+${{data.active_alerts.length - 5}} more</div>`;
|
| 554 |
-
}}
|
| 555 |
-
if(critFound) {{
|
| 556 |
-
const lp = document.getElementById('left-panel');
|
| 557 |
-
lp.classList.remove('flash-critical');
|
| 558 |
-
void lp.offsetWidth;
|
| 559 |
-
lp.classList.add('flash-critical');
|
| 560 |
-
}}
|
| 561 |
-
}}
|
| 562 |
-
}}
|
| 563 |
-
|
| 564 |
-
if(data.action !== undefined && isRunning) {{
|
| 565 |
-
stepCount++;
|
| 566 |
-
updateStepCounter();
|
| 567 |
-
|
| 568 |
-
let act = data.action;
|
| 569 |
-
if(typeof act === 'string') try{{ act = JSON.parse(act) }}catch(e){{}}
|
| 570 |
-
|
| 571 |
-
if(act.action_type === 'diagnose') addLog('DIAGNOSE', act.root_cause);
|
| 572 |
-
else if(act.action_type === 'restart_service' || act.action_type === 'rollback_service' || act.action_type === 'block_ip')
|
| 573 |
-
addLog('FIX', act.action_type, act.service || act.ip);
|
| 574 |
-
else addLog('ACTION', act);
|
| 575 |
-
|
| 576 |
-
if(data.evidence) addLog('EVIDENCE', data.evidence);
|
| 577 |
-
if(data.reward !== undefined) {{
|
| 578 |
-
totalScore += data.reward;
|
| 579 |
-
rewardHistory.push(data.reward);
|
| 580 |
-
addLog('REWARD', data.reward);
|
| 581 |
-
updateScoreDisplay();
|
| 582 |
-
updateSparkline();
|
| 583 |
-
}}
|
| 584 |
-
|
| 585 |
-
if(data.done) {{
|
| 586 |
-
isRunning = false;
|
| 587 |
-
addLog('EPISODE_END', totalScore, stepCount);
|
| 588 |
-
updateScoreDisplay();
|
| 589 |
-
setTimeout(() => {{
|
| 590 |
-
currentSeed = Math.floor(Math.random() * 99999);
|
| 591 |
-
document.getElementById('seed-input').value = currentSeed;
|
| 592 |
-
startEpisode(currentTask, currentSeed);
|
| 593 |
-
}}, 4000);
|
| 594 |
-
}}
|
| 595 |
-
}}
|
| 596 |
-
}};
|
| 597 |
-
}}
|
| 598 |
-
|
| 599 |
-
window.onload = connectWS;
|
| 600 |
-
</script>
|
| 601 |
-
</body>
|
| 602 |
-
</html>
|
| 603 |
-
"""
|
| 604 |
-
return HTMLResponse(html)
|
| 605 |
-
'''
|
| 606 |
-
|
| 607 |
-
with open("server/app.py", "r", encoding="utf-8") as f:
|
| 608 |
-
content = f.read()
|
| 609 |
-
|
| 610 |
-
# find @app.get("/", response_class=HTMLResponse)
|
| 611 |
-
target = '@app.get("/", response_class=HTMLResponse)'
|
| 612 |
-
if target in content:
|
| 613 |
-
new_content = content.replace(target, html_content + "\n\n" + target)
|
| 614 |
-
with open("server/app.py", "w", encoding="utf-8") as f:
|
| 615 |
-
f.write(new_content)
|
| 616 |
-
print("SUCCESS")
|
| 617 |
-
else:
|
| 618 |
-
print("NOT FOUND")
|
technical_reference.md
DELETED
|
@@ -1,106 +0,0 @@
|
|
| 1 |
-
# ARIA: DevOps Incident Response – Technical Reference Manual
|
| 2 |
-
|
| 3 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 4 |
-
SECTION 1: PROJECT OVERVIEW
|
| 5 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 6 |
-
**Project**: ARIA (DevOps Incident Response)
|
| 7 |
-
**Purpose**: An OpenEnv-compliant RL environment where AI agents diagnose and remediate production software incidents across a simulated microservices architecture. Designed for the Meta × PyTorch × HuggingFace OpenEnv Hackathon finals.
|
| 8 |
-
|
| 9 |
-
**Architecture Stack**:
|
| 10 |
-
- **Framework**: FastAPI (Python)
|
| 11 |
-
- **State Management**: In-memory `DevOpsEnvironment`. WebSocket support is available for real-time streaming.
|
| 12 |
-
- **Core Data Models**: Pydantic (`Action`, `Observation`, `State`, `StepResult`)
|
| 13 |
-
- **Protocol**: JSON over REST (`/reset`, `/step`, `/state`, `/validate`, `/multi-agent/*`, `/curriculum/*`)
|
| 14 |
-
- **Deployment**: Hugging Face Spaces (`server/app.py` is the verified production entrypoint).
|
| 15 |
-
|
| 16 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 17 |
-
SECTION 2: CORE ENVIRONMENT & ACTION SPACE
|
| 18 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 19 |
-
The environment supports standard API and agent interactions. SLA degradation kicks in every step (if a service is `down`, error rates creep up; if `degraded`, latency increases).
|
| 20 |
-
|
| 21 |
-
**Action Types & Side Effects**:
|
| 22 |
-
- `read_logs(service)`: Returns the last 2 lines + a summary of hidden lines.
|
| 23 |
-
- `search_logs(service, query)`: Case-insensitive search on logs.
|
| 24 |
-
- `read_metrics(service)`: Returns CPU, memory, error rate, p99 latency, replicas, version, and SLA breach info.
|
| 25 |
-
- `read_runbook(runbook)`: Loads Markdown text from the `data/runbooks/` directory.
|
| 26 |
-
- `acknowledge(service)`: Acknowledges an active alert ID.
|
| 27 |
-
- `diagnose(root_cause)`: Evaluates keyword overlap for a diagnosis bonus reward.
|
| 28 |
-
- `restart_service(service)`: Fixes OOMs. Penalised if used on stateful services, unaffected services, or repeated.
|
| 29 |
-
- `scale_up(service)`: Increases replicas. Penalised if the incident is data corruption or the service is unrelated.
|
| 30 |
-
- `rollback(service, version)`: Reverts a bad deployment to fix cascading failures.
|
| 31 |
-
- `alert_oncall(reason)`: Required for cross-team fixes (data audit, security escalation, DBA intervention).
|
| 32 |
-
- `block_ip_range(ip_range)`: Security response mechanism for DDoS attacks.
|
| 33 |
-
- `create_index(table, column)`: DBA response for slow database queries.
|
| 34 |
-
- `failover(target_region)`: Fails over eligible stateless services to `us-west-2`.
|
| 35 |
-
- `noop()`: Take no action. Penalised if used excessively (> 3 times).
|
| 36 |
-
|
| 37 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 38 |
-
SECTION 3: THE 7 CORE TASKS (PLUS GENERATED)
|
| 39 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 40 |
-
|
| 41 |
-
| Task ID | Max Steps | Description & Difficulty | Ground Truth Root Cause | Ground Truth Fix |
|
| 42 |
-
|---|---|---|---|---|
|
| 43 |
-
| `easy` | 15 | **Single Service OOM**: One service crash-loops from a memory leak. | `memory_leak_{service}` | `restart {service}` |
|
| 44 |
-
| `medium` | 20 | **Cascading Failure**: Bad deployment exhausts connection pools, cascading. Includes a red-herring alert. | `connection_pool_exhaustion` or `null_pointer` | `rollback {service}` |
|
| 45 |
-
| `hard` | 25 | **Silent Data Corruption**: All services green. No error alerts. Requires correlating subtle business metrics. | `data_corruption_data_pipeline...` | `rollback data-pipeline` AND `alert_oncall` |
|
| 46 |
-
| `bonus` | 25 | **Dual Simultaneous Failure**: Two independent failures at once. Both must be fixed. | `disk_full_log... AND model_reload_loop...` | `alert_oncall` (disk) AND `rollback` (ml) |
|
| 47 |
-
| `security` | 20 | **DDoS Attack**: Botnet credential stuffing. Requires blocking CIDR and escalation. | `ddos_attack_185.x.x.x...` | `block_ip_range` AND `alert_oncall` |
|
| 48 |
-
| `database` | 20 | **DB Degradation**: Missing schema index causing full table scans. | `missing_index_orders_user_segment...` | `create_index` or `rollback` |
|
| 49 |
-
| `failover` | 25 | **Multi-Region Failover**: Network partition. Fails over stateless services. | `us_east_1_network_partition...` | `failover` eligible AND `alert_oncall` others |
|
| 50 |
-
| `generated` | 20 | **Procedural Incident**: A seed-based deterministic incident generated by ARIA. | (Deterministic via Seed) | (Varies by failure mode) |
|
| 51 |
-
|
| 52 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 53 |
-
SECTION 4: REWARD SHAPING & GRADING LOGIC
|
| 54 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 55 |
-
The evaluation logic (`graders/grader.py`) calculates the final episode score, strictly bounded to `[0.001, 0.999]` to pass OpenEnv validation checks.
|
| 56 |
-
|
| 57 |
-
- **Base Score**: The accumulated `total_reward` from individual step functions (clamped to `[0.0, 1.0]`).
|
| 58 |
-
- **Efficiency Bonus**: If the incident is resolved, `+ (1.0 - (steps_taken / max_steps)) * 0.05`.
|
| 59 |
-
- **Diagnosis Precision Bonus**: Checks keyword overlap of the `diagnose` action against the ground truth. `>= 50%` overlap adds `+0.03`. `>= 30%` overlap adds `+0.01`.
|
| 60 |
-
- **Noop Penalty**: `(noop_count - 3) * 0.02` for excessive `noop` actions.
|
| 61 |
-
- **Restart Penalty**: `(restarts - 1) * 0.05` per service restarted more than once (discourages guess-and-check).
|
| 62 |
-
- **Blind Remediation Penalty** (`tasks/base.py`): `-0.05` applied locally in step functions if a fix action is taken before any `diagnose_correct` or `diagnose_partial` is awarded.
|
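The grading rules listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `graders/grader.py`: the function names, the argument list, the order in which bonuses and penalties are combined, and the way keyword overlap is computed are all assumptions made for the example.

```python
from typing import Dict


def keyword_overlap(diagnosis: str, ground_truth: str) -> float:
    """Fraction of ground-truth keywords found in the diagnosis (assumed definition)."""
    truth = set(ground_truth.lower().replace("-", "_").split("_"))
    truth.discard("")
    if not truth:
        return 0.0
    guess = set(diagnosis.lower().replace("-", "_").replace(" ", "_").split("_"))
    return len(truth & guess) / len(truth)


def final_score(
    total_reward: float,
    resolved: bool,
    steps_taken: int,
    max_steps: int,
    overlap: float,
    noop_count: int,
    restarts_per_service: Dict[str, int],
) -> float:
    """Hypothetical combination of the documented bonuses and penalties."""
    # Base score: accumulated step rewards, clamped to [0.0, 1.0].
    score = min(1.0, max(0.0, total_reward))
    # Efficiency bonus, only when the incident was resolved.
    if resolved:
        score += (1.0 - steps_taken / max_steps) * 0.05
    # Diagnosis precision bonus from keyword overlap.
    if overlap >= 0.5:
        score += 0.03
    elif overlap >= 0.3:
        score += 0.01
    # Noop penalty for every noop beyond the third.
    if noop_count > 3:
        score -= (noop_count - 3) * 0.02
    # Restart penalty per service restarted more than once.
    for restarts in restarts_per_service.values():
        if restarts > 1:
            score -= (restarts - 1) * 0.05
    # Final clamp so the result stays strictly inside (0, 1).
    return min(0.999, max(0.001, score))
```

The blind remediation penalty is left out on purpose, since the reference says it is applied inside the step functions and would already be part of `total_reward`.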
| 63 |
-
|
| 64 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 65 |
-
SECTION 5: ADVANCED SUB-SYSTEMS
|
| 66 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 67 |
-
### Curriculum Engine (`curriculum/engine.py`)
|
| 68 |
-
- **Mastery Tracker**: Keeps a rolling average of the last 5 scores per task.
|
| 69 |
-
- **Mastery Levels**: Novice (0) ➔ Intermediate (1) ➔ Advanced (2) ➔ Mastered (3).
|
| 70 |
-
- **Thresholds**: Rolling Avg `> 0.75` promotes mastery. `< 0.30` demotes mastery.
|
| 71 |
-
- **Scaffolding**: Provides specific hints if a task is failed `>= 3` times and avg is `< 0.30`.
|
| 72 |
-
- **Next Task Logic**: Returns the task with the lowest rolling average among non-mastered candidates.
|
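The mastery rules above might look something like the sketch below. It is not the code from `curriculum/engine.py`: the class name, the one-level-per-episode promotion and demotion, and treating a task with no history as having an average of 0.0 are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

LEVELS = ["Novice", "Intermediate", "Advanced", "Mastered"]


class MasteryTracker:
    """Hypothetical tracker: rolling average of the last 5 scores per task."""

    def __init__(self, tasks: List[str], window: int = 5) -> None:
        self.scores: Dict[str, Deque[float]] = {t: deque(maxlen=window) for t in tasks}
        self.level: Dict[str, int] = {t: 0 for t in tasks}

    def rolling_avg(self, task: str) -> float:
        history = self.scores[task]
        return sum(history) / len(history) if history else 0.0

    def record(self, task: str, score: float) -> None:
        """Store a score, then promote or demote by one level (assumed step size)."""
        self.scores[task].append(score)
        avg = self.rolling_avg(task)
        if avg > 0.75:
            self.level[task] = min(len(LEVELS) - 1, self.level[task] + 1)
        elif avg < 0.30:
            self.level[task] = max(0, self.level[task] - 1)

    def next_task(self) -> Optional[str]:
        """Lowest rolling average among tasks that are not yet Mastered."""
        candidates = [t for t, lvl in self.level.items() if lvl < len(LEVELS) - 1]
        if not candidates:
            return None
        return min(candidates, key=self.rolling_avg)
```

A `deque` with `maxlen` keeps the window bounded without manual trimming, which is why it is used here in place of a plain list.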
| 73 |
-
|
| 74 |
-
### Multi-Agent Dual Session (`multi_agent/session.py`)
|
| 75 |
-
- **Agent A (Observer)**: Read-only. Prompted to review logs and alerts. Must use `share_finding` to pass observations.
|
| 76 |
-
- **Agent B (Responder)**: Write-only. Relies on Agent A's findings plus service metrics to execute remediation actions.
|
| 77 |
-
- **Endpoints**: `/multi-agent/reset`, `/multi-agent/step/a/{id}`, `/multi-agent/step/b/{id}`, `/multi-agent/state/{id}`.
|
| 78 |
-
|
| 79 |
-
### Procedural Incident Generator (`generator/incident_factory.py`)
|
| 80 |
-
- **Mechanics**: Takes a `seed` to build an incident dynamically. Supports 6 failure modes (`oom`, `cascade`, `corruption`, `security`, `database`, `network_partition`).
|
| 81 |
-
- **Noise injection**: Adds 0-3 random noise alerts (e.g., SSL renewals, scheduled batch jobs) to distract agents.
|
| 82 |
-
- **Difficulty Score Calculation**: `base_score + (noise_count * 0.05)`, clamped to 1.0.
|
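The difficulty formula above is simple enough to show directly. In this sketch the function names and the use of `random.Random(seed)` to pick the noise count are assumptions; `generator/incident_factory.py` may derive these values differently.

```python
import random


def sample_noise_count(seed: int) -> int:
    """Deterministically pick 0-3 noise alerts from a seed (assumed mechanism)."""
    return random.Random(seed).randint(0, 3)


def difficulty_score(base_score: float, noise_count: int) -> float:
    """base_score + (noise_count * 0.05), clamped to 1.0 as documented."""
    return min(1.0, base_score + noise_count * 0.05)
```

Because the random generator is seeded per call, the same seed always yields the same noise count, which matches the reference's description of a seed-based deterministic incident.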
| 83 |
-
|
| 84 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 85 |
-
SECTION 6: API ENDPOINTS
|
| 86 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 87 |
-
| Method | Route | Description |
|
| 88 |
-
|---|---|---|
|
| 89 |
-
| `GET` | `/health` | Simple `{status: "ok"}` liveness check. |
|
| 90 |
-
| `GET` | `/tasks` | Lists all 8 tasks with descriptions and max steps. |
|
| 91 |
-
| `POST` | `/reset` | Initializes an episode. Body: `{"task_id": str, "seed": int}`. |
|
| 92 |
-
| `POST` | `/step` | Executes an `Action`. Returns a `StepResult`. |
|
| 93 |
-
| `GET` | `/state` | Full state dump including ground truth logic. |
|
| 94 |
-
| `GET` | `/validate` | Runs self-validation on the 7 core tasks via random-agent rollout. |
|
| 95 |
-
| `GET` | `/metrics` | Telemetry: Resolution rates, score averages per task. |
|
| 96 |
-
| `GET` | `/leaderboard` | Top 10 episodes ranked by score, then fewest steps. |
|
| 97 |
-
| `GET` | `/curriculum/*` | Next recommended task, status, and scaffolding hints. |
|
| 98 |
-
| `GET` | `/generate/preview` | Preview procedurally generated incident structure for a specific seed. |
|
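As a rough illustration of how a client could talk to the routes in the table, the sketch below builds the `/reset` and `/step` requests with the standard library. The base URL and helper name are placeholders; the `/reset` body follows the table, while the `/step` body only assumes the `action_type` and `service` fields seen in the dashboard script, so the full `Action` schema may require more.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:7860"  # placeholder, not taken from the repository


def build_post(route: str, payload: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the given route without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# /reset body as documented: {"task_id": str, "seed": int}
reset_request = build_post("/reset", {"task_id": "easy", "seed": 42})

# /step body: assumed minimal Action with an action type and a target service.
step_request = build_post("/step", {"action_type": "read_logs", "service": "auth-service"})
```

Against a running server, each request could be sent with `urllib.request.urlopen(reset_request)` and the response decoded with `json.load`; the service name `auth-service` is made up for the example.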
| 99 |
-
|
| 100 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 101 |
-
SECTION 7: LIMITATIONS & KNOWN ISSUES (DEMO GOTCHAS)
|
| 102 |
-
━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
| 103 |
-
- **Validation Route Exception**: The `generated` task is intentionally excluded from the `/validate` endpoint (`VALID_TASKS` in `server/app.py`). This is because the random validation agent frequently fails its variable mechanics, leading to a false "failed" validation status.
|
| 104 |
-
- **State Model Seed Loss**: The integer `seed` is dropped from the finalised `State` Pydantic model response. It defaults to `42` in telemetry records (`track_episode`) if not explicitly extracted.
|
| 105 |
-
- **Entrypoint Synchronization**: The primary Hugging Face production entrypoint is `server/app.py`. Do NOT use `api.py` for live deployments. If 404 errors appear, ensure routes were ported to `app.py`.
|
| 106 |
-
- **Security Blocks**: Hugging Face repository tokens (`HF_TOKEN`) must be managed via Secrets. Hardcoded tokens in `devops.ipynb` were actively scrubbed to prevent automated security takedowns.
|
uvicorn_err.txt
DELETED
|
File without changes
|
uvicorn_out.txt
DELETED
|
File without changes
|