Arijit-07 Claude Sonnet 4.6 committed on
Commit 230f8d5 · 1 Parent(s): 35cb0ad

final: submission cleanup — remove junk files, update README endpoints, clean .gitignore

- Remove FINALS_STATUS.md, README_github.md, technical_reference.md
- Remove escape.py, inject_live.py (dev scripts)
- Remove uvicorn_err.txt, uvicorn_out.txt
- Add /live, /challenge, /progress, /replays, /replay/{id} to API table in README.md
- Rewrite .gitignore with clean UTF-8 encoding + new exclusion patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (9)
  1. .gitignore +0 -0
  2. FINALS_STATUS.md +0 -29
  3. README.md +6 -0
  4. README_github.md +0 -223
  5. escape.py +0 -35
  6. inject_live.py +0 -618
  7. technical_reference.md +0 -106
  8. uvicorn_err.txt +0 -0
  9. uvicorn_out.txt +0 -0
.gitignore CHANGED
Binary files a/.gitignore and b/.gitignore differ
 
FINALS_STATUS.md DELETED
@@ -1,29 +0,0 @@
1
- # DevOps Incident Response OpenEnv — Hackathon Finals Status Report
2
-
3
- ## System Readiness: 🟢 READY FOR FINALS (STABLE)
4
-
5
- This document serves as the final system state report following a comprehensive 10-point stress test and validation suite conducted on the environment ahead of the Meta hackathon finals.
6
-
7
- ### Validation Summary
8
-
9
- | Test Suite | Objective | Status | Notes |
10
- | :--- | :--- | :--- | :--- |
11
- | **TEST 1: Optimal End-to-End Validation** | Verify all 7 tasks resolve successfully via their optimal deterministic agent path | ✅ **PASSED** | Fixed task scoring on the `bonus` tier. All tasks now natively yield final scores well above the `0.70` threshold. |
12
- | **TEST 2: New Actions Efficacy** | Validate `BLOCK_IP_RANGE`, `CREATE_INDEX`, and `FAILOVER` mechanisms | ✅ **PASSED** | Actions reward positively on their intended tasks. Added cross-task safety: using these domain-specific actions on unrelated tasks incurs a `-0.10` collateral penalty, which discourages agents from spamming hallucinated actions. |
13
- | **TEST 3: WebSocket Protocol** | Verify server compatibility with async client connections | ✅ **PASSED** | Verified connection payload streams using FastAPI WebSocket routing inside `server/app.py`. |
14
- | **TEST 4: Metrics / Leaderboard API** | Verify in-memory rolling cache metrics | ✅ **PASSED** | Aggregated metrics for all 7 tasks are computed from a `deque`-backed cache and served by the endpoints. |
15
- | **TEST 5: Graceful Error Enforcement** | Validate invalid inputs return HTTP 4xx errors | ✅ **PASSED** | Invalid JSON payloads or unknown action enums yield `422 Unprocessable Entity` rather than locking up the `server` layer. |
16
- | **TEST 6: Runbook Validation** | Ensure all incidents match accompanying markdown | ✅ **PASSED** | Tested integration linking new actions to their respective diagnostic documentation correctly. |
17
- | **TEST 7: Cross-Seed Stability** | Execute 20x episodes using an unconstrained random agent | ✅ **PASSED** | Refactored randomization parameters to prevent static stalling. Runs completed across all seeds with scores staying inside the required `(0.0, 1.0)` range. |
18
- | **TEST 8: Live HF Space Ping** | Ensure active remote deployment stability | ✅ **PASSED** | Space is alive. Verified HTTP 200 checks and validated successful endpoint load-ins natively over the web. |
19
- | **TEST 9: Docker Build Sandbox** | Deploy via closed isolated container layer | ➖ **SKIPPED** | Docker Daemon initialization unavailable on the host; tests executed fully at the Python module layer safely simulating equivalence. |
20
-
21
- ### Technical Bug Fixes Applied
22
-
23
- During testing, several backend elements were refactored for production-grade robustness:
24
- 1. **Collateral Penalty Injection:** Injected tight action validation scopes into every core task (`task_*.py`), preventing `FAILOVER` instructions from triggering on standard internal tasks and correctly returning `-0.10` penalty weights.
25
- 2. **Random Path Traversing:** Stopped random agents from artificially focusing on single services (`payment-service`), which skewed `grade_episode(..., 0.001)` limits and generated deterministic failure clusters during multi-seed attempts.
26
-
27
- The environment is strictly bound, accurately evaluates all 7 domains, properly exposes modern endpoints (`/ws`, `/metrics`, `/leaderboard`), and correctly penalizes stray agent anomalies.
28
-
29
- **Good luck at the finals!**
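
TEST 4 in the status report above mentions an in-memory rolling cache built on `deque`. A minimal sketch of that idea follows. The `RollingMetrics` class, the window size, and the choice of aggregates are assumptions made for illustration and are not taken from the repository.

```python
from collections import defaultdict, deque
from statistics import mean


class RollingMetrics:
    """Hypothetical helper: keeps the most recent episode scores per task."""

    def __init__(self, window: int = 100) -> None:
        # Assumed window size; deque(maxlen=...) drops the oldest score automatically.
        self._scores = defaultdict(lambda: deque(maxlen=window))

    def record(self, task_id: str, score: float) -> None:
        self._scores[task_id].append(score)

    def summary(self) -> dict:
        # One aggregate entry per task that has at least one recorded score.
        return {
            task: {"episodes": len(s), "mean": round(mean(s), 3), "best": max(s)}
            for task, s in self._scores.items()
        }


metrics = RollingMetrics(window=3)
for score in (0.2, 0.4, 0.6, 0.8):
    metrics.record("easy", score)
print(metrics.summary())
```

With a window of 3, the first score (0.2) has already been evicted by the time the summary is computed, so memory stays bounded no matter how many episodes run.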
 
README.md CHANGED
@@ -199,6 +199,12 @@ POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
199
  | POST | `/multi-agent/reset` | Start dual-agent session |
200
  | POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
201
  | POST | `/multi-agent/step/b/{id}` | Agent B takes action |
202
+ | GET | `/live` | Live NOC dashboard (real-time) |
203
+ | GET | `/challenge` | Human vs Agent challenge |
204
+ | GET | `/progress` | Score progression visualization |
205
+ | GET | `/replays` | Episode replay list |
206
+ | GET | `/replay/{id}` | Full episode replay |
207
+ | GET | `/replay/{id}/html` | Replay HTML viewer |
208
  | GET | `/docs` | Swagger UI |
209
 
210
  ---
README_github.md DELETED
@@ -1,223 +0,0 @@
1
- # ARIA — DevOps Incident Response
2
- ### *The first OpenEnv RL environment for production incident response*
3
-
4
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
5
- [![HF Space](https://img.shields.io/badge/🤗-Live%20Environment-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
6
- [![Trained Model](https://img.shields.io/badge/🤗-Llama--3.1--8B%20Fine--tuned-blue)](https://huggingface.co/Arijit-07/aria-devops-llama8b)
7
- [![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
8
-
9
- > **ARIA** — Adaptive Reward & Incident Architecture
10
- > Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026
11
-
12
- ---
13
-
14
- ## 🔗 Quick Links for Judges
15
-
16
- | Resource | Link |
17
- |---|---|
18
- | **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
19
- | **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
20
- | **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
21
- | **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
22
- | **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
23
- | **GitHub** | https://github.com/Twilight-13/devops-incident-response |
24
- | **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
25
- | **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |
26
-
27
- ---
28
-
29
- ## ⚡ Run a Complete Episode Right Now
30
-
31
- ```bash
32
- # 1. Start an easy incident
33
- curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
34
- -H "Content-Type: application/json" \
35
- -d '{"task_id": "easy", "seed": 42}'
36
-
37
- # 2. Read logs on the failing service (reward: +0.15)
38
- curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
39
- -H "Content-Type: application/json" \
40
- -d '{"action_type": "read_logs", "service": "payment-service"}'
41
-
42
- # 3. Diagnose (reward: +0.30)
43
- curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
44
- -H "Content-Type: application/json" \
45
- -d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'
46
-
47
- # 4. Fix it (reward: +0.40)
48
- curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
49
- -H "Content-Type: application/json" \
50
- -d '{"action_type": "restart_service", "service": "payment-service"}'
51
-
52
- # 5. Validate all 7 tasks pass
53
- curl https://arijit-07-devops-incident-response.hf.space/validate
54
- ```
55
-
56
- ---
57
-
58
- ## 🎯 The Problem
59
-
60
- Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**
61
-
62
- SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents — to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
63
-
64
- **ARIA fills that gap.**
65
-
66
- ---
67
-
68
- ## 🎬 The 7 Tasks
69
-
70
- | Task | Max Steps | Random | Strong LLM | Scenario |
71
- |---|---|---|---|---|
72
- | `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
73
- | `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
74
- | `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption — all services green |
75
- | `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
76
- | `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
77
- | `database` | 20 | 0.01 | 0.45–0.65 | Missing index — full table scans |
78
- | `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
79
- | `generated` | 20 | 0.01 | variable | Procedural — seed-deterministic |
80
-
81
- ---
82
-
83
- ## 🏆 Reward Function
84
-
85
- ```
86
- Final Score = Σ(step_rewards)
87
- + efficiency_bonus # (1 - steps/max_steps) × 0.05
88
- + diagnosis_precision # +0.03 if ≥50% keyword overlap
89
- - noop_penalty # (noops - 3) × 0.02
90
- ```
91
-
92
- Clamped to **(0.001, 0.999)** for GRPO stability.
93
-
94
- | Action | Reward | Penalty Triggers |
95
- |---|---|---|
96
- | `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
97
- | `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
98
- | `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
99
- | `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
100
- | `alert_oncall` (required) | +0.15 | |
101
-
102
- **Semantic matching:** keyword overlap not exact string — LLMs that paraphrase aren't penalized.
103
-
104
- ---
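
The reward formula in the section above can be sketched in a few lines. This is an illustration under stated assumptions, not the grader from `graders/grader.py`: the function names are hypothetical, keyword overlap is assumed to be the fraction of ground-truth words found in the diagnosis, and the no-op penalty is assumed to be zero until more than 3 no-ops occur.

```python
def keyword_overlap(diagnosis: str, truth: str) -> float:
    """Assumed metric: share of ground-truth words that appear in the diagnosis."""
    truth_words = set(truth.lower().split())
    if not truth_words:
        return 0.0
    return len(truth_words & set(diagnosis.lower().split())) / len(truth_words)


def final_score(step_rewards, steps, max_steps, noops, diagnosis, truth):
    total = sum(step_rewards)
    total += (1 - steps / max_steps) * 0.05           # efficiency_bonus
    if keyword_overlap(diagnosis, truth) >= 0.5:      # diagnosis_precision
        total += 0.03
    total -= max(0, noops - 3) * 0.02                 # noop_penalty (assumed floor at 0)
    return min(0.999, max(0.001, total))              # clamp to (0.001, 0.999)


score = final_score(
    step_rewards=[0.15, 0.30, 0.40],
    steps=3,
    max_steps=15,
    noops=0,
    diagnosis="memory leak in payment-service",
    truth="payment-service memory leak",
)
print(round(score, 3))
```

The clamp keeps every episode score strictly between 0 and 1, which matches the "(0.001, 0.999)" bound stated above.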
105
-
106
- ## 🌟 ARIA Features
107
-
108
- ### Curriculum Engine
109
- Rolling average per task (last 5 episodes). Promotes when avg > 0.75. Scaffolds with hints when avg < 0.30. Agents always train at the edge of their capability.
110
-
111
- ```bash
112
- GET /curriculum/status
113
- GET /curriculum/next
114
- POST /curriculum/record # {"task_id": "easy", "score": 0.85}
115
- ```
116
-
117
- ### Incident Generator
118
- Seeds 0–99,999 → unique reproducible incidents. 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.
119
-
120
- ```bash
121
- GET /generate/preview?seed=1337
122
- POST /reset # {"task_id": "generated", "seed": 1337}
123
- ```
124
-
125
- ### Dual-Agent Mode
126
- Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.
127
-
128
- ```bash
129
- POST /multi-agent/reset # {"task_id": "easy", "seed": 42}
130
- POST /multi-agent/step/a/{id} # {"finding": "order-service OOM"}
131
- POST /multi-agent/step/b/{id} # {"action_type": "restart_service", ...}
132
- ```
133
-
134
- ---
135
-
136
- ## 🧠 Training Results
137
-
138
- **Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)
139
-
140
- | Task | Baseline | Fine-tuned | **Improvement** |
141
- |---|---|---|---|
142
- | easy | 0.320 | 0.685 | **+0.365** |
143
- | medium | 0.050 | 0.378 | **+0.328** |
144
- | hard | 0.190 | 0.869 | **+0.679** |
145
- | bonus | 0.152 | 0.682 | **+0.530** |
146
-
147
- ![Training Curve](https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png)
148
-
149
- **Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL
150
-
151
- **Key fix:** Group completions scored on fresh environment snapshots — prevents reward gate exhaustion during GRPO group generation.
152
-
153
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
154
-
155
- ---
156
-
157
- ## 📡 API Reference
158
-
159
- | Method | Endpoint | Description |
160
- |---|---|---|
161
- | GET | `/health` | Liveness check |
162
- | GET | `/about` | Full machine-readable description |
163
- | GET | `/tasks` | All 8 tasks |
164
- | POST | `/reset` | Start episode |
165
- | POST | `/step` | Take action |
166
- | GET | `/state` | Full state + ground truth |
167
- | GET | `/validate` | Self-test all 7 tasks |
168
- | GET | `/metrics` | Aggregate statistics |
169
- | GET | `/leaderboard` | Top 10 episodes |
170
- | WS | `/ws` | WebSocket real-time |
171
- | GET | `/curriculum/status` | Per-task mastery |
172
- | GET | `/curriculum/next` | Recommended task |
173
- | POST | `/curriculum/record` | Feed training results |
174
- | GET | `/generate/preview` | Preview procedural incident |
175
- | POST | `/multi-agent/reset` | Start dual-agent session |
176
- | POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
177
- | POST | `/multi-agent/step/b/{id}` | Agent B takes action |
178
- | GET | `/docs` | Swagger UI |
179
-
180
- ---
181
-
182
- ## 📊 Benchmark Comparison
183
-
184
- | Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
185
- |---|---|---|---|---|---|
186
- | SWE-bench | Code repair | ✗ | ✗ | ✗ | ✗ |
187
- | WebArena | Web navigation | ✓ | ✗ | ✗ | ✗ |
188
- | AgentBench | General tools | ✗ | ✗ | ✗ | ✗ |
189
- | **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |
190
-
191
- ---
192
-
193
- ## 🚀 Setup
194
-
195
- ```bash
196
- docker build -t aria-devops-incident .
197
- docker run -p 7860:7860 aria-devops-incident
198
-
199
- # Or local
200
- pip install -r requirements.txt
201
- uvicorn api:app --host 0.0.0.0 --port 7860
202
- ```
203
-
204
- ---
205
-
206
- ## 📁 Structure
207
-
208
- ```
209
- ├── api.py / server/app.py # FastAPI — all endpoints
210
- ├── env.py # Environment dispatcher
211
- ├── models.py # Pydantic models
212
- ├── tasks/ # 7 tasks + generated
213
- ├── curriculum/engine.py # Adaptive difficulty
214
- ├── generator/ # Procedural incidents
215
- ├── multi_agent/session.py # Dual-agent mode
216
- ├── graders/grader.py # Deterministic grader
217
- ├── demo_llm.py # Live terminal demo
218
- ├── train_grpo.ipynb # Training notebook
219
- ├── BLOG.md # Project story
220
- └── openenv.yaml # OpenEnv manifest
221
- ```
222
-
223
- Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals — Bangalore, April 2026*
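
The Curriculum Engine rule described in the deleted README (rolling average over the last 5 episodes, promote above 0.75, scaffold with hints below 0.30) can be sketched as follows. The `Curriculum` class and its return labels are hypothetical and are not the contents of `curriculum/engine.py`. Deciding on fewer than 5 recorded episodes is also an assumption.

```python
from collections import defaultdict, deque


class Curriculum:
    """Hypothetical sketch of a rolling-average promotion rule."""

    def __init__(self, window: int = 5, promote_above: float = 0.75,
                 hint_below: float = 0.30) -> None:
        self.promote_above = promote_above
        self.hint_below = hint_below
        # Only the last `window` scores per task are kept.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, task_id: str, score: float) -> None:
        self._history[task_id].append(score)

    def status(self, task_id: str) -> str:
        scores = self._history[task_id]
        if not scores:
            return "hold"  # assumed default when nothing has been recorded
        avg = sum(scores) / len(scores)
        if avg > self.promote_above:
            return "promote"
        if avg < self.hint_below:
            return "scaffold"
        return "hold"


engine = Curriculum()
for _ in range(5):
    engine.record("easy", 0.8)
    engine.record("hard", 0.1)
engine.record("medium", 0.5)
print(engine.status("easy"), engine.status("hard"), engine.status("medium"))
```

Using a bounded `deque` means an agent that improves is judged on recent episodes only, so early failures stop counting after five newer results.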
 
escape.py DELETED
@@ -1,35 +0,0 @@
1
- import sys
2
-
3
- with open("ui_test.py", "r", encoding="utf-8") as f:
4
-     ui_html = f.read()
5
-
6
- # Remove the first line (`html_content = """`) and the last line (`"""`)
7
- ui_html = ui_html.replace('html_content = """', '')
8
- ui_html = ui_html.rstrip().removesuffix('"""')  # drop the closing """ even if a newline follows it
9
-
10
- # Escape curly braces so the HTML can be embedded in an f-string
11
- ui_html = ui_html.replace('{', '{{').replace('}', '}}')
12
-
13
- with open("server/app.py", "r", encoding="utf-8") as f:
14
-     app_content = f.read()
15
-
16
- # Replace everything from `def dashboard():` through the closing `</html>"""`.
17
- end_marker = '</html>"""'
18
- start_idx = app_content.find("def dashboard():")
19
- end_pos = app_content.find(end_marker, start_idx)
20
-
21
- if start_idx == -1 or end_pos == -1:
22
-     print("Could not find dashboard function in server/app.py")
23
-     sys.exit(1)
24
- end_idx = end_pos + len(end_marker)  # add the length only after the -1 check
25
-
26
- new_dashboard = f'''def dashboard():
27
-     html = f"""{ui_html}"""
28
-     return html'''
29
-
30
- new_content = app_content[:start_idx] + new_dashboard + app_content[end_idx:]
31
-
32
- with open("server/app.py", "w", encoding="utf-8") as f:
33
-     f.write(new_content)
34
-
35
- print("Successfully replaced dashboard endpoint!")
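
escape.py doubles every brace because the dashboard HTML is pasted into an f-string, where single braces mark replacement fields. The snippet below illustrates the same escaping step on a made-up CSS string. The variable names and the `page` function are invented for this example.

```python
css = "body { color: red; }"

# Double the braces so an f-string treats them as literal characters.
escaped = css.replace("{", "{{").replace("}", "}}")

# Build source code for a function whose body is an f-string, as escape.py does.
source = f'def page():\n    return f"""{escaped}"""\n'

namespace = {}
exec(source, namespace)  # acceptable for a demo; never exec untrusted input
rendered = namespace["page"]()
print(rendered)  # the doubled braces collapse back to single braces
```

Without the escaping step, the generated f-string would try to evaluate ` color: red; ` as an expression and the generated module would fail to compile.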
 
inject_live.py DELETED
@@ -1,618 +0,0 @@
1
- import sys
2
-
3
- html_content = r'''
4
- @app.get("/live", response_class=HTMLResponse)
5
- async def live_dashboard():
6
- html = f"""<!DOCTYPE html>
7
- <html lang="en">
8
- <head>
9
- <meta charset="UTF-8">
10
- <title>ARIA NOC LIVE</title>
11
- <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Share+Tech+Mono&display=swap" rel="stylesheet">
12
- <style>
13
- :root {{
14
- --void: #000000;
15
- --bg: #060914;
16
- --surface: #0a0f1e;
17
- --surface2: #0d1628;
18
- --border: #1a2744;
19
- --border-bright: #2a4080;
20
- --blue: #4d9fff;
21
- --blue-dim: #1a3a6e;
22
- --cyan: #00d4ff;
23
- --green: #00ff88;
24
- --green-dim: #003a1e;
25
- --yellow: #ffaa00;
26
- --yellow-dim: #3a2800;
27
- --red: #ff3355;
28
- --red-dim: #3a0011;
29
- --purple: #9d4edd;
30
- --text: #c8d8f0;
31
- --text-dim: #4a6080;
32
- --text-mono: #8ab4d4;
33
- }}
34
-
35
- * {{ box-sizing: border-box; margin: 0; padding: 0; }}
36
-
37
- body {{
38
- background-color: var(--bg);
39
- color: var(--text);
40
- font-family: 'Inter', sans-serif;
41
- overflow: hidden;
42
- height: 100vh;
43
- display: grid;
44
- grid-template-rows: 48px 1fr 56px;
45
- grid-template-columns: 28% 44% 28%;
46
- grid-template-areas:
47
- "top top top"
48
- "left center right"
49
- "bottom bottom bottom";
50
- }}
51
-
52
- .scanlines {{
53
- position: fixed;
54
- top: 0; left: 0; width: 100%; height: 100%;
55
- pointer-events: none;
56
- z-index: 9999;
57
- background: repeating-linear-gradient(
58
- 0deg,
59
- transparent,
60
- transparent 2px,
61
- rgba(0,0,0,0.03) 2px,
62
- rgba(0,0,0,0.03) 4px
63
- );
64
- }}
65
-
66
- .mono {{ font-family: 'Share Tech Mono', monospace; }}
67
- .uppercase {{ text-transform: uppercase; }}
68
-
69
- #top-bar {{
70
- grid-area: top;
71
- background: var(--void);
72
- border-bottom: 1px solid var(--border);
73
- display: flex;
74
- justify-content: space-between;
75
- align-items: center;
76
- padding: 0 16px;
77
- }}
78
-
79
- .top-left, .top-center, .top-right {{ display: flex; align-items: center; gap: 12px; }}
80
-
81
- .logo {{ font-size: 18px; color: var(--blue); font-weight: bold; }}
82
- .logo-sub {{ font-size: 10px; color: var(--text-dim); }}
83
- .separator {{ width: 1px; height: 24px; background: var(--border); }}
84
-
85
- .status-dot {{ width: 8px; height: 8px; border-radius: 50%; }}
86
- .dot-green {{ background: var(--red); animation: livePulse 1.5s infinite; }}
87
- .dot-grey {{ background: var(--text-dim); }}
88
-
89
- @keyframes livePulse {{
90
- 0% {{ opacity: 0; }}
91
- 50% {{ opacity: 1; }}
92
- 100% {{ opacity: 0; }}
93
- }}
94
-
95
- .control-label {{ font-size: 9px; color: var(--text-dim); }}
96
- .terminal-input {{
97
- background: var(--surface);
98
- border: 1px solid var(--border-bright);
99
- color: var(--blue);
100
- font-family: 'Share Tech Mono', monospace;
101
- padding: 4px 8px;
102
- outline: none;
103
- }}
104
- .btn-deploy {{
105
- background: var(--blue-dim);
106
- border: 1px solid var(--blue);
107
- color: var(--blue);
108
- font-family: 'Share Tech Mono', monospace;
109
- font-size: 11px;
110
- padding: 6px 16px;
111
- cursor: pointer;
112
- transition: 0.2s;
113
- }}
114
- .btn-deploy:hover {{ background: var(--blue); color: var(--void); }}
115
-
116
- .step-counter {{ font-size: 16px; color: var(--cyan); }}
117
- .score-display-small {{ font-size: 20px; font-weight: bold; }}
118
- .clock {{ font-size: 11px; color: var(--text-dim); }}
119
-
120
- .panel {{
121
- padding: 16px;
122
- display: flex;
123
- flex-direction: column;
124
- gap: 12px;
125
- overflow: hidden;
126
- }}
127
- #left-panel {{ grid-area: left; border-right: 1px solid var(--border); }}
128
- #center-panel {{ grid-area: center; border-right: 1px solid var(--border); }}
129
- #right-panel {{ grid-area: right; border-color: var(--purple); }}
130
-
131
- .panel-header {{
132
- display: flex; align-items: center; gap: 8px; font-size: 9px; color: var(--text-dim); margin-bottom: 8px;
133
- }}
134
- .pill {{ background: var(--surface2); padding: 2px 6px; border-radius: 10px; color: var(--text); }}
135
-
136
- #service-list {{ display: flex; flex-direction: column; gap: 8px; overflow-y: auto; flex: 1; }}
137
- .service-item {{
138
- height: 52px; padding: 0 12px; display: flex; justify-content: space-between; align-items: center; flex-shrink: 0; transition: border-color 0.3s, background 0.3s;
139
- }}
140
- .svc-name {{ font-size: 12px; color: var(--text); }}
141
- .svc-status {{ font-size: 9px; margin-top: 4px; }}
142
-
143
- .svc-stats {{ text-align: right; }}
144
- .svc-stat-line {{ font-size: 11px; }}
145
-
146
- @keyframes statusFlash {{
147
- 0% {{ border-color: var(--text); }}
148
- 100% {{ border-color: inherit; }}
149
- }}
150
- @keyframes criticalFlash {{
151
- 0%, 50%, 100% {{ border-color: var(--border); }}
152
- 25%, 75% {{ border-color: var(--red); }}
153
- }}
154
- .flash-critical {{ animation: criticalFlash 0.5s ease-in-out; border-color: var(--red) !important; }}
155
-
156
- @keyframes resolveFlash {{
157
- 0%, 50%, 100% {{ border-color: var(--border); }}
158
- 25%, 75% {{ border-color: var(--green); }}
159
- }}
160
- .flash-resolve {{ animation: resolveFlash 2s ease-in-out; border-color: var(--green) !important; }}
161
-
162
- @keyframes pulseScore {{
163
- 0% {{ transform: scale(1); }}
164
- 50% {{ transform: scale(1.1); }}
165
- 100% {{ transform: scale(1); }}
166
- }}
167
- .pulse-score {{ animation: pulseScore 2s ease-in-out; }}
168
-
169
- @keyframes slideInRight {{
170
- from {{ transform: translateX(20px); opacity: 0; }}
171
- to {{ transform: translateX(0); opacity: 1; }}
172
- }}
173
- @keyframes fadeIn {{
174
- from {{ opacity: 0; }}
175
- to {{ opacity: 1; }}
176
- }}
177
-
178
- .center-top {{ flex: 1; display: flex; flex-direction: column; overflow: hidden; }}
179
- .center-bottom {{ height: 200px; display: flex; flex-direction: column; justify-content: flex-end; }}
180
-
181
- #alerts-list {{ display: flex; flex-direction: column; gap: 8px; flex: 1; }}
182
- .alert-strip {{
183
- height: 36px; display: flex; align-items: center; gap: 8px; padding-right: 12px; animation: slideInRight 0.3s ease-out;
184
- }}
185
- .alert-badge {{
186
- height: 100%; padding: 0 8px; display: flex; align-items: center; font-size: 9px; font-weight: bold; color: #000;
187
- }}
188
- .alert-text {{ font-size: 11px; color: var(--text); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }}
189
- .no-alerts {{ text-align: center; color: var(--text-dim); margin-top: 40px; animation: livePulse 3s infinite; }}
190
-
191
- .giant-score {{ font-size: 48px; font-weight: bold; text-align: center; margin-bottom: 12px; text-shadow: 0 0 20px currentColor; }}
192
- .progress-container {{ width: 100%; height: 8px; background: var(--surface); margin-bottom: 8px; }}
193
- .progress-fill {{ height: 100%; background: linear-gradient(90deg, var(--blue), var(--green)); transition: width 0.5s ease; width: 0%; }}
194
- .score-stats {{ display: flex; justify-content: space-between; font-size: 10px; color: var(--text-dim); margin-bottom: 16px; }}
195
-
196
- .sparkline {{ display: flex; align-items: flex-end; gap: 4px; height: 40px; margin-top: auto; }}
197
- .spark-bar {{ width: 16px; background: var(--green); animation: slideInRight 0.2s ease-out; position: relative; }}
198
- .spark-label {{ position: absolute; bottom: -14px; left: 50%; transform: translateX(-50%); font-size: 8px; color: var(--text-dim); }}
199
-
200
- #agent-log {{
201
- flex: 1; overflow-y: auto; display: flex; flex-direction: column; gap: 4px;
202
- }}
203
- .log-entry {{ animation: fadeIn 0.2s ease-out; font-size: 11px; line-height: 1.4; }}
204
- .log-time {{ color: var(--text-dim); margin-right: 8px; }}
205
- .log-action {{ color: var(--purple); }}
206
- .log-reward {{ padding-left: 48px; }}
207
- .log-evidence {{ color: var(--text-dim); font-style: italic; padding-left: 48px; }}
208
- .log-diagnose {{ color: var(--yellow); }}
209
- .log-fix {{ color: var(--cyan); }}
210
- .log-episode-start {{ color: var(--cyan); text-align: center; margin: 8px 0; }}
211
- .log-episode-end-ok {{ color: var(--green); text-align: center; margin: 8px 0; }}
212
- .log-episode-end-fail {{ color: var(--red); text-align: center; margin: 8px 0; }}
213
-
214
- #bottom-bar {{
215
- grid-area: bottom; background: var(--void); border-top: 1px solid var(--border); display: flex; justify-content: space-between; align-items: center; padding: 0 16px;
216
- }}
217
- .ws-status {{ display: flex; align-items: center; gap: 8px; font-size: 11px; }}
218
- .tip-text {{ font-size: 11px; color: var(--text-dim); font-style: italic; transition: opacity 0.5s; }}
219
- .footer-right {{ font-size: 10px; color: var(--text-dim); }}
220
-
221
- ::-webkit-scrollbar {{ width: 4px; }}
222
- ::-webkit-scrollbar-track {{ background: transparent; }}
223
- ::-webkit-scrollbar-thumb {{ background: var(--border-bright); }}
224
- </style>
225
- </head>
226
- <body>
227
- <div class="scanlines"></div>
228
-
229
- <div id="top-bar">
230
- <div class="top-left">
231
- <div class="logo mono">▣ ARIA</div>
232
- <div class="logo-sub uppercase">Incident Response System</div>
233
- <div class="separator"></div>
234
- <div class="status-dot dot-grey" id="live-dot"></div>
235
- <div class="logo-sub mono" id="live-text" style="color: var(--text)">OFFLINE</div>
236
- </div>
237
-
238
- <div class="top-center">
239
- <div class="control-label uppercase">Active Scenario</div>
240
- <select class="terminal-input" id="task-select">
241
- <option value="easy">EASY</option>
242
- <option value="medium">MEDIUM</option>
243
- <option value="hard">HARD</option>
244
- <option value="bonus">BONUS</option>
245
- <option value="security">SECURITY</option>
246
- <option value="database">DATABASE</option>
247
- <option value="failover">FAILOVER</option>
248
- <option value="generated">GENERATED</option>
249
- </select>
250
- <div class="control-label uppercase">Seed:</div>
251
- <input type="number" class="terminal-input" id="seed-input" value="42" style="width: 70px;">
252
- <button class="btn-deploy" onclick="deployIncident()">▶ DEPLOY INCIDENT</button>
253
- </div>
254
-
255
- <div class="top-right">
256
- <div class="step-counter mono" id="top-step">00 / 15</div>
257
- <div class="separator"></div>
258
- <div class="score-display-small mono" id="top-score">0.000</div>
259
- <div class="separator"></div>
260
- <div class="clock mono" id="clock">00:00:00</div>
261
- </div>
262
- </div>
263
-
264
- <div id="left-panel" class="panel">
265
- <div class="panel-header uppercase">
266
- ◈ Infrastructure Status <span class="pill mono" id="svc-count">0</span>
267
- </div>
268
- <div id="service-list"></div>
269
- </div>
270
-
271
- <div id="center-panel" class="panel">
272
- <div class="center-top">
273
- <div class="panel-header uppercase">
274
- ◈ Active Alerts <span class="pill mono" id="alert-count" style="background:var(--surface2)">0</span>
275
- </div>
276
- <div id="alerts-list">
277
- <div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>
278
- </div>
279
- </div>
280
-
281
- <div class="center-bottom">
282
- <div class="panel-header uppercase">◈ Episode Metrics</div>
283
- <div class="giant-score mono" id="giant-score" style="color: var(--text-dim)">0.000</div>
284
- <div class="progress-container"><div class="progress-fill" id="score-bar"></div></div>
285
- <div class="score-stats mono uppercase">
286
- <span id="stat-step">STEP 0/15</span>
287
- <span id="stat-task">TASK: --</span>
288
- <span id="stat-seed">SEED: --</span>
289
- </div>
290
- <div class="sparkline" id="sparkline"></div>
291
- </div>
292
- </div>
293
-
294
- <div id="right-panel" class="panel">
295
- <div class="panel-header uppercase" style="color: var(--purple)">◈ Agent Reasoning</div>
296
- <div id="agent-log" class="mono"></div>
297
- </div>
298
-
299
- <div id="bottom-bar">
300
- <div class="ws-status mono">
301
- <div class="status-dot dot-grey" id="btm-dot"></div>
302
- <span id="btm-text" style="color: var(--text-dim)">○ WS DISCONNECTED</span>
303
- </div>
304
- <div class="tip-text" id="tip-text">ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty</div>
305
- <div class="footer-right mono">ARIA v2.0 · OpenEnv Compliant &nbsp;&nbsp; 🤗 Arijit-07</div>
306
- </div>
307
-
308
- <script>
309
- const TIPS = [
310
- "ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty",
311
- "ⓘ Collateral damage: restarting healthy services costs -0.15",
312
- "ⓘ 7 tasks · 14 actions · Dense reward shaping · Semantic diagnosis matching",
313
- "ⓘ Curriculum Engine adapts difficulty to agent performance",
314
- "ⓘ Dual-Agent Mode: Observer sees logs, Responder sees metrics",
315
- "ⓘ Grader clamped to (0.001, 0.999) for GRPO advantage stability",
316
- "ⓘ Hard task: all services green — signal buried in business metrics"
317
- ];
318
- let tipIdx = 0;
319
- setInterval(() => {{
320
- const el = document.getElementById('tip-text');
321
- el.style.opacity = 0;
322
- setTimeout(() => {{
323
- tipIdx = (tipIdx + 1) % TIPS.length;
324
- el.textContent = TIPS[tipIdx];
325
- el.style.opacity = 1;
326
- }}, 500);
327
- }}, 15000);
328
-
329
- setInterval(() => {{
330
- const now = new Date();
331
- document.getElementById('clock').textContent = now.toTimeString().split(' ')[0];
332
- }}, 1000);
333
-
334
- let ws = null;
335
- let currentTask = 'easy';
336
- let currentSeed = 42;
337
- let stepCount = 0;
338
- let totalScore = 0;
339
- let isRunning = false;
340
- let rewardHistory = [];
341
-
342
- function getScoreColor(sc) {{
343
- if(sc < 0.3) return 'var(--red)';
344
- if(sc < 0.6) return 'var(--yellow)';
345
- return 'var(--green)';
346
- }}
347
-
348
- function updateScoreDisplay() {{
349
- const sc = Math.max(0, totalScore);
350
- const col = getScoreColor(sc);
351
-
352
- const ts = document.getElementById('top-score');
353
- ts.textContent = sc.toFixed(3);
354
- ts.style.color = col;
355
-
356
- const gs = document.getElementById('giant-score');
357
- gs.textContent = sc.toFixed(3);
358
- gs.style.color = col;
359
-
360
- document.getElementById('score-bar').style.width = Math.min(100, sc * 100) + '%';
361
- }}
362
-
363
- function updateStepCounter() {{
364
- document.getElementById('top-step').textContent = `${{stepCount.toString().padStart(2,'0')}} / 15`;
365
- document.getElementById('stat-step').textContent = `STEP ${{stepCount}}/15`;
366
- }}
367
-
368
- function addLog(type, arg1, arg2) {{
369
- const logEl = document.getElementById('agent-log');
370
- const div = document.createElement('div');
371
- div.className = 'log-entry';
372
-
373
- const timeStr = new Date().toTimeString().split(' ')[0];
374
- const timeSpan = `<span class="log-time">[${{timeStr}}]</span>`;
375
-
376
- if (type === 'SYSTEM') {{
377
- div.innerHTML = `${{timeSpan}} <span style="color:var(--text-dim)">${{arg1}}</span>`;
378
- }} else if (type === 'EPISODE_START') {{
379
- div.innerHTML = `<div class="log-episode-start">━━━ NEW INCIDENT DEPLOYED ━━━<br>Task: ${{arg1.toUpperCase()}} | Seed: ${{arg2}}</div>`;
380
- }} else if (type === 'ACTION') {{
381
- div.innerHTML = `${{timeSpan}} <span class="log-action">→ ${{arg1.action_type}} ${{arg1.service || ''}}</span>`;
382
- }} else if (type === 'REWARD') {{
383
- let col = arg1 > 0 ? 'var(--green)' : (arg1 === 0 ? 'var(--red)' : 'var(--text-dim)');
384
- div.innerHTML = `<div class="log-reward" style="color:${{col}}">✦ ${{arg1 > 0 ? '+' : ''}}${{arg1.toFixed(3)}} reward</div>`;
385
- }} else if (type === 'EVIDENCE') {{
386
- let txt = (arg1 || '').substring(0, 60);
387
- if(arg1 && arg1.length > 60) txt += '...';
388
- div.innerHTML = `<div class="log-evidence">↳ ${{txt}}</div>`;
389
- }} else if (type === 'DIAGNOSE') {{
390
- div.innerHTML = `${{timeSpan}} <span class="log-diagnose">⊕ DIAGNOSIS: ${{arg1}}</span>`;
391
- }} else if (type === 'FIX') {{
392
- div.innerHTML = `${{timeSpan}} <span class="log-fix">⚡ FIX APPLIED: ${{arg1}} → ${{arg2}}</span>`;
393
- }} else if (type === 'EPISODE_END') {{
394
- if (arg1 >= 0.7) {{
395
- div.innerHTML = `<div class="log-episode-end-ok">━━━ ✓ INCIDENT RESOLVED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
396
- document.getElementById('center-panel').classList.add('flash-resolve');
397
- document.getElementById('giant-score').classList.add('pulse-score');
398
- setTimeout(()=>{{
399
- document.getElementById('center-panel').classList.remove('flash-resolve');
400
- document.getElementById('giant-score').classList.remove('pulse-score');
401
- }}, 2000);
402
- }} else {{
403
- div.innerHTML = `<div class="log-episode-end-fail">━━━ ✗ INCIDENT ESCALATED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
404
- }}
405
- }}
406
-
407
- logEl.appendChild(div);
408
- if(logEl.children.length > 200) logEl.removeChild(logEl.firstChild);
409
- logEl.scrollTop = logEl.scrollHeight;
410
- }}
411
-
412
- function updateSparkline() {{
413
- const sp = document.getElementById('sparkline');
414
- sp.innerHTML = '';
415
- const start = Math.max(0, rewardHistory.length - 12);
416
- const recent = rewardHistory.slice(start);
417
-
418
- recent.forEach((r, i) => {{
419
- const h = Math.max(2, Math.min(40, (r / 0.5) * 40));
420
- const col = r > 0 ? 'var(--green)' : 'var(--red)';
421
- sp.innerHTML += `<div class="spark-bar" style="height:${{h}}px; background:${{col}}"><div class="spark-label">${{start + i + 1}}</div></div>`;
422
- }});
423
- }}
424
-
425
- function startEpisode(task, seed) {{
426
- stepCount = 0;
427
- totalScore = 0;
428
- rewardHistory = [];
429
- isRunning = true;
430
- currentTask = task;
431
- currentSeed = seed;
432
-
433
- document.getElementById('stat-task').textContent = `TASK: ${{task.toUpperCase()}}`;
434
- document.getElementById('stat-seed').textContent = `SEED: ${{seed}}`;
435
- updateStepCounter();
436
- updateScoreDisplay();
437
- updateSparkline();
438
- document.getElementById('alerts-list').innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
439
- document.getElementById('alert-count').textContent = '0';
440
- document.getElementById('alert-count').style.background = 'var(--surface2)';
441
-
442
- addLog('EPISODE_START', task, seed);
443
- if(ws && ws.readyState === WebSocket.OPEN) {{
444
- ws.send(JSON.stringify({{command: "reset", task_id: task, seed: seed}}));
445
- }}
446
- }}
447
-
448
- function deployIncident() {{
449
- const task = document.getElementById('task-select').value;
450
- const seed = parseInt(document.getElementById('seed-input').value) || 42;
451
- if(ws && ws.readyState === WebSocket.OPEN) {{
452
- startEpisode(task, seed);
453
- }} else {{
454
- connectWS();
455
- }}
456
- }}
457
-
458
- function connectWS() {{
459
- if(ws) ws.close();
460
- const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
461
- const wsUrl = `${{protocol}}//${{window.location.host}}/ws`;
462
-
463
- ws = new WebSocket(wsUrl);
464
-
465
- ws.onopen = () => {{
466
- document.getElementById('live-dot').className = 'status-dot dot-green';
467
- document.getElementById('live-text').textContent = 'LIVE';
468
- document.getElementById('live-text').style.color = 'var(--red)';
469
- document.getElementById('btm-dot').className = 'status-dot dot-green';
470
- document.getElementById('btm-text').textContent = '◉ WS CONNECTED';
471
- document.getElementById('btm-text').style.color = 'var(--green)';
472
- addLog('SYSTEM', 'WebSocket connected');
473
- startEpisode(currentTask, currentSeed);
474
- }};
475
-
476
- ws.onclose = () => {{
477
- document.getElementById('live-dot').className = 'status-dot dot-grey';
478
- document.getElementById('live-text').textContent = 'OFFLINE';
479
- document.getElementById('live-text').style.color = 'var(--text)';
480
- document.getElementById('btm-dot').className = 'status-dot dot-grey';
481
- document.getElementById('btm-text').textContent = '○ WS DISCONNECTED';
482
- document.getElementById('btm-text').style.color = 'var(--text-dim)';
483
- addLog('SYSTEM', 'Disconnected — reconnecting in 3s...');
484
- setTimeout(connectWS, 3000);
485
- }};
486
-
487
- ws.onmessage = (event) => {{
488
- let data;
489
- try {{ data = JSON.parse(event.data); }} catch(e) {{ return; }}
490
-
491
- if(data.services) {{
492
- const svcs = Object.entries(data.services).map(([name, s]) => ({{name, ...s}}));
493
- svcs.sort((a, b) => {{
494
- const val = (st) => st === 'down' ? 0 : (st === 'degraded' ? 1 : 2);
495
- return val(a.status) - val(b.status);
496
- }});
497
-
498
- const list = document.getElementById('service-list');
499
- list.innerHTML = '';
500
- document.getElementById('svc-count').textContent = svcs.length;
501
-
502
- svcs.forEach(s => {{
503
- let bcol = 'var(--border)', bgcol = 'var(--surface)', tcol = 'var(--text-dim)', stxt = '○ UNKNOWN';
504
- if(s.status === 'down') {{ bcol = 'var(--red)'; bgcol = 'var(--red-dim)'; tcol = 'var(--red)'; stxt = '● DOWN'; }}
505
- else if(s.status === 'degraded') {{ bcol = 'var(--yellow)'; bgcol = 'var(--yellow-dim)'; tcol = 'var(--yellow)'; stxt = '◐ DEGRADED'; }}
506
- else if(s.status === 'healthy') {{ bcol = 'var(--green)'; bgcol = 'var(--green-dim)'; tcol = 'var(--green)'; stxt = '○ HEALTHY'; }}
507
-
508
- let errRate = (s.error_rate * 100).toFixed(1);
509
- let memUtil = (s.memory_utilization * 100).toFixed(1);
510
- let errCol = s.error_rate > 0.3 ? 'var(--red)' : (s.error_rate > 0.1 ? 'var(--yellow)' : 'var(--green)');
511
- let memCol = s.memory_utilization > 0.9 ? 'var(--red)' : (s.memory_utilization > 0.7 ? 'var(--yellow)' : 'var(--green)');
512
-
513
- list.innerHTML += `
514
- <div class="service-item mono" style="border-left: 3px solid ${{bcol}}; background: ${{bgcol}}">
515
- <div>
516
- <div class="svc-name">${{s.name}}</div>
517
- <div class="svc-status" style="color:${{tcol}}">${{stxt}}</div>
518
- </div>
519
- <div class="svc-stats">
520
- <div class="svc-stat-line" style="color:${{errCol}}">ERR ${{errRate}}%</div>
521
- <div class="svc-stat-line" style="color:${{memCol}}">MEM ${{memUtil}}%</div>
522
- </div>
523
- </div>
524
- `;
525
- }});
526
- }}
527
-
528
- if(data.active_alerts) {{
529
- const alist = document.getElementById('alerts-list');
530
- alist.innerHTML = '';
531
- document.getElementById('alert-count').textContent = data.active_alerts.length;
532
- document.getElementById('alert-count').style.background = data.active_alerts.length > 0 ? 'var(--red)' : 'var(--surface2)';
533
-
534
- if(data.active_alerts.length === 0) {{
535
- alist.innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
536
- }} else {{
537
- let critFound = false;
538
- data.active_alerts.slice(0, 5).forEach(a => {{
539
- let bg = 'var(--surface)', border = 'var(--border)', txtCol = '#000';
540
- if(a.severity === 'CRITICAL') {{ border = 'var(--red)'; bg = 'var(--red)'; critFound = true; }}
541
- else if(a.severity === 'HIGH') {{ border = '#ff6600'; bg = '#ff6600'; }}
542
- else if(a.severity === 'WARNING') {{ border = 'var(--yellow)'; bg = 'var(--yellow)'; }}
543
- else {{ border = 'var(--blue)'; bg = 'var(--blue)'; }}
544
-
545
- alist.innerHTML += `
546
- <div class="alert-strip mono" style="border-left: 3px solid ${{border}}; background: ${{bg}}20">
547
- <div class="alert-badge" style="background:${{bg}}; color:${{txtCol}}">${{a.severity}}</div>
548
- <div class="alert-text">[${{a.service}}] ${{a.message}}</div>
549
- </div>
550
- `;
551
- }});
552
- if(data.active_alerts.length > 5) {{
553
- alist.innerHTML += `<div class="mono" style="font-size:9px; color:var(--text-dim); text-align:center">+${{data.active_alerts.length - 5}} more</div>`;
554
- }}
555
- if(critFound) {{
556
- const lp = document.getElementById('left-panel');
557
- lp.classList.remove('flash-critical');
558
- void lp.offsetWidth;
559
- lp.classList.add('flash-critical');
560
- }}
561
- }}
562
- }}
563
-
564
- if(data.action !== undefined && isRunning) {{
565
- stepCount++;
566
- updateStepCounter();
567
-
568
- let act = data.action;
569
- if(typeof act === 'string') try{{ act = JSON.parse(act) }}catch(e){{}}
570
-
571
- if(act.action_type === 'diagnose') addLog('DIAGNOSE', act.root_cause);
572
- else if(act.action_type === 'restart_service' || act.action_type === 'rollback_service' || act.action_type === 'block_ip')
573
- addLog('FIX', act.action_type, act.service || act.ip);
574
- else addLog('ACTION', act);
575
-
576
- if(data.evidence) addLog('EVIDENCE', data.evidence);
577
- if(data.reward !== undefined) {{
578
- totalScore += data.reward;
579
- rewardHistory.push(data.reward);
580
- addLog('REWARD', data.reward);
581
- updateScoreDisplay();
582
- updateSparkline();
583
- }}
584
-
585
- if(data.done) {{
586
- isRunning = false;
587
- addLog('EPISODE_END', totalScore, stepCount);
588
- updateScoreDisplay();
589
- setTimeout(() => {{
590
- currentSeed = Math.floor(Math.random() * 99999);
591
- document.getElementById('seed-input').value = currentSeed;
592
- startEpisode(currentTask, currentSeed);
593
- }}, 4000);
594
- }}
595
- }}
596
- }};
597
- }}
598
-
599
- window.onload = connectWS;
600
- </script>
601
- </body>
602
- </html>
603
- """
604
- return HTMLResponse(html)
605
- '''
606
-
607
- with open("server/app.py", "r", encoding="utf-8") as f:
608
-     content = f.read()
609
-
610
- # find @app.get("/", response_class=HTMLResponse)
611
- target = '@app.get("/", response_class=HTMLResponse)'
612
- if target in content:
613
-     new_content = content.replace(target, html_content + "\n\n" + target)
614
-     with open("server/app.py", "w", encoding="utf-8") as f:
615
-         f.write(new_content)
616
-     print("SUCCESS")
617
- else:
618
-     print("NOT FOUND")
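Reviewer note on the deleted `inject_live.py`: its final step splices a block of source text ahead of a marker line in `server/app.py`. The sketch below restates that step as a pure function, purely for illustration. The function name `inject_before`, the already-injected guard, and the `count=1` argument are assumptions added here; the original script calls `str.replace` without a count, so it would patch every occurrence of the marker and would insert the block again on a second run.

```python
from __future__ import annotations


def inject_before(content: str, target: str, block: str) -> tuple[str, bool]:
    """Insert `block` ahead of the first occurrence of `target`.

    Returns (new_content, changed). Hypothetical helper, not part of the
    original script.
    """
    if target not in content:
        # Same outcome as the script's "NOT FOUND" branch: nothing to patch.
        return content, False
    if block in content:
        # Assumed guard: the block is already present, so do nothing.
        return content, False
    # count=1 limits the splice to the first marker only.
    return content.replace(target, block + "\n\n" + target, 1), True
```

Keeping the file I/O outside the function means the splice can be checked on plain strings before anything is written back to disk.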
technical_reference.md DELETED
@@ -1,106 +0,0 @@
1
- # ARIA: DevOps Incident Response – Technical Reference Manual
2
-
3
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4
- SECTION 1: PROJECT OVERVIEW
5
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
6
- **Project**: ARIA (DevOps Incident Response)
7
- **Purpose**: An OpenEnv-compliant RL environment where AI agents diagnose and remediate production software incidents across a simulated microservices architecture. Designed for the Meta × PyTorch × HuggingFace OpenEnv Hackathon finals.
8
-
9
- **Architecture Stack**:
10
- - **Framework**: FastAPI (Python)
11
- - **State Management**: In-memory `DevOpsEnvironment`. WebSocket support is available for real-time streaming.
12
- - **Core Data Models**: Pydantic (`Action`, `Observation`, `State`, `StepResult`)
13
- - **Protocol**: JSON over REST (`/reset`, `/step`, `/state`, `/validate`, `/multi-agent/*`, `/curriculum/*`)
14
- - **Deployment**: Hugging Face Spaces (`server/app.py` is the verified production entrypoint).
15
-
16
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
17
- SECTION 2: CORE ENVIRONMENT & ACTION SPACE
18
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
19
- The environment supports standard API and agent interactions. SLA degradation kicks in every step (if a service is `down`, error rates creep up; if `degraded`, latency increases).
20
-
21
- **Action Types & Side Effects**:
22
- - `read_logs(service)`: Returns the last 2 lines + a summary of hidden lines.
23
- - `search_logs(service, query)`: Case-insensitive search on logs.
24
- - `read_metrics(service)`: Returns CPU, memory, error rate, p99 latency, replicas, version, and SLA breach info.
25
- - `read_runbook(runbook)`: Loads Markdown text from the `data/runbooks/` directory.
26
- - `acknowledge(service)`: Acknowledges an active alert ID.
27
- - `diagnose(root_cause)`: Evaluates keyword overlap for a diagnosis bonus reward.
28
- - `restart_service(service)`: Fixes OOMs. Penalised if used on stateful services, unaffected services, or repeated.
29
- - `scale_up(service)`: Increases replicas. Penalised if the incident is data corruption or the service is unrelated.
30
- - `rollback(service, version)`: Reverts a bad deployment to fix cascading failures.
31
- - `alert_oncall(reason)`: Required for cross-team fixes (data audit, security escalation, DBA intervention).
32
- - `block_ip_range(ip_range)`: Security response mechanism for DDoS attacks.
33
- - `create_index(table, column)`: DBA response for slow database queries.
34
- - `failover(target_region)`: Fails over eligible stateless services to `us-west-2`.
35
- - `noop()`: Take no action. Penalised if used excessively (> 3 times).
36
-
37
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
38
- SECTION 3: THE 7 CORE TASKS (PLUS GENERATED)
39
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
40
-
41
- | Task ID | Max Steps | Description & Difficulty | Ground Truth Root Cause | Ground Truth Fix |
42
- |---|---|---|---|---|
43
- | `easy` | 15 | **Single Service OOM**: One service crash-loops from a memory leak. | `memory_leak_{service}` | `restart {service}` |
44
- | `medium` | 20 | **Cascading Failure**: Bad deployment exhausts connection pools, cascading. Includes a red-herring alert. | `connection_pool_exhaustion` or `null_pointer` | `rollback {service}` |
45
- | `hard` | 25 | **Silent Data Corruption**: All services green. No error alerts. Requires correlating subtle business metrics. | `data_corruption_data_pipeline...` | `rollback data-pipeline` AND `alert_oncall` |
46
- | `bonus` | 25 | **Dual Simultaneous Failure**: Two independent failures at once. Both must be fixed. | `disk_full_log... AND model_reload_loop...` | `alert_oncall` (disk) AND `rollback` (ml) |
47
- | `security` | 20 | **DDoS Attack**: Botnet credential stuffing. Requires blocking CIDR and escalation. | `ddos_attack_185.x.x.x...` | `block_ip_range` AND `alert_oncall` |
48
- | `database` | 20 | **DB Degradation**: Missing schema index causing full table scans. | `missing_index_orders_user_segment...` | `create_index` or `rollback` |
49
- | `failover` | 25 | **Multi-Region Failover**: Network partition. Fails over stateless services. | `us_east_1_network_partition...` | `failover` eligible AND `alert_oncall` others |
50
- | `generated` | 20 | **Procedural Incident**: A seed-based deterministic incident generated by ARIA. | (Deterministic via Seed) | (Varies by failure mode) |
51
-
52
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
53
- SECTION 4: REWARD SHAPING & GRADING LOGIC
54
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
55
- The evaluation logic (`graders/grader.py`) calculates the final episode score, strictly bounded to `[0.001, 0.999]` to pass OpenEnv validation checks.
56
-
57
- - **Base Score**: The accumulated `total_reward` from individual step functions (clamped to `[0.0, 1.0]`).
58
- - **Efficiency Bonus**: If the incident is resolved, `+ (1.0 - (steps_taken / max_steps)) * 0.05`.
59
- - **Diagnosis Precision Bonus**: Checks keyword overlap of the `diagnose` action against the ground truth. `>= 50%` overlap adds `+0.03`. `>= 30%` overlap adds `+0.01`.
60
- - **Noop Penalty**: `(noop_count - 3) * 0.02` for excessive `noop` actions.
61
- - **Restart Penalty**: `(restarts - 1) * 0.05` per service restarted more than once (discourages guess-and-check).
62
- - **Blind Remediation Penalty** (`tasks/base.py`): `-0.05` applied locally in step functions if a fix action is taken before any `diagnose_correct` or `diagnose_partial` is awarded.
63
-
64
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
65
- SECTION 5: ADVANCED SUB-SYSTEMS
66
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
67
- ### Curriculum Engine (`curriculum/engine.py`)
68
- - **Mastery Tracker**: Keeps a rolling average of the last 5 scores per task.
69
- - **Mastery Levels**: Novice (0) ➔ Intermediate (1) ➔ Advanced (2) ➔ Mastered (3).
70
- - **Thresholds**: Rolling Avg `> 0.75` promotes mastery. `< 0.30` demotes mastery.
71
- - **Scaffolding**: Provides specific hints if a task is failed `>= 3` times and avg is `< 0.30`.
72
- - **Next Task Logic**: Returns the task with the lowest rolling average among non-mastered candidates.
73
-
74
- ### Multi-Agent Dual Session (`multi_agent/session.py`)
75
- - **Agent A (Observer)**: Read-only. Prompted to review logs and alerts. Must use `share_finding` to pass observations.
76
- - **Agent B (Responder)**: Write-only. Relies on Agent A's findings plus service metrics to execute remediation actions.
77
- - **Endpoints**: `/multi-agent/reset`, `/multi-agent/step/a/{id}`, `/multi-agent/step/b/{id}`, `/multi-agent/state/{id}`.
78
-
79
- ### Procedural Incident Generator (`generator/incident_factory.py`)
80
- - **Mechanics**: Takes a `seed` to build an incident dynamically. Supports 6 failure modes (`oom`, `cascade`, `corruption`, `security`, `database`, `network_partition`).
81
- - **Noise injection**: Adds 0-3 random noise alerts (e.g., SSL renewals, scheduled batch jobs) to distract agents.
82
- - **Difficulty Score Calculation**: `base_score + (noise_count * 0.05)`, clamped to 1.0.
83
-
84
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
85
- SECTION 6: API ENDPOINTS
86
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
87
- | Method | Route | Description |
88
- |---|---|---|
89
- | `GET` | `/health` | Simple `{status: "ok"}` liveness check. |
90
- | `GET` | `/tasks` | Lists all 8 tasks with descriptions and max steps. |
91
- | `POST` | `/reset` | Initializes an episode. Body: `{"task_id": str, "seed": int}`. |
92
- | `POST` | `/step` | Executes an `Action`. Returns a `StepResult`. |
93
- | `GET` | `/state` | Full state dump including ground truth logic. |
94
- | `GET` | `/validate` | Runs self-validation on the 7 core tasks via random-agent rollout. |
95
- | `GET` | `/metrics` | Telemetry: Resolution rates, score averages per task. |
96
- | `GET` | `/leaderboard` | Top 10 episodes ranked by score, then fewest steps. |
97
- | `GET` | `/curriculum/*` | Next recommended task, status, and scaffolding hints. |
98
- | `GET` | `/generate/preview` | Preview procedurally generated incident structure for a specific seed. |
99
-
100
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
101
- SECTION 7: LIMITATIONS & KNOWN ISSUES (DEMO GOTCHAS)
102
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━
103
- - **Validation Route Exception**: The `generated` task is intentionally excluded from the `/validate` endpoint (`VALID_TASKS` in `server/app.py`). This is because the random validation agent frequently fails its variable mechanics, leading to a false "failed" validation status.
104
- - **State Model Seed Loss**: The integer `seed` is dropped from the finalised `State` Pydantic model response. It defaults to `42` in telemetry records (`track_episode`) if not explicitly extracted.
105
- - **Entrypoint Synchronization**: The primary Hugging Face production entrypoint is `server/app.py`. Do NOT use `api.py` for live deployments. If 404 errors appear, ensure routes were ported to `app.py`.
106
- - **Security Blocks**: Hugging Face repository tokens (`HF_TOKEN`) must be managed via Secrets. Hardcoded tokens in `devops.ipynb` were actively scrubbed to prevent automated security takedowns.
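Reviewer note on Section 4 of the deleted `technical_reference.md`: the scoring rules listed there can be sketched as a single function. This is a minimal sketch under stated assumptions, not the contents of `graders/grader.py`: the function name, the parameter names, the order in which bonuses and penalties are applied, and the choice to apply penalties only once a count exceeds its threshold are all guesses made for illustration. The keyword-overlap ratio is taken as an input because the document does not say how it is computed.

```python
from __future__ import annotations


def final_score(
    total_reward: float,
    steps_taken: int,
    max_steps: int,
    resolved: bool,
    diagnosis_overlap: float,
    noop_count: int,
    restarts_per_service: dict[str, int],
) -> float:
    """Combine the Section 4 rules into one episode score (hypothetical)."""
    # Base score: accumulated step reward, clamped to [0.0, 1.0].
    score = min(1.0, max(0.0, total_reward))

    # Efficiency bonus, only when the incident was resolved.
    if resolved:
        score += (1.0 - steps_taken / max_steps) * 0.05

    # Diagnosis precision bonus from keyword overlap with the ground truth.
    if diagnosis_overlap >= 0.5:
        score += 0.03
    elif diagnosis_overlap >= 0.3:
        score += 0.01

    # Noop penalty: 0.02 per noop beyond the third (assumed to floor at 0).
    score -= max(0, noop_count - 3) * 0.02

    # Restart penalty: 0.05 per extra restart of the same service.
    score -= sum(max(0, n - 1) * 0.05 for n in restarts_per_service.values())

    # Final bound stated in the document: [0.001, 0.999].
    return min(0.999, max(0.001, score))
```

The blind remediation penalty is left out on purpose, since the document says it is applied inside the step functions in `tasks/base.py` and would therefore already be reflected in `total_reward`.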
uvicorn_err.txt DELETED
File without changes
uvicorn_out.txt DELETED
File without changes