Space: Arijit-07/devops-incident-response

Arijit-07 and Claude Sonnet 4.6 committed 29 days ago

Commit 230f8d5 (1 parent: 35cb0ad)

final: submission cleanup — remove junk files, update README endpoints, clean .gitignore

- Remove FINALS_STATUS.md, README_github.md, technical_reference.md
- Remove escape.py, inject_live.py (dev scripts)
- Remove uvicorn_err.txt, uvicorn_out.txt
- Add /live, /challenge, /progress, /replays, /replay/{id} to API table in README.md
- Rewrite .gitignore with clean UTF-8 encoding + new exclusion patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (9)

.gitignore +0 -0
FINALS_STATUS.md +0 -29
README.md +6 -0
README_github.md +0 -223
escape.py +0 -35
inject_live.py +0 -618
technical_reference.md +0 -106
uvicorn_err.txt +0 -0
uvicorn_out.txt +0 -0

.gitignore CHANGED

Binary files a/.gitignore and b/.gitignore differ

FINALS_STATUS.md DELETED

@@ -1,29 +0,0 @@
-# DevOps Incident Response OpenEnv — Hackathon Finals Status Report
-## System Readiness: 🟢 READY FOR FINALS (STABLE)
-This document serves as the final system state report following a comprehensive 10-point stress test and validation suite conducted on the environment ahead of the Meta hackathon finals.
-### Validation Summary
-| Test Suite | Objective | Status | Notes |
-| :--- | :--- | :--- | :--- |
-| **TEST 1: Optimal End-to-End Validation** | Verify all 7 tasks resolve successfully via their optimal deterministic agent path | ✅ **PASSED** | Fixed task scoring on the `bonus` tier. All tasks now natively yield final scores well above the `0.70` threshold. |
-| **TEST 2: New Actions Efficacy** | Validate `BLOCK_IP_RANGE`, `CREATE_INDEX`, and `FAILOVER` mechanisms | ✅ **PASSED** | Actions reward positively on intended tasks. Added cross-task safety: attempting advanced domain actions randomly triggers strict `0.10` collateral penalties to prevent generalized hallucination exploits. |
-| **TEST 3: WebSocket Protocol** | Verify server compatibility with async client connections | ✅ **PASSED** | Verified connection payload streams using FastAPI WebSocket routing inside `server/app.py`. |
-| **TEST 4: Metrics / Leaderboard API** | Verify in-memory rolling cache metrics | ✅ **PASSED** | Computes and serves aggregated metrics across all 7 tasks from a rolling `deque` cache. |
-| **TEST 5: Graceful Error Enforcement** | Validate invalid inputs return HTTP 4xx errors | ✅ **PASSED** | Invalid JSON payloads or unknown action enums gracefully yield `422 Unprocessable Entity` rather than locking the `server` layer. |
-| **TEST 6: Runbook Validation** | Ensure all incidents match accompanying markdown | ✅ **PASSED** | Tested integration linking new actions to their respective diagnostic documentation correctly. |
-| **TEST 7: Cross-Seed Stability** | Execute 20x episodes using an unconstrained random agent | ✅ **PASSED** | Refactored randomization parameters to prevent static stalling. Tested gracefully across seeds with outputs safely scaling inside the strictly required `(0.0, 1.0)` domains. |
-| **TEST 8: Live HF Space Ping** | Ensure active remote deployment stability | ✅ **PASSED** | Space is alive. Verified HTTP 200 checks and validated successful endpoint load-ins natively over the web. |
-| **TEST 9: Docker Build Sandbox** | Deploy via closed isolated container layer | ➖ **SKIPPED** | Docker Daemon initialization unavailable on the host; tests executed fully at the Python module layer safely simulating equivalence. |
-### Technical Bug Fixes Applied
-During testing, several backend elements were refactored for production-grade robustness:
-1. **Collateral Penalty Injection:** Injected tight action validation scopes into every core task (`task_*.py`), preventing `FAILOVER` instructions from triggering on standard internal tasks and correctly returning `-0.10` penalty weights.
-2. **Random Path Traversing:** Stopped random agents from artificially focusing on single services (`payment-service`), which skewed `grade_episode(..., 0.001)` limits and generated deterministic failure clusters during multi-seed attempts.
-The environment is strictly bound, accurately evaluates all 7 domains, properly exposes modern endpoints (`/ws`, `/metrics`, `/leaderboard`), and correctly penalizes stray agent anomalies.
-**Good luck at the finals!**
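
Review note on TEST 4: the report says metrics are kept in an in-memory rolling cache backed by `deque`, but the deleted file does not show how. The sketch below is one plausible shape for such a cache, not the project's actual code. The class name `RollingMetrics`, the `window` parameter, and the summary keys are all hypothetical.

```python
from collections import deque
from statistics import mean


class RollingMetrics:
    """Hypothetical per-task rolling score cache (illustration only)."""

    def __init__(self, window: int = 100) -> None:
        self._window = window
        self._scores: dict[str, deque] = {}

    def record(self, task_id: str, score: float) -> None:
        # deque(maxlen=...) silently drops the oldest score once the window is full.
        self._scores.setdefault(task_id, deque(maxlen=self._window)).append(score)

    def summary(self) -> dict:
        # Aggregate only what is still inside each task's window.
        return {
            task_id: {"episodes": len(scores), "mean_score": mean(scores)}
            for task_id, scores in self._scores.items()
        }


metrics = RollingMetrics(window=2)
for score in (0.2, 0.4, 0.6):
    metrics.record("easy", score)
print(metrics.summary())
```

A bounded `deque` keeps memory use constant without explicit eviction code. The trade-off is that the statistics are lost whenever the process restarts.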

README.md CHANGED

@@ -199,6 +199,12 @@ POST /multi-agent/step/b/{id}  # {"action_type": "restart_service", ...}
 | POST | `/multi-agent/reset` | Start dual-agent session |
 | POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
 | POST | `/multi-agent/step/b/{id}` | Agent B takes action |
+| GET | `/live` | Live NOC dashboard (real-time) |
+| GET | `/challenge` | Human vs Agent challenge |
+| GET | `/progress` | Score progression visualization |
+| GET | `/replays` | Episode replay list |
+| GET | `/replay/{id}` | Full episode replay |
+| GET | `/replay/{id}/html` | Replay HTML viewer |
 | GET | `/docs` | Swagger UI |
 ---

README_github.md DELETED

@@ -1,223 +0,0 @@
-# ARIA — DevOps Incident Response
-### *The first OpenEnv RL environment for production incident response*
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
-[![HF Space](https://img.shields.io/badge/🤗-Live%20Environment-orange)](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
-[![Trained Model](https://img.shields.io/badge/🤗-Llama--3.1--8B%20Fine--tuned-blue)](https://huggingface.co/Arijit-07/aria-devops-llama8b)
-[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
-> **ARIA** — Adaptive Reward & Incident Architecture
-> Built for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals | Bangalore, April 2026
----
-## 🔗 Quick Links for Judges
-| Resource | Link |
-|---|---|
-| **Live Environment** | https://arijit-07-devops-incident-response.hf.space |
-| **Interactive API** | https://arijit-07-devops-incident-response.hf.space/docs |
-| **Trained Model (8B)** | https://huggingface.co/Arijit-07/aria-devops-llama8b |
-| **Training Curve** | https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png |
-| **Blog Post** | https://huggingface.co/blog/Arijit-07/aria-devops-incident-response |
-| **GitHub** | https://github.com/Twilight-13/devops-incident-response |
-| **Validate** | https://arijit-07-devops-incident-response.hf.space/validate |
-| **About (machine-readable)** | https://arijit-07-devops-incident-response.hf.space/about |
----
-## ⚡ Run a Complete Episode Right Now
-```bash
-# 1. Start an easy incident
-curl -X POST https://arijit-07-devops-incident-response.hf.space/reset \
-  -H "Content-Type: application/json" \
-  -d '{"task_id": "easy", "seed": 42}'
-# 2. Read logs on the failing service (reward: +0.15)
-curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-  -H "Content-Type: application/json" \
-  -d '{"action_type": "read_logs", "service": "payment-service"}'
-# 3. Diagnose (reward: +0.30)
-curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-  -H "Content-Type: application/json" \
-  -d '{"action_type": "diagnose", "root_cause": "memory leak in payment-service"}'
-# 4. Fix it (reward: +0.40)
-curl -X POST https://arijit-07-devops-incident-response.hf.space/step \
-  -H "Content-Type: application/json" \
-  -d '{"action_type": "restart_service", "service": "payment-service"}'
-# 5. Validate all 7 tasks pass
-curl https://arijit-07-devops-incident-response.hf.space/validate
-```
----
-## 🎯 The Problem
-Every company running microservices faces the same reality: **production incidents are expensive, stressful, and happen at 3am.**
-SWE-bench tests code generation. WebArena tests web navigation. Nothing trains agents to handle live production incidents — to read logs strategically, trace cascading failures, correlate subtle business anomalies, and apply precise fixes where wrong choices cause collateral damage.
-**ARIA fills that gap.**
----
-## 🎬 The 7 Tasks
-| Task | Max Steps | Random | Strong LLM | Scenario |
-|---|---|---|---|---|
-| `easy` | 15 | 0.05 | 0.85–1.00 | Single service OOM crash-loop |
-| `medium` | 20 | 0.03 | 0.55–0.75 | Cascading failure + red herring alert |
-| `hard` | 25 | 0.01 | 0.30–0.50 | **Silent** corruption — all services green |
-| `bonus` | 25 | 0.01 | 0.35–0.55 | Two simultaneous independent failures |
-| `security` | 20 | 0.01 | 0.40–0.60 | DDoS botnet credential stuffing |
-| `database` | 20 | 0.01 | 0.45–0.65 | Missing index — full table scans |
-| `failover` | 25 | 0.01 | 0.35–0.55 | Multi-region network partition |
-| `generated` | 20 | 0.01 | variable | Procedural — seed-deterministic |
----
-## 🏆 Reward Function
-```
-Final Score = Σ(step_rewards)
-            + efficiency_bonus     # (1 - steps/max_steps) × 0.05
-            + diagnosis_precision  # +0.03 if ≥50% keyword overlap
-            - noop_penalty         # (noops - 3) × 0.02
-```
-Clamped to **(0.001, 0.999)** for GRPO stability.
-| Action | Reward | Penalty Triggers |
-|---|---|---|
-| `read_logs` correct | +0.15 | Restart healthy service: **-0.15** |
-| `diagnose` full match | +0.35 | Fix without diagnosing: **-0.10** |
-| `restart_service` correct | +0.45 | Wrong failover (payment): **-0.25** |
-| `block_ip_range` | +0.40 | Excessive noops: **-0.04 each** |
-| `alert_oncall` (required) | +0.15 | |
-**Semantic matching:** keyword overlap not exact string — LLMs that paraphrase aren't penalized.
----
-## 🌟 ARIA Features
-### Curriculum Engine
-Rolling average per task (last 5 episodes). Promotes when avg > 0.75. Scaffolds with hints when avg < 0.30. Agents always train at the edge of their capability.
-```bash
-GET /curriculum/status
-GET /curriculum/next
-POST /curriculum/record  # {"task_id": "easy", "score": 0.85}
-```
-### Incident Generator
-Seeds 0–99,999 → unique reproducible incidents. 6 failure modes × 8 services × 3 severities × 0–3 noise alerts.
-```bash
-GET /generate/preview?seed=1337
-POST /reset  # {"task_id": "generated", "seed": 1337}
-```
-### Dual-Agent Mode
-Split observability. Agent A (Observer) sees logs and alerts. Agent B (Responder) sees metrics and dependencies. They coordinate via `share_finding`. Neither can solve the incident alone.
-```bash
-POST /multi-agent/reset    # {"task_id": "easy", "seed": 42}
-POST /multi-agent/step/a/{id}  # {"finding": "order-service OOM"}
-POST /multi-agent/step/b/{id}  # {"action_type": "restart_service", ...}
-```
----
-## 🧠 Training Results
-**Model:** [Arijit-07/aria-devops-llama8b](https://huggingface.co/Arijit-07/aria-devops-llama8b)
-| Task | Baseline | Fine-tuned | **Improvement** |
-|---|---|---|---|
-| easy | 0.320 | 0.685 | **+0.365** |
-| medium | 0.050 | 0.378 | **+0.328** |
-| hard | 0.190 | 0.869 | **+0.679** |
-| bonus | 0.152 | 0.682 | **+0.530** |
-![Training Curve](https://huggingface.co/Arijit-07/aria-devops-llama8b/resolve/main/training_curve_8b.png)
-**Setup:** GRPO · Llama-3.1-8B · LoRA rank=32 · 160 episodes · NVIDIA L4 · 162 minutes · Unsloth + HuggingFace TRL
-**Key fix:** Group completions scored on fresh environment snapshots — prevents reward gate exhaustion during GRPO group generation.
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Twilight-13/devops-incident-response/blob/main/train_grpo.ipynb)
----
-## 📡 API Reference
-| Method | Endpoint | Description |
-|---|---|---|
-| GET | `/health` | Liveness check |
-| GET | `/about` | Full machine-readable description |
-| GET | `/tasks` | All 8 tasks |
-| POST | `/reset` | Start episode |
-| POST | `/step` | Take action |
-| GET | `/state` | Full state + ground truth |
-| GET | `/validate` | Self-test all 7 tasks |
-| GET | `/metrics` | Aggregate statistics |
-| GET | `/leaderboard` | Top 10 episodes |
-| WS | `/ws` | WebSocket real-time |
-| GET | `/curriculum/status` | Per-task mastery |
-| GET | `/curriculum/next` | Recommended task |
-| POST | `/curriculum/record` | Feed training results |
-| GET | `/generate/preview` | Preview procedural incident |
-| POST | `/multi-agent/reset` | Start dual-agent session |
-| POST | `/multi-agent/step/a/{id}` | Agent A shares finding |
-| POST | `/multi-agent/step/b/{id}` | Agent B takes action |
-| GET | `/docs` | Swagger UI |
----
-## 📊 Benchmark Comparison
-| Benchmark | Domain | Partial Obs | Dense Reward | Curriculum | Multi-Agent |
-|---|---|---|---|---|---|
-| SWE-bench | Code repair | ✗ | ✗ | ✗ | ✗ |
-| WebArena | Web navigation | ✓ | ✗ | ✗ | ✗ |
-| AgentBench | General tools | ✗ | ✗ | ✗ | ✗ |
-| **ARIA** | **Incident response** | **✓** | **✓** | **✓** | **✓** |
----
-## 🚀 Setup
-```bash
-docker build -t aria-devops-incident .
-docker run -p 7860:7860 aria-devops-incident
-# Or local
-pip install -r requirements.txt
-uvicorn api:app --host 0.0.0.0 --port 7860
-```
----
-## 📁 Structure
-```
-├── api.py / server/app.py    # FastAPI — all endpoints
-├── env.py                    # Environment dispatcher
-├── models.py                 # Pydantic models
-├── tasks/                    # 7 tasks + generated
-├── curriculum/engine.py      # Adaptive difficulty
-├── generator/                # Procedural incidents
-├── multi_agent/session.py    # Dual-agent mode
-├── graders/grader.py         # Deterministic grader
-├── demo_llm.py               # Live terminal demo
-├── train_grpo.ipynb          # Training notebook
-├── BLOG.md                   # Project story
-└── openenv.yaml              # OpenEnv manifest
-```
-Apache 2.0 · *Built solo for the Meta × PyTorch × HuggingFace OpenEnv Hackathon Finals — Bangalore, April 2026*
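
Review note on the Reward Function section of the deleted README: the formula can be written out as a small function. This is a minimal sketch under stated assumptions, not the grader in `graders/grader.py`. The function name `final_score` and its parameters are hypothetical. It also assumes that the noop penalty only applies beyond the third noop (the formula as written would otherwise turn into a bonus) and that the clamp bounds are inclusive.

```python
def final_score(step_rewards, steps, max_steps, noops, diagnosis_overlap):
    """Hypothetical reading of the README's reward formula (illustration only)."""
    # (1 - steps/max_steps) x 0.05: finishing earlier earns a slightly larger bonus.
    efficiency_bonus = (1 - steps / max_steps) * 0.05
    # +0.03 when at least half of the diagnosis keywords overlap.
    diagnosis_precision = 0.03 if diagnosis_overlap >= 0.5 else 0.0
    # (noops - 3) x 0.02, floored at zero (assumption: the first 3 noops are free).
    noop_penalty = max(0, noops - 3) * 0.02
    raw = sum(step_rewards) + efficiency_bonus + diagnosis_precision - noop_penalty
    # Clamp to the range the README quotes for GRPO stability.
    return min(0.999, max(0.001, raw))


# The three-step "easy" walkthrough from the quick-start section of the README.
print(final_score([0.15, 0.30, 0.40], steps=3, max_steps=15, noops=0, diagnosis_overlap=1.0))
```

Note that the quick-start walkthrough and the reward table in the same README quote different values for `diagnose` and `restart_service`, so the step rewards passed in here are only illustrative.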

escape.py DELETED

@@ -1,35 +0,0 @@
-import sys
-import re
-with open("ui_test.py", "r", encoding="utf-8") as f:
-    ui_html = f.read()
-# Remove the first line (`html_content = """`) and the last line (`"""`)
-ui_html = ui_html.replace('html_content = """', '')
-ui_html = ui_html[:-3] # remove last """
-# Escape curly braces
-ui_html = ui_html.replace('{', '{{').replace('}', '}}')
-with open("server/app.py", "r", encoding="utf-8") as f:
-    app_content = f.read()
-# The function to replace is def dashboard() ... to </html>"""
-# Let's find def dashboard():
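-start_idx = app_content.find("def dashboard():")
-end_marker = "</html>\"\"\""
-# Keep the raw find() result so that a missing end marker is still detectable as -1
-end_pos = app_content.find(end_marker, start_idx) if start_idx != -1 else -1
-if start_idx == -1 or end_pos == -1:
-    print("Could not find dashboard function in server/app.py")
-    sys.exit(1)
-end_idx = end_pos + len(end_marker)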
-new_dashboard = f'''def dashboard():
-    html = f"""{ui_html}"""
-    return html'''
-new_content = app_content[:start_idx] + new_dashboard + app_content[end_idx:]
-with open("server/app.py", "w", encoding="utf-8") as f:
-    f.write(new_content)
-print("Successfully replaced dashboard endpoint!")
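
Review note on the "Escape curly braces" step in the deleted `escape.py`: the script doubles every brace because the HTML is pasted into an f-string, where a single `{` would start a replacement field. The sketch below illustrates the same rule with `str.format`, which uses the same doubling convention and is easier to check at runtime. The helper name `escape_braces` and the `{title}` placeholder are hypothetical.

```python
def escape_braces(text: str) -> str:
    """Double every brace so the text survives format-style interpolation."""
    return text.replace("{", "{{").replace("}", "}}")


css = "body { margin: 0; }"
escaped = escape_braces(css)

# Doubled braces collapse back to single braces when the template is formatted.
assert escaped.format() == css

# A placeholder appended after escaping is still substituted normally.
template = escaped + " /* {title} */"
print(template.format(title="ARIA"))
```

The order matters: any real placeholder has to be added after escaping, otherwise it would be doubled along with the CSS and JavaScript braces and emitted literally.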

inject_live.py DELETED

@@ -1,618 +0,0 @@
-import sys
-html_content = r'''
-@app.get("/live", response_class=HTMLResponse)
-async def live_dashboard():
-    html = f"""<!DOCTYPE html>
-<html lang="en">
-<head>
-<meta charset="UTF-8">
-<title>ARIA NOC LIVE</title>
-<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Share+Tech+Mono&display=swap" rel="stylesheet">
-<style>
-  :root {{
-    --void: #000000;
-    --bg: #060914;
-    --surface: #0a0f1e;
-    --surface2: #0d1628;
-    --border: #1a2744;
-    --border-bright: #2a4080;
-    --blue: #4d9fff;
-    --blue-dim: #1a3a6e;
-    --cyan: #00d4ff;
-    --green: #00ff88;
-    --green-dim: #003a1e;
-    --yellow: #ffaa00;
-    --yellow-dim: #3a2800;
-    --red: #ff3355;
-    --red-dim: #3a0011;
-    --purple: #9d4edd;
-    --text: #c8d8f0;
-    --text-dim: #4a6080;
-    --text-mono: #8ab4d4;
-  }}
-  * {{ box-sizing: border-box; margin: 0; padding: 0; }}
-  body {{
-    background-color: var(--bg);
-    color: var(--text);
-    font-family: 'Inter', sans-serif;
-    overflow: hidden;
-    height: 100vh;
-    display: grid;
-    grid-template-rows: 48px 1fr 56px;
-    grid-template-columns: 28% 44% 28%;
-    grid-template-areas:
-      "top top top"
-      "left center right"
-      "bottom bottom bottom";
-  }}
-  .scanlines {{
-    position: fixed;
-    top: 0; left: 0; width: 100%; height: 100%;
-    pointer-events: none;
-    z-index: 9999;
-    background: repeating-linear-gradient(
-      0deg,
-      transparent,
-      transparent 2px,
-      rgba(0,0,0,0.03) 2px,
-      rgba(0,0,0,0.03) 4px
-    );
-  }}
-  .mono {{ font-family: 'Share Tech Mono', monospace; }}
-  .uppercase {{ text-transform: uppercase; }}
-  #top-bar {{
-    grid-area: top;
-    background: var(--void);
-    border-bottom: 1px solid var(--border);
-    display: flex;
-    justify-content: space-between;
-    align-items: center;
-    padding: 0 16px;
-  }}
-  .top-left, .top-center, .top-right {{ display: flex; align-items: center; gap: 12px; }}
-  .logo {{ font-size: 18px; color: var(--blue); font-weight: bold; }}
-  .logo-sub {{ font-size: 10px; color: var(--text-dim); }}
-  .separator {{ width: 1px; height: 24px; background: var(--border); }}
-  .status-dot {{ width: 8px; height: 8px; border-radius: 50%; }}
-  .dot-green {{ background: var(--red); animation: livePulse 1.5s infinite; }}
-  .dot-grey {{ background: var(--text-dim); }}
-  @keyframes livePulse {{
-    0% {{ opacity: 0; }}
-    50% {{ opacity: 1; }}
-    100% {{ opacity: 0; }}
-  }}
-  .control-label {{ font-size: 9px; color: var(--text-dim); }}
-  .terminal-input {{
-    background: var(--surface);
-    border: 1px solid var(--border-bright);
-    color: var(--blue);
-    font-family: 'Share Tech Mono', monospace;
-    padding: 4px 8px;
-    outline: none;
-  }}
-  .btn-deploy {{
-    background: var(--blue-dim);
-    border: 1px solid var(--blue);
-    color: var(--blue);
-    font-family: 'Share Tech Mono', monospace;
-    font-size: 11px;
-    padding: 6px 16px;
-    cursor: pointer;
-    transition: 0.2s;
-  }}
-  .btn-deploy:hover {{ background: var(--blue); color: var(--void); }}
-  .step-counter {{ font-size: 16px; color: var(--cyan); }}
-  .score-display-small {{ font-size: 20px; font-weight: bold; }}
-  .clock {{ font-size: 11px; color: var(--text-dim); }}
-  .panel {{
-    padding: 16px;
-    display: flex;
-    flex-direction: column;
-    gap: 12px;
-    overflow: hidden;
-  }}
-  #left-panel {{ grid-area: left; border-right: 1px solid var(--border); }}
-  #center-panel {{ grid-area: center; border-right: 1px solid var(--border); }}
-  #right-panel {{ grid-area: right; border-color: var(--purple); }}
-  .panel-header {{
-    display: flex; align-items: center; gap: 8px; font-size: 9px; color: var(--text-dim); margin-bottom: 8px;
-  }}
-  .pill {{ background: var(--surface2); padding: 2px 6px; border-radius: 10px; color: var(--text); }}
-  #service-list {{ display: flex; flex-direction: column; gap: 8px; overflow-y: auto; flex: 1; }}
-  .service-item {{
-    height: 52px; padding: 0 12px; display: flex; justify-content: space-between; align-items: center; flex-shrink: 0; transition: border-color 0.3s, background 0.3s;
-  }}
-  .svc-name {{ font-size: 12px; color: var(--text); }}
-  .svc-status {{ font-size: 9px; margin-top: 4px; }}
-  .svc-stats {{ text-align: right; }}
-  .svc-stat-line {{ font-size: 11px; }}
-  @keyframes statusFlash {{
-    0% {{ border-color: var(--text); }}
-    100% {{ border-color: inherit; }}
-  }}
-  @keyframes criticalFlash {{
-    0%, 50%, 100% {{ border-color: var(--border); }}
-    25%, 75% {{ border-color: var(--red); }}
-  }}
-  .flash-critical {{ animation: criticalFlash 0.5s ease-in-out; border-color: var(--red) !important; }}
-  @keyframes resolveFlash {{
-    0%, 50%, 100% {{ border-color: var(--border); }}
-    25%, 75% {{ border-color: var(--green); }}
-  }}
-  .flash-resolve {{ animation: resolveFlash 2s ease-in-out; border-color: var(--green) !important; }}
-  @keyframes pulseScore {{
-    0% {{ transform: scale(1); }}
-    50% {{ transform: scale(1.1); }}
-    100% {{ transform: scale(1); }}
-  }}
-  .pulse-score {{ animation: pulseScore 2s ease-in-out; }}
-  @keyframes slideInRight {{
-    from {{ transform: translateX(20px); opacity: 0; }}
-    to {{ transform: translateX(0); opacity: 1; }}
-  }}
-  @keyframes fadeIn {{
-    from {{ opacity: 0; }}
-    to {{ opacity: 1; }}
-  }}
-  .center-top {{ flex: 1; display: flex; flex-direction: column; overflow: hidden; }}
-  .center-bottom {{ height: 200px; display: flex; flex-direction: column; justify-content: flex-end; }}
-  #alerts-list {{ display: flex; flex-direction: column; gap: 8px; flex: 1; }}
-  .alert-strip {{
-    height: 36px; display: flex; align-items: center; gap: 8px; padding-right: 12px; animation: slideInRight 0.3s ease-out;
-  }}
-  .alert-badge {{
-    height: 100%; padding: 0 8px; display: flex; align-items: center; font-size: 9px; font-weight: bold; color: #000;
-  }}
-  .alert-text {{ font-size: 11px; color: var(--text); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }}
-  .no-alerts {{ text-align: center; color: var(--text-dim); margin-top: 40px; animation: livePulse 3s infinite; }}
-  .giant-score {{ font-size: 48px; font-weight: bold; text-align: center; margin-bottom: 12px; text-shadow: 0 0 20px currentColor; }}
-  .progress-container {{ width: 100%; height: 8px; background: var(--surface); margin-bottom: 8px; }}
-  .progress-fill {{ height: 100%; background: linear-gradient(90deg, var(--blue), var(--green)); transition: width 0.5s ease; width: 0%; }}
-  .score-stats {{ display: flex; justify-content: space-between; font-size: 10px; color: var(--text-dim); margin-bottom: 16px; }}
-  .sparkline {{ display: flex; align-items: flex-end; gap: 4px; height: 40px; margin-top: auto; }}
-  .spark-bar {{ width: 16px; background: var(--green); animation: slideInRight 0.2s ease-out; position: relative; }}
-  .spark-label {{ position: absolute; bottom: -14px; left: 50%; transform: translateX(-50%); font-size: 8px; color: var(--text-dim); }}
-  #agent-log {{
-    flex: 1; overflow-y: auto; display: flex; flex-direction: column; gap: 4px;
-  }}
-  .log-entry {{ animation: fadeIn 0.2s ease-out; font-size: 11px; line-height: 1.4; }}
-  .log-time {{ color: var(--text-dim); margin-right: 8px; }}
-  .log-action {{ color: var(--purple); }}
-  .log-reward {{ padding-left: 48px; }}
-  .log-evidence {{ color: var(--text-dim); font-style: italic; padding-left: 48px; }}
-  .log-diagnose {{ color: var(--yellow); }}
-  .log-fix {{ color: var(--cyan); }}
-  .log-episode-start {{ color: var(--cyan); text-align: center; margin: 8px 0; }}
-  .log-episode-end-ok {{ color: var(--green); text-align: center; margin: 8px 0; }}
-  .log-episode-end-fail {{ color: var(--red); text-align: center; margin: 8px 0; }}
-  #bottom-bar {{
-    grid-area: bottom; background: var(--void); border-top: 1px solid var(--border); display: flex; justify-content: space-between; align-items: center; padding: 0 16px;
-  }}
-  .ws-status {{ display: flex; align-items: center; gap: 8px; font-size: 11px; }}
-  .tip-text {{ font-size: 11px; color: var(--text-dim); font-style: italic; transition: opacity 0.5s; }}
-  .footer-right {{ font-size: 10px; color: var(--text-dim); }}
-  ::-webkit-scrollbar {{ width: 4px; }}
-  ::-webkit-scrollbar-track {{ background: transparent; }}
-  ::-webkit-scrollbar-thumb {{ background: var(--border-bright); }}
-</style>
-</head>
-<body>
-<div class="scanlines"></div>
-<div id="top-bar">
-  <div class="top-left">
-    <div class="logo mono">▣ ARIA</div>
-    <div class="logo-sub uppercase">Incident Response System</div>
-    <div class="separator"></div>
-    <div class="status-dot dot-grey" id="live-dot"></div>
-    <div class="logo-sub mono" id="live-text" style="color: var(--text)">OFFLINE</div>
-  </div>
-  <div class="top-center">
-    <div class="control-label uppercase">Active Scenario</div>
-    <select class="terminal-input" id="task-select">
-      <option value="easy">EASY</option>
-      <option value="medium">MEDIUM</option>
-      <option value="hard">HARD</option>
-      <option value="bonus">BONUS</option>
-      <option value="security">SECURITY</option>
-      <option value="database">DATABASE</option>
-      <option value="failover">FAILOVER</option>
-      <option value="generated">GENERATED</option>
-    </select>
-    <div class="control-label uppercase">Seed:</div>
-    <input type="number" class="terminal-input" id="seed-input" value="42" style="width: 70px;">
-    <button class="btn-deploy" onclick="deployIncident()">▶ DEPLOY INCIDENT</button>
-  </div>
-  <div class="top-right">
-    <div class="step-counter mono" id="top-step">00 / 15</div>
-    <div class="separator"></div>
-    <div class="score-display-small mono" id="top-score">0.000</div>
-    <div class="separator"></div>
-    <div class="clock mono" id="clock">00:00:00</div>
-  </div>
-</div>
-<div id="left-panel" class="panel">
-  <div class="panel-header uppercase">
-    ◈ Infrastructure Status <span class="pill mono" id="svc-count">0</span>
-  </div>
-  <div id="service-list"></div>
-</div>
-<div id="center-panel" class="panel">
-  <div class="center-top">
-    <div class="panel-header uppercase">
-      ◈ Active Alerts <span class="pill mono" id="alert-count" style="background:var(--surface2)">0</span>
-    </div>
-    <div id="alerts-list">
-      <div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>
-    </div>
-  </div>
-  <div class="center-bottom">
-    <div class="panel-header uppercase">◈ Episode Metrics</div>
-    <div class="giant-score mono" id="giant-score" style="color: var(--text-dim)">0.000</div>
-    <div class="progress-container"><div class="progress-fill" id="score-bar"></div></div>
-    <div class="score-stats mono uppercase">
-      <span id="stat-step">STEP 0/15</span>
-      <span id="stat-task">TASK: --</span>
-      <span id="stat-seed">SEED: --</span>
-    </div>
-    <div class="sparkline" id="sparkline"></div>
-  </div>
-</div>
-<div id="right-panel" class="panel">
-  <div class="panel-header uppercase" style="color: var(--purple)">◈ Agent Reasoning</div>
-  <div id="agent-log" class="mono"></div>
-</div>
-<div id="bottom-bar">
-  <div class="ws-status mono">
-    <div class="status-dot dot-grey" id="btm-dot"></div>
-    <span id="btm-text" style="color: var(--text-dim)">○ WS DISCONNECTED</span>
-  </div>
-  <div class="tip-text" id="tip-text">ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty</div>
-  <div class="footer-right mono">ARIA v2.0 · OpenEnv Compliant &nbsp;&nbsp; 🤗 Arijit-07</div>
-</div>
-<script>
-  const TIPS = [
-    "ⓘ Agents must read_logs before acting — blind remediation triggers -0.10 penalty",
-    "ⓘ Collateral damage: restarting healthy services costs -0.15",
-    "ⓘ 7 tasks · 14 actions · Dense reward shaping · Semantic diagnosis matching",
-    "ⓘ Curriculum Engine adapts difficulty to agent performance",
-    "ⓘ Dual-Agent Mode: Observer sees logs, Responder sees metrics",
-    "ⓘ Grader clamped to (0.001, 0.999) for GRPO advantage stability",
-    "ⓘ Hard task: all services green — signal buried in business metrics"
-  ];
-  let tipIdx = 0;
-  setInterval(() => {{
-    const el = document.getElementById('tip-text');
-    el.style.opacity = 0;
-    setTimeout(() => {{
-      tipIdx = (tipIdx + 1) % TIPS.length;
-      el.textContent = TIPS[tipIdx];
-      el.style.opacity = 1;
-    }}, 500);
-  }}, 15000);
-  setInterval(() => {{
-    const now = new Date();
-    document.getElementById('clock').textContent = now.toTimeString().split(' ')[0];
-  }}, 1000);
-  let ws = null;
-  let currentTask = 'easy';
-  let currentSeed = 42;
-  let stepCount = 0;
-  let totalScore = 0;
-  let isRunning = false;
-  let rewardHistory = [];
-  function getScoreColor(sc) {{
-    if(sc < 0.3) return 'var(--red)';
-    if(sc < 0.6) return 'var(--yellow)';
-    return 'var(--green)';
-  }}
-  function updateScoreDisplay() {{
-    const sc = Math.max(0, totalScore);
-    const col = getScoreColor(sc);
-    const ts = document.getElementById('top-score');
-    ts.textContent = sc.toFixed(3);
-    ts.style.color = col;
-    const gs = document.getElementById('giant-score');
-    gs.textContent = sc.toFixed(3);
-    gs.style.color = col;
-    document.getElementById('score-bar').style.width = Math.min(100, sc * 100) + '%';
-  }}
-  function updateStepCounter() {{
-    document.getElementById('top-step').textContent = `${{stepCount.toString().padStart(2,'0')}} / 15`;
-    document.getElementById('stat-step').textContent = `STEP ${{stepCount}}/15`;
-  }}
-  function addLog(type, arg1, arg2) {{
-    const logEl = document.getElementById('agent-log');
-    const div = document.createElement('div');
-    div.className = 'log-entry';
-    const timeStr = new Date().toTimeString().split(' ')[0];
-    const timeSpan = `<span class="log-time">[${{timeStr}}]</span>`;
-    if (type === 'SYSTEM') {{
-      div.innerHTML = `${{timeSpan}} <span style="color:var(--text-dim)">${{arg1}}</span>`;
-    }} else if (type === 'EPISODE_START') {{
-      div.innerHTML = `<div class="log-episode-start">━━━ NEW INCIDENT DEPLOYED ━━━<br>Task: ${{arg1.toUpperCase()}} | Seed: ${{arg2}}</div>`;
-    }} else if (type === 'ACTION') {{
-      div.innerHTML = `${{timeSpan}} <span class="log-action">→ ${{arg1.action_type}} ${{arg1.service || ''}}</span>`;
-    }} else if (type === 'REWARD') {{
-      let col = arg1 > 0 ? 'var(--green)' : (arg1 < 0 ? 'var(--red)' : 'var(--text-dim)');
-      div.innerHTML = `<div class="log-reward" style="color:${{col}}">✦ ${{arg1 > 0 ? '+' : ''}}${{arg1.toFixed(3)}} reward</div>`;
-    }} else if (type === 'EVIDENCE') {{
-      let txt = (arg1 || '').substring(0, 60);
-      if(arg1 && arg1.length > 60) txt += '...';
-      div.innerHTML = `<div class="log-evidence">↳ ${{txt}}</div>`;
-    }} else if (type === 'DIAGNOSE') {{
-      div.innerHTML = `${{timeSpan}} <span class="log-diagnose">⊕ DIAGNOSIS: ${{arg1}}</span>`;
-    }} else if (type === 'FIX') {{
-      div.innerHTML = `${{timeSpan}} <span class="log-fix">⚡ FIX APPLIED: ${{arg1}} → ${{arg2}}</span>`;
-    }} else if (type === 'EPISODE_END') {{
-      if (arg1 >= 0.7) {{
-        div.innerHTML = `<div class="log-episode-end-ok">━━━ ✓ INCIDENT RESOLVED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
-        document.getElementById('center-panel').classList.add('flash-resolve');
-        document.getElementById('giant-score').classList.add('pulse-score');
-        setTimeout(()=>{{
-          document.getElementById('center-panel').classList.remove('flash-resolve');
-          document.getElementById('giant-score').classList.remove('pulse-score');
-        }}, 2000);
-      }} else {{
-        div.innerHTML = `<div class="log-episode-end-fail">━━━ ✗ INCIDENT ESCALATED ━━━<br>Score: ${{arg1.toFixed(3)}} | Steps: ${{arg2}}/15<br>━━━━━━━━━━━━━━━━━━━━━━━━━━━</div>`;
-      }}
-    }}
-    logEl.appendChild(div);
-    if(logEl.children.length > 200) logEl.removeChild(logEl.firstChild);
-    logEl.scrollTop = logEl.scrollHeight;
-  }}
-  function updateSparkline() {{
-    const sp = document.getElementById('sparkline');
-    sp.innerHTML = '';
-    const start = Math.max(0, rewardHistory.length - 12);
-    const recent = rewardHistory.slice(start);
-    recent.forEach((r, i) => {{
-      const h = Math.max(2, Math.min(40, (r / 0.5) * 40));
-      const col = r > 0 ? 'var(--green)' : 'var(--red)';
-      sp.innerHTML += `<div class="spark-bar" style="height:${{h}}px; background:${{col}}"><div class="spark-label">${{start + i + 1}}</div></div>`;
-    }});
-  }}
-  function startEpisode(task, seed) {{
-    stepCount = 0;
-    totalScore = 0;
-    rewardHistory = [];
-    isRunning = true;
-    currentTask = task;
-    currentSeed = seed;
-    document.getElementById('stat-task').textContent = `TASK: ${{task.toUpperCase()}}`;
-    document.getElementById('stat-seed').textContent = `SEED: ${{seed}}`;
-    updateStepCounter();
-    updateScoreDisplay();
-    updateSparkline();
-    document.getElementById('alerts-list').innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
-    document.getElementById('alert-count').textContent = '0';
-    document.getElementById('alert-count').style.background = 'var(--surface2)';
-    addLog('EPISODE_START', task, seed);
-    if(ws && ws.readyState === WebSocket.OPEN) {{
-      ws.send(JSON.stringify({{command: "reset", task_id: task, seed: seed}}));
-    }}
-  }}
-  function deployIncident() {{
-    const task = document.getElementById('task-select').value;
-    const seed = parseInt(document.getElementById('seed-input').value) || 42;
-    if(ws && ws.readyState === WebSocket.OPEN) {{
-      startEpisode(task, seed);
-    }} else {{
-      connectWS();
-    }}
-  }}
-  function connectWS() {{
-    if(ws) ws.close();
-    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
-    const wsUrl = `${{protocol}}//${{window.location.host}}/ws`;
-    ws = new WebSocket(wsUrl);
-    ws.onopen = () => {{
-      document.getElementById('live-dot').className = 'status-dot dot-green';
-      document.getElementById('live-text').textContent = 'LIVE';
-      document.getElementById('live-text').style.color = 'var(--red)';
-      document.getElementById('btm-dot').className = 'status-dot dot-green';
-      document.getElementById('btm-text').textContent = '◉ WS CONNECTED';
-      document.getElementById('btm-text').style.color = 'var(--green)';
-      addLog('SYSTEM', 'WebSocket connected');
-      startEpisode(currentTask, currentSeed);
-    }};
-    ws.onclose = () => {{
-      document.getElementById('live-dot').className = 'status-dot dot-grey';
-      document.getElementById('live-text').textContent = 'OFFLINE';
-      document.getElementById('live-text').style.color = 'var(--text)';
-      document.getElementById('btm-dot').className = 'status-dot dot-grey';
-      document.getElementById('btm-text').textContent = '○ WS DISCONNECTED';
-      document.getElementById('btm-text').style.color = 'var(--text-dim)';
-      addLog('SYSTEM', 'Disconnected — reconnecting in 3s...');
-      setTimeout(connectWS, 3000);
-    }};
-    ws.onmessage = (event) => {{
-      let data;
-      try {{ data = JSON.parse(event.data); }} catch(e) {{ return; }}
-      if(data.services) {{
-        const svcs = Object.entries(data.services).map(([name, s]) => ({{name, ...s}}));
-        svcs.sort((a, b) => {{
-          const val = (st) => st === 'down' ? 0 : (st === 'degraded' ? 1 : 2);
-          return val(a.status) - val(b.status);
-        }});
-        const list = document.getElementById('service-list');
-        list.innerHTML = '';
-        document.getElementById('svc-count').textContent = svcs.length;
-        svcs.forEach(s => {{
-          let bcol = 'var(--border)', bgcol = 'var(--surface)', tcol = 'var(--text-dim)', stxt = '○ UNKNOWN';
-          if(s.status === 'down') {{ bcol = 'var(--red)'; bgcol = 'var(--red-dim)'; tcol = 'var(--red)'; stxt = '● DOWN'; }}
-          else if(s.status === 'degraded') {{ bcol = 'var(--yellow)'; bgcol = 'var(--yellow-dim)'; tcol = 'var(--yellow)'; stxt = '◐ DEGRADED'; }}
-          else if(s.status === 'healthy') {{ bcol = 'var(--green)'; bgcol = 'var(--green-dim)'; tcol = 'var(--green)'; stxt = '○ HEALTHY'; }}
-          let errRate = (s.error_rate * 100).toFixed(1);
-          let memUtil = (s.memory_utilization * 100).toFixed(1);
-          let errCol = s.error_rate > 0.3 ? 'var(--red)' : (s.error_rate > 0.1 ? 'var(--yellow)' : 'var(--green)');
-          let memCol = s.memory_utilization > 0.9 ? 'var(--red)' : (s.memory_utilization > 0.7 ? 'var(--yellow)' : 'var(--green)');
-          list.innerHTML += `
-            <div class="service-item mono" style="border-left: 3px solid ${{bcol}}; background: ${{bgcol}}">
-              <div>
-                <div class="svc-name">${{s.name}}</div>
-                <div class="svc-status" style="color:${{tcol}}">${{stxt}}</div>
-              </div>
-              <div class="svc-stats">
-                <div class="svc-stat-line" style="color:${{errCol}}">ERR ${{errRate}}%</div>
-                <div class="svc-stat-line" style="color:${{memCol}}">MEM ${{memUtil}}%</div>
-              </div>
-            </div>
-          `;
-        }});
-      }}
-      if(data.active_alerts) {{
-        const alist = document.getElementById('alerts-list');
-        alist.innerHTML = '';
-        document.getElementById('alert-count').textContent = data.active_alerts.length;
-        document.getElementById('alert-count').style.background = data.active_alerts.length > 0 ? 'var(--red)' : 'var(--surface2)';
-        if(data.active_alerts.length === 0) {{
-          alist.innerHTML = '<div class="no-alerts mono">◎ ALL SYSTEMS NOMINAL</div>';
-        }} else {{
-          let critFound = false;
-          data.active_alerts.slice(0, 5).forEach(a => {{
-            let bg = 'var(--surface)', border = 'var(--border)', txtCol = '#000';
-            if(a.severity === 'CRITICAL') {{ border = 'var(--red)'; bg = 'var(--red)'; critFound = true; }}
-            else if(a.severity === 'HIGH') {{ border = '#ff6600'; bg = '#ff6600'; }}
-            else if(a.severity === 'WARNING') {{ border = 'var(--yellow)'; bg = 'var(--yellow)'; }}
-            else {{ border = 'var(--blue)'; bg = 'var(--blue)'; }}
-            alist.innerHTML += `
-              <div class="alert-strip mono" style="border-left: 3px solid ${{border}}; background: ${{bg}}20">
-                <div class="alert-badge" style="background:${{bg}}; color:${{txtCol}}">${{a.severity}}</div>
-                <div class="alert-text">[${{a.service}}] ${{a.message}}</div>
-              </div>
-            `;
-          }});
-          if(data.active_alerts.length > 5) {{
-            alist.innerHTML += `<div class="mono" style="font-size:9px; color:var(--text-dim); text-align:center">+${{data.active_alerts.length - 5}} more</div>`;
-          }}
-          if(critFound) {{
-            const lp = document.getElementById('left-panel');
-            lp.classList.remove('flash-critical');
-            void lp.offsetWidth;
-            lp.classList.add('flash-critical');
-          }}
-        }}
-      }}
-      if(data.action !== undefined && isRunning) {{
-        stepCount++;
-        updateStepCounter();
-        let act = data.action;
-        if(typeof act === 'string') try{{ act = JSON.parse(act) }}catch(e){{}}
-        if(act.action_type === 'diagnose') addLog('DIAGNOSE', act.root_cause);
-        else if(act.action_type === 'restart_service' || act.action_type === 'rollback_service' || act.action_type === 'block_ip')
-          addLog('FIX', act.action_type, act.service || act.ip);
-        else addLog('ACTION', act);
-        if(data.evidence) addLog('EVIDENCE', data.evidence);
-        if(data.reward !== undefined) {{
-          totalScore += data.reward;
-          rewardHistory.push(data.reward);
-          addLog('REWARD', data.reward);
-          updateScoreDisplay();
-          updateSparkline();
-        }}
-        if(data.done) {{
-          isRunning = false;
-          addLog('EPISODE_END', totalScore, stepCount);
-          updateScoreDisplay();
-          setTimeout(() => {{
-            currentSeed = Math.floor(Math.random() * 99999);
-            document.getElementById('seed-input').value = currentSeed;
-            startEpisode(currentTask, currentSeed);
-          }}, 4000);
-        }}
-      }}
-    }};
-  }}
-  window.onload = connectWS;
-</script>
-</body>
-</html>
-"""
-    return HTMLResponse(html)
-'''
-with open("server/app.py", "r", encoding="utf-8") as f:
-    content = f.read()
-# find @app.get("/", response_class=HTMLResponse)
-target = '@app.get("/", response_class=HTMLResponse)'
-if target in content:
-    new_content = content.replace(target, html_content + "\n\n" + target)
-    with open("server/app.py", "w", encoding="utf-8") as f:
-        f.write(new_content)
-    print("SUCCESS")
-else:
-    print("NOT FOUND")

technical_reference.md DELETED

@@ -1,106 +0,0 @@
-# ARIA: DevOps Incident Response – Technical Reference Manual
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 1: PROJECT OVERVIEW
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-**Project**: ARIA (DevOps Incident Response)
-**Purpose**: An OpenEnv-compliant RL environment where AI agents diagnose and remediate production software incidents across a simulated microservices architecture. Designed for the Meta × PyTorch × HuggingFace OpenEnv Hackathon finals.
-**Architecture Stack**:
-- **Framework**: FastAPI (Python)
-- **State Management**: In-memory `DevOpsEnvironment`. WebSocket support available for real-time streaming.
-- **Core Data Models**: Pydantic (`Action`, `Observation`, `State`, `StepResult`)
-- **Protocol**: JSON over REST (`/reset`, `/step`, `/state`, `/validate`, `/multi-agent/*`, `/curriculum/*`)
-- **Deployment**: Hugging Face Spaces (`server/app.py` is the verified production entrypoint).
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 2: CORE ENVIRONMENT & ACTION SPACE
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-The environment supports standard API and agent interactions. SLA degradation kicks in every step (if a service is `down`, error rates creep up; if `degraded`, latency increases).
-**Action Types & Side Effects**:
-- `read_logs(service)`: Returns the last 2 lines + a summary of hidden lines.
-- `search_logs(service, query)`: Case-insensitive search on logs.
-- `read_metrics(service)`: Returns CPU, memory, error rate, p99 latency, replicas, version, and SLA breach info.
-- `read_runbook(runbook)`: Loads Markdown text from the `data/runbooks/` directory.
-- `acknowledge(service)`: Acknowledges an active alert ID.
-- `diagnose(root_cause)`: Evaluates keyword overlap for a diagnosis bonus reward.
-- `restart_service(service)`: Fixes OOMs. Penalised if used on stateful services, unaffected services, or repeated.
-- `scale_up(service)`: Increases replicas. Penalised if the incident is data corruption or the service is unrelated.
-- `rollback(service, version)`: Reverts a bad deployment to fix cascading failures.
-- `alert_oncall(reason)`: Required for cross-team fixes (data audit, security escalation, DBA intervention).
-- `block_ip_range(ip_range)`: Security response mechanism for DDoS attacks.
-- `create_index(table, column)`: DBA response for slow database queries.
-- `failover(target_region)`: Fails over eligible stateless services to `us-west-2`.
-- `noop()`: Take no action. Penalised if used excessively (> 3 times).
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 3: THE 7 CORE TASKS (PLUS GENERATED)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-| Task ID | Max Steps | Description & Difficulty | Ground Truth Root Cause | Ground Truth Fix |
-|---|---|---|---|---|
-| `easy` | 15 | **Single Service OOM**: One service crash-loops from a memory leak. | `memory_leak_{service}` | `restart {service}` |
-| `medium` | 20 | **Cascading Failure**: Bad deployment exhausts connection pools, cascading. Includes a red-herring alert. | `connection_pool_exhaustion` or `null_pointer` | `rollback {service}` |
-| `hard` | 25 | **Silent Data Corruption**: All services green. No error alerts. Requires correlating subtle business metrics. | `data_corruption_data_pipeline...` | `rollback data-pipeline` AND `alert_oncall` |
-| `bonus` | 25 | **Dual Simultaneous Failure**: Two independent failures at once. Both must be fixed. | `disk_full_log... AND model_reload_loop...` | `alert_oncall` (disk) AND `rollback` (ml) |
-| `security` | 20 | **DDoS Attack**: Botnet credential stuffing. Requires blocking CIDR and escalation. | `ddos_attack_185.x.x.x...` | `block_ip_range` AND `alert_oncall` |
-| `database` | 20 | **DB Degradation**: Missing schema index causing full table scans. | `missing_index_orders_user_segment...` | `create_index` or `rollback` |
-| `failover` | 25 | **Multi-Region Failover**: Network partition. Fails over stateless services. | `us_east_1_network_partition...` | `failover` eligible AND `alert_oncall` others |
-| `generated` | 20 | **Procedural Incident**: A seed-based deterministic incident generated by ARIA. | (Deterministic via Seed) | (Varies by failure mode) |
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 4: REWARD SHAPING & GRADING LOGIC
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-The evaluation logic (`graders/grader.py`) calculates the final episode score, strictly bounded to `[0.001, 0.999]` to pass OpenEnv validation checks.
-- **Base Score**: The accumulated `total_reward` from individual step functions (clamped to `[0.0, 1.0]`).
-- **Efficiency Bonus**: If the incident is resolved, `+ (1.0 - (steps_taken / max_steps)) * 0.05`.
-- **Diagnosis Precision Bonus**: Checks keyword overlap of the `diagnose` action against the ground truth. `>= 50%` overlap adds `+0.03`. `>= 30%` overlap adds `+0.01`.
-- **Noop Penalty**: `(noop_count - 3) * 0.02` for excessive `noop` actions.
-- **Restart Penalty**: `(restarts - 1) * 0.05` per service restarted more than once (discourages guess-and-check).
-- **Blind Remediation Penalty** (`tasks/base.py`): `-0.05` applied locally in step functions if a fix action is taken before any `diagnose_correct` or `diagnose_partial` is awarded.
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 5: ADVANCED SUB-SYSTEMS
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-### Curriculum Engine (`curriculum/engine.py`)
-- **Mastery Tracker**: Keeps a rolling average of the last 5 scores per task.
-- **Mastery Levels**: Novice (0) ➔ Intermediate (1) ➔ Advanced (2) ➔ Mastered (3).
-- **Thresholds**: Rolling Avg `> 0.75` promotes mastery. `< 0.30` demotes mastery.
-- **Scaffolding**: Provides specific hints if a task is failed `>= 3` times and avg is `< 0.30`.
-- **Next Task Logic**: Returns the task with the lowest rolling average among non-mastered candidates.
-### Multi-Agent Dual Session (`multi_agent/session.py`)
-- **Agent A (Observer)**: Read-only. Prompted to review logs and alerts. Must use `share_finding` to pass observations.
-- **Agent B (Responder)**: Write-only. Relies on Agent A's findings plus service metrics to execute remediation actions.
-- **Endpoints**: `/multi-agent/reset`, `/multi-agent/step/a/{id}`, `/multi-agent/step/b/{id}`, `/multi-agent/state/{id}`.
-### Procedural Incident Generator (`generator/incident_factory.py`)
-- **Mechanics**: Takes a `seed` to build an incident dynamically. Supports 6 failure modes (`oom`, `cascade`, `corruption`, `security`, `database`, `network_partition`).
-- **Noise injection**: Adds 0-3 random noise alerts (e.g., SSL renewals, scheduled batch jobs) to distract agents.
-- **Difficulty Score Calculation**: `base_score + (noise_count * 0.05)`, clamped to 1.0.
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 6: API ENDPOINTS
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-| Method | Route | Description |
-|---|---|---|
-| `GET` | `/health` | Simple `{status: "ok"}` liveness check. |
-| `GET` | `/tasks` | Lists all 8 tasks with descriptions and max steps. |
-| `POST` | `/reset` | Initializes an episode. Body: `{"task_id": str, "seed": int}`. |
-| `POST` | `/step` | Executes an `Action`. Returns a `StepResult`. |
-| `GET` | `/state` | Full state dump including ground truth logic. |
-| `GET` | `/validate` | Runs self-validation on the 7 core tasks via random-agent rollout. |
-| `GET` | `/metrics` | Telemetry: Resolution rates, score averages per task. |
-| `GET` | `/leaderboard` | Top 10 episodes ranked by score, then fewest steps. |
-| `GET` | `/curriculum/*` | Next recommended task, status, and scaffolding hints. |
-| `GET` | `/generate/preview` | Preview procedurally generated incident structure for a specific seed. |
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-SECTION 7: LIMITATIONS & KNOWN ISSUES (DEMO GOTCHAS)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━
-- **Validation Route Exception**: The `generated` task is intentionally excluded from the `/validate` endpoint (`VALID_TASKS` in `server/app.py`). This is because the random validation agent frequently fails its variable mechanics, leading to a false "failed" validation status.
-- **State Model Seed Loss**: The integer `seed` is dropped from the finalised `State` Pydantic model response. It defaults to `42` in telemetry records (`track_episode`) if not explicitly extracted.
-- **Entrypoint Synchronization**: The primary Hugging Face production entrypoint is `server/app.py`. Do NOT use `api.py` for live deployments. If 404 errors appear, ensure routes were ported to `app.py`.
-- **Security Blocks**: Hugging Face repository tokens (`HF_TOKEN`) must be managed via Secrets. Hardcoded tokens in `devops.ipynb` were actively scrubbed to prevent automated security takedowns.
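Note on Section 4 of the deleted `technical_reference.md`: the grading rules listed there can be sketched roughly as below. This is an illustration only, not the contents of `graders/grader.py`. The `EpisodeSummary` type, its field names, and the assumption that all bonuses and penalties are simply added to the base score are hypothetical. The keyword-overlap ratio is taken as an input because the reference does not say how it is computed.

```python
from dataclasses import dataclass, field
from typing import Dict


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class EpisodeSummary:
    """Hypothetical container for the quantities named in Section 4."""
    total_reward: float            # accumulated per-step reward
    steps_taken: int
    max_steps: int
    resolved: bool
    diagnosis_overlap: float = 0.0  # keyword overlap ratio in [0, 1]
    noop_count: int = 0
    restarts_per_service: Dict[str, int] = field(default_factory=dict)


def final_score(ep: EpisodeSummary) -> float:
    # Base score: accumulated reward clamped to [0.0, 1.0].
    score = clamp(ep.total_reward, 0.0, 1.0)

    # Efficiency bonus, only when the incident was resolved.
    if ep.resolved:
        score += (1.0 - ep.steps_taken / ep.max_steps) * 0.05

    # Diagnosis precision bonus: >= 50% overlap gives 0.03, >= 30% gives 0.01.
    if ep.diagnosis_overlap >= 0.5:
        score += 0.03
    elif ep.diagnosis_overlap >= 0.3:
        score += 0.01

    # Noop penalty applies only beyond the third noop.
    if ep.noop_count > 3:
        score -= (ep.noop_count - 3) * 0.02

    # Restart penalty for each service restarted more than once.
    for restarts in ep.restarts_per_service.values():
        if restarts > 1:
            score -= (restarts - 1) * 0.05

    # Final score bounded to [0.001, 0.999] as stated in the reference.
    return clamp(score, 0.001, 0.999)
```

Under these assumptions, an episode that never earns reward still scores `0.001` rather than `0.0`, and a perfect run is capped at `0.999`.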

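Note on Section 5 of the deleted `technical_reference.md`: the curriculum rules (rolling average over the last 5 scores, promotion above `0.75`, demotion below `0.30`, next task chosen by lowest rolling average among non-mastered tasks) might look something like the following. This is a sketch, not the code in `curriculum/engine.py`. The `MasteryTracker` class, its method names, the choice to re-evaluate the level after every recorded score, and treating an unplayed task as having an average of `0.0` are all assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Optional

MAX_LEVEL = 3  # Novice (0), Intermediate (1), Advanced (2), Mastered (3)


class MasteryTracker:
    """Hypothetical tracker for the mastery rules described in Section 5."""

    def __init__(self, window: int = 5) -> None:
        # Each task keeps only its most recent `window` scores.
        self.scores: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )
        self.levels: Dict[str, int] = defaultdict(int)

    def rolling_avg(self, task: str) -> float:
        history = self.scores[task]
        # Assumption: a task with no recorded scores averages 0.0.
        return sum(history) / len(history) if history else 0.0

    def record(self, task: str, score: float) -> int:
        """Store a score and return the task's updated mastery level."""
        self.scores[task].append(score)
        avg = self.rolling_avg(task)
        if avg > 0.75:
            self.levels[task] = min(MAX_LEVEL, self.levels[task] + 1)
        elif avg < 0.30:
            self.levels[task] = max(0, self.levels[task] - 1)
        return self.levels[task]

    def next_task(self, tasks: Iterable[str]) -> Optional[str]:
        """Pick the non-mastered task with the lowest rolling average."""
        candidates = [t for t in tasks if self.levels[t] < MAX_LEVEL]
        if not candidates:
            return None
        return min(candidates, key=self.rolling_avg)
```

With this sketch, three consecutive scores of `0.9` on `easy` move it to Mastered, after which `next_task` stops recommending it.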
uvicorn_err.txt DELETED

File without changes

uvicorn_out.txt DELETED

File without changes