Spaces: mahithakur / PRobe


Thakur, Mahipal committed on Apr 24

Commit

44bd7bd

1 Parent(s): 4ec7361

UI Integration


Files changed (9)

README.md +403 -4
docs/design.md +1 -1
environment/app.py +36 -12
frontend/app.js +597 -0
frontend/index.html +212 -0
frontend/style.css +391 -0
outputs/baseline_comparison.svg +98 -0
outputs/reward_breakdown.svg +95 -0
run.py +65 -0

README.md CHANGED

@@ -6,7 +6,7 @@ colorTo: green
 sdk: docker
 pinned: false
 app_port: 8000
-base_path: /web
 tags:
   - openenv
   - code-review
@@ -197,11 +197,23 @@ Find missing rate-limit    →  nginx config shown   →  confirms /auth fully e
 ## Quickstart
 ```bash
-# Install
 uv sync
-# Run the environment server
-uv run uvicorn environment.app:app --host 0.0.0.0 --port 8000 --reload
 # Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
 export OPENAI_API_KEY=sk-...
@@ -213,6 +225,143 @@ uv run python training/train_grpo.py --test
 ---
 ## Training
 | | |
@@ -283,6 +432,256 @@ Security code review is a high-stakes task performed by a small number of specia
 ## Repo Structure
 ```
 .
 ├── agent/

 sdk: docker
 pinned: false
 app_port: 8000
+base_path: /ui/
 tags:
   - openenv
   - code-review
 ## Quickstart
 ```bash
+# 1. Install all dependencies
 uv sync
+# 2. Start the server + frontend in one command
+uv run python run.py
+# The terminal will print:
+# ==========================================================
+#   PRobe — AI Code Review Training Environment
+# ==========================================================
+#   Frontend   →  http://localhost:8000/ui/
+#   API docs   →  http://localhost:8000/docs
+#   WebSocket  →  ws://localhost:8000/ws
+# ==========================================================
+# 3. Open your browser
+open http://localhost:8000/ui/
 # Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
 export OPENAI_API_KEY=sk-...
 ---
+## Interactive Frontend Dashboard
+PRobe ships with a **zero-dependency browser UI** that turns the RL environment into a live, interactive demo.
+No npm, no build step — just start the server and open your browser.
+### What It Looks Like
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│  🔍 PRobe   Adversarial Code Review — RL Training Environment               │
+│                                                    🟢 Connected  [New Ep]   │
+├──────────────────────────────┬───────────────────┬─────────────────────────┤
+│  Task 2 — auth.py            │     Actions       │   Reward Dashboard      │
+│  medium  •  Step 3 / 20      │                   │                         │
+│                              │  💬 Add Comment   │        ◯  +0.24         │
+│  ⚠️ External contributor,    │  ┌──────────────┐ │      cumulative         │
+│     no prior commit history  │  │ Line:  [12]  │ │                         │
+│                              │  │ Comment:     │ │  Issue credit  ████░░  │
+│  Review this auth module.    │  │ SQL inject.. │ │  Classification ██░░░  │
+│  Identify bugs and decide    │  │ Severity:    │ │  FP penalty    ░░░░░  │
+│  whether to escalate or      │  │ [critical ▾] │ │  Coverage      ███░░  │
+│  request changes.            │  │ Category:    │ │  Decision      ████░  │
+│                              │  │ [security ▾] │ │  Efficiency    ██░░░  │
+│  ┌─ auth.py ──────────────┐  │  └──────────────┘ │                         │
+│  │   1: import hashlib    │  │  [Submit Comment] │  Issues Found           │
+│  │   2:                   │  │                   │  ██████░░░░  2 / 5      │
+│  │   3: DB_PASS = "s3cr"  │  │  ⚡ Quick Actions │                         │
+│  │  12: cursor.execute(   │◄─┤  [🔍 Get Context] │  Episode History        │
+│  │     f"SELECT * FROM    │  │  [🤖 Run Scanner] │  ┌───────────────────┐  │
+│  │     users WHERE        │  │  ───────────────  │  │ ADD_COMMENT +0.12 │  │
+│  │  13: username='{u}'"   │  │  [🔄 Req Changes] │  │ sql injection L12 │  │
+│  │  14: )                 │  │  [✅ Approve PR]  │  ├───────────────────┤  │
+│  └────────────────────────┘  │  [📤 Submit]      │  │ RUN_SCANNER +0.00 │  │
+│                              │  [🚨 Escalate]    │  │ 3 findings found  │  │
+└──────────────────────────────┴───────────────────┴─────────────────────────┘
+```
+### Three-Column Layout
+**Left — Code Viewer**
+- Full source code with **line numbers** for every episode
+- Lines are **colour-coded** as you act:
+  - 🔵 Blue — line you just commented on
+  - 🟡 Yellow — line flagged by the scanner
+  - 🟢 Green — line you probed with Get Context
+- **Unlocked hints** appear below the code as green panels whenever a key issue is found
+- The **adversarial hint** banner tells you whether the PR is from a trusted team member or an unknown external contributor
+**Centre — Action Panel**
+- **Add Comment** form: line number, free-text comment, severity, category, and bug/backdoor classification
+- **Quick Actions**: single-click buttons for the six action types that do not need the comment form
+| Button | Action | What Happens |
+|---|---|---|
+| 🔍 Get Context | `get_context` | Reveals ±5 lines around the probed line number |
+| 🤖 Run Scanner | `run_scanner` | Runs the simulated static-analysis tool |
+| 🔄 Request Changes | `request_changes` | Records your review decision |
+| ✅ Approve PR | `approve` | Approves (−0.15 penalty if < 50 % issues found) |
+| 📤 Submit Review | `submit_review` | Ends the episode; triggers terminal scoring |
+| 🚨 Escalate to Security | `escalate_to_security_review` | Correct only on adversarial tasks 7–9 |
+**Right — Reward Dashboard**
+- **Animated ring** showing cumulative episode reward (green above zero, red below)
+- **Six component bars** updating in real time after every action:
+  - Issue credit, Classification credit, FP penalty
+  - Coverage bonus, Decision score, Efficiency bonus
+- **Issues progress bar** showing how many ground-truth issues you have found
+- **Episode history feed** — every action with its reward delta and explanation
+### Episode End Modal
+When the episode terminates (via Submit Review or Escalate), a modal pops up showing:
+```
+        🏆  Episode Passed!
+  "Found 5/5 issues (weighted coverage 100%).
+   Decision 'escalate_to_security_review' was correct."
+  ┌───────────────────────────────────┐
+  │ Cumulative reward      +0.874     │
+  │ Issues found           5 / 5      │
+  │ Steps used             18 / 25    │
+  │ Decision               escalate   │
+  │ Escalation required    Yes        │
+  └───────────────────────────────────┘
+              [Start New Episode]
+```
+Clicking **Start New Episode** automatically loads the next task in the difficulty ladder.
+### How to Run
+```bash
+# Install dependencies (one-time)
+uv sync
+# Start the server — this also serves the frontend
+uv run python run.py
+```
+Then open **`http://localhost:8000/ui/`** in any browser. No additional setup, no separate frontend server.
+**Optional flags:**
+```bash
+# Different port
+uv run python run.py --port 9000
+# Bind to localhost only (do not expose on the network)
+uv run python run.py --host 127.0.0.1
+# Dev mode: auto-reload Python files on save
+uv run python run.py --reload
+```
+### How the Frontend Connects
+The browser communicates with the backend over a **persistent WebSocket** at `ws://localhost:8000/ws`.
+Each browser tab gets its own isolated environment instance — concurrent sessions do not share state.
+The WebSocket URL is auto-detected from `window.location.hostname` so the UI works on any host or port without editing any file.
+### Why a Frontend Helps the Story
+| Without Frontend | With Frontend |
+|---|---|
+| `total=0.345` in a log file | Animated reward ring filling green in real time |
+| `issues_found: ['sql_injection']` | Line 12 highlighted blue in the code viewer |
+| `decision: escalate_to_security_review` | 🚨 Escalate button, modal with final score and stats |
+| Understanding the anti-exploit rule | Watching a keyword-spam comment score −0.05 FP penalty |
+| Explaining the causal chain mechanic | Green hint panel appearing after finding the JWT issue |
+The dashboard makes the reward signal **tangible** — a visitor can play one episode in two minutes and immediately understand what makes PRobe different from a linter.
+---
 ## Training
 | | |
 ## Repo Structure
+```
+.
+├── agent/
+│   ├── client.py               # HTTP client for interacting with the environment server
+│   ├── models.py               # Pydantic models: ProbeAction, ProbeObservation, RewardType
+│   └── __init__.py
+├── environment/
+│   ├── app.py                  # FastAPI server (HTTP + WebSocket + static frontend at /ui/)
+│   ├── Dockerfile              # Container definition for HuggingFace Spaces
+│   ├── episode_memory.py       # Cross-episode JSON memory (injects prior-finding hints)
+│   ├── graders.py              # Deterministic reward grader (keyword+line+length verifier)
+│   ├── mutator.py              # Code mutation engine (rename / shift / nudge)
+│   ├── probe_environment.py    # Core environment: reset / step / state / action handlers
+│   ├── requirements.txt        # Server-side Python dependencies
+│   ├── scanner.py              # Simulated static-analysis tool (70% recall, FP injection)
+│   ├── tasks.py                # 10 task definitions with ground-truth issue lists
+│   ├── _import_compat.py       # Import shim for package / script / test contexts
+│   └── __init__.py
+├── frontend/
+│   ├── index.html              # Three-column dashboard layout
+│   ├── style.css               # Dark IDE theme (no build step required)
+│   └── app.js                  # WebSocket client, code viewer, reward ring, history feed
+├── training/
+│   ├── baseline.py             # Zero-shot GPT-4o-mini baseline agent + plotting
+│   ├── scripted_baseline.py    # Deterministic oracle and spammer stress-tests
+│   ├── train_grpo.py           # GRPO training script (TRL + optional Unsloth, 5-phase curriculum)
+│   └── __init__.py
+├── tests/
+│   ├── test_dynamic_world.py   # Tests for mutation engine and scanner noise model
+│   ├── test_grader.py          # Tests for reward grader correctness
+│   └── __init__.py
+├── docs/
+│   └── design.md               # Architecture notes
+├── outputs/
+│   └── scripted_baseline.jsonl # Sample baseline results
+├── run.py                      # One-command launcher: starts server + serves frontend
+├── openenv.yaml                # OpenEnv manifest (10 tasks, full schema)
+├── pyproject.toml              # Project metadata and dependencies
+└── pytest.ini                  # Test configuration
+```
+---
+## OpenEnv Compliance Checklist
+- [x] Built on `Environment` base class (`ProbeEnvironment(Environment)` in `environment/probe_environment.py`)
+- [x] `reset()`, `step()`, `state()` all implemented (async-native via `async_reset` / `async_step` / `async_state`; sync wrappers delegate safely via `asyncio.run`)
+- [x] `step()` returns `tuple[ObservationType, RewardType, bool, dict]` (see `async_step` in `probe_environment.py`)
+- [x] Dedicated `RewardType` Pydantic v2 model with `model_config = ConfigDict(frozen=True)` (`agent/models.py`)
+- [x] Valid `openenv.yaml` manifest (spec_version, name, type, runtime, app, port, 10 tasks, observation schema)
+- [x] Client/server separation enforced (`agent/` = client models + HTTP client; `environment/` = server logic)
+- [x] No reserved MCP tool names used
+- [ ] Hosted on HuggingFace Spaces ([FILL: deploy and add URL to links table above])
+---
+## The Problem
+The XZ Utils backdoor (CVE-2024-3094) slipped through two years of open-source review. The SolarWinds attack reached roughly 18,000 organisations through a tampered build pipeline. In both cases the malicious change *looked* like a legitimate contribution, the kind of PR that lands in a code-review queue every day.
+Today's LLMs scan code like a linter. They find style issues, flag known CVE patterns, and produce plausible-sounding comments. What they don't do is *investigate* — reason about intent, distinguish an honest off-by-one from a planted authentication bypass, or know when to escalate rather than request changes. Reward signals for code generation are everywhere; reward signals for critical code *evaluation* barely exist.
+PRobe closes that gap. Its fully deterministic grader — keyword + line-range matching, no LLM judge — separates investigation quality from keyword spam. An agent that dumps every security term at random lines scores *negative*. One that reads carefully, probes for context, finds the right lines, and correctly labels each flaw as an honest bug or a deliberate backdoor scores close to `+1.0`.
+---
+## What the Agent Sees, Does, and Gets Rewarded For
+### Plain English
+The agent is handed a Python source file and asked to review it like a senior security engineer. It can annotate suspicious lines, probe specific regions for more context, run a simulated scanner (which, like real tools, misses things and occasionally lies), and finally submit a verdict. On adversarial tasks it must also decide whether the code contains a deliberate backdoor and escalate to a security team if so. Every episode the code surface changes — variable names, line numbers, constants — so the agent cannot memorise answers; it has to read.
+### What the Agent Observes (`ProbeObservation`)
+| Field | Description |
+|---|---|
+| `code_snippet` | Mutated Python source for this episode |
+| `task_description` | Review instructions and goals |
+| `file_name` | Name of the file being reviewed |
+| `task_id` / `task_difficulty` | Current task index (0–9) and difficulty label |
+| `review_history` | All actions taken so far this episode |
+| `step_count` / `max_steps` | Steps used vs. budget |
+| `issues_found_count` / `total_issues` | Progress tracker |
+| `context_hints` | Causal hints unlocked by finding key issues |
+| `reward` | Most recent step reward in `[-1.0, 1.0]` |
+| `done` | Whether the episode has ended |
+### What Actions the Agent Can Take (`ProbeAction`)
+| Action | Effect |
+|---|---|
+| `add_comment` | Annotate a line with text, severity, category, and optional backdoor classification |
+| `get_context` | Reveal ±5 lines of context around a chosen line number |
+| `run_scanner` | Invoke simulated static-analysis tool (70 % recall, up to 2 false positives injected) |
+| `request_changes` | Mark PR as requiring fixes (correct terminal action for tasks 0–6) |
+| `approve` | Approve the PR (penalised if issues remain) |
+| `submit_review` | Finalise the review and end the episode |
+| `escalate_to_security_review` | Flag PR as containing a deliberate attack (required for tasks 7–9) |
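The `get_context` row above describes a ±5-line window. A minimal sketch of how such a window could be computed, assuming 1-based line numbers and clamping at the file boundaries (the function name `context_window` is hypothetical, not taken from `probe_environment.py`):

```python
def context_window(source: str, line_number: int, radius: int = 5) -> list[tuple[int, str]]:
    """Return (line_number, text) pairs within `radius` lines of `line_number`.

    Line numbers are 1-based; the window is clamped to the file boundaries.
    """
    lines = source.split("\n")
    start = max(1, line_number - radius)
    end = min(len(lines), line_number + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


# Hypothetical 20-line file used only for illustration.
sample = "\n".join(f"line {n}" for n in range(1, 21))
window = context_window(sample, 12)
```

Clamping keeps a probe near the top or bottom of the file from failing; whether the real environment clamps or rejects such probes is not stated in the README.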
+### Reward Formula
+Reward accumulates across steps and is finalised at submission:
+```
+Episode reward =
+  Σ per-comment (ADD_COMMENT):
+    issue_credit          = (weight_i / total_weight) × 0.40   ← found a real issue
+    classification_credit = (weight_i / total_weight) × 0.20   ← correct bug/backdoor label
+    misclassify_penalty                               = −0.05   ← found it but labelled it wrong
+    false_positive_penalty                            = −0.05   ← substantive comment, no issue matched
+  + on terminal (SUBMIT_REVIEW or ESCALATE):
+    coverage_bonus   = weighted_coverage × 0.15                 ← proportional to issues found
+    decision_score   = +0.15 / −0.15                            ← correct / wrong final action
+                       (bonus gated: requires coverage ≥ 30 %)
+    efficiency_bonus = (1 − steps_used/max_steps) × 0.10        ← unlocked only if coverage ≥ 60 %
+Maximum achievable: ~1.0   Minimum: −1.0
+```
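The formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `graders.py`: the function names are hypothetical, and the sketch assumes the wrong-decision penalty applies regardless of coverage while only the bonus is gated.

```python
ISSUE_WEIGHT = 0.40
CLASSIFICATION_WEIGHT = 0.20
PENALTY = 0.05
COVERAGE_WEIGHT = 0.15
DECISION_WEIGHT = 0.15
EFFICIENCY_WEIGHT = 0.10


def comment_reward(issue_weight: float, total_weight: float,
                   matched: bool, label_correct: bool) -> float:
    """Reward for one substantive add_comment action."""
    if not matched:
        return -PENALTY  # false positive: no issue matched
    share = issue_weight / total_weight
    reward = share * ISSUE_WEIGHT
    # Correct label earns classification credit; a wrong label is penalised.
    reward += share * CLASSIFICATION_WEIGHT if label_correct else -PENALTY
    return reward


def terminal_reward(weighted_coverage: float, decision_correct: bool,
                    steps_used: int, max_steps: int) -> float:
    """Reward added on submit_review or escalate_to_security_review."""
    total = weighted_coverage * COVERAGE_WEIGHT
    if decision_correct:
        # Assumption: the bonus is withheld below 30 % coverage.
        total += DECISION_WEIGHT if weighted_coverage >= 0.30 else 0.0
    else:
        total -= DECISION_WEIGHT
    if weighted_coverage >= 0.60:
        total += (1 - steps_used / max_steps) * EFFICIENCY_WEIGHT
    return total


# Five equal-weight issues, all found and labelled correctly, 18 of 25 steps used.
episode = sum(comment_reward(1, 5, True, True) for _ in range(5))
episode += terminal_reward(1.0, True, 18, 25)
```

Under these assumptions the example episode totals 0.60 from comments plus 0.328 at the terminal step, which is why the maximum is described as approximately, not exactly, 1.0: the efficiency term can never reach its full 0.10 once any step has been used.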
+### Anti-Exploit Verifier
+A comment earns `issue_credit` only when **all three** conditions hold simultaneously:
+1. **`keyword_hit`** — at least one issue keyword appears in the comment text
+2. **`line_hit`** — `line_number` is within ±2 lines of the declared issue range
+3. **`substantive`** — comment body is longer than 15 characters
+This closes three common reward-hacking paths: keyword spam (fails `line_hit`), wide-net line fishing (fails `keyword_hit`), and one-word dumps (fails `substantive`). The decision bonus additionally requires weighted coverage ≥ 30 % before it can be earned, so an agent that never reads code and always guesses `request_changes` earns zero — not a bonus.
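The three conditions can be sketched as a single predicate. This is a simplified illustration, assuming case-insensitive substring matching for keywords and an inclusive line range; the function name and signature are hypothetical and may differ from `graders.py`.

```python
def earns_issue_credit(comment: str, line_number: int,
                       keywords: list[str], line_range: tuple[int, int],
                       tolerance: int = 2, min_length: int = 15) -> bool:
    """True only when keyword_hit, line_hit, and substantive all hold."""
    text = comment.lower()
    keyword_hit = any(keyword.lower() in text for keyword in keywords)
    start, end = line_range
    line_hit = start - tolerance <= line_number <= end + tolerance
    substantive = len(comment.strip()) > min_length
    return keyword_hit and line_hit and substantive


keywords = ["sql injection"]
issue_lines = (12, 13)

good = earns_issue_credit("SQL injection via f-string in the query", 12, keywords, issue_lines)
spam = earns_issue_credit("SQL injection via f-string in the query", 40, keywords, issue_lines)
fishing = earns_issue_credit("This line looks suspicious to me", 12, keywords, issue_lines)
one_word = earns_issue_credit("sql injection", 12, keywords, issue_lines)
```

Only `good` passes: `spam` fails `line_hit`, `fishing` fails `keyword_hit`, and `one_word` is 13 characters long and fails `substantive`.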
+### Perfect Episode vs. Failing Episode
+**Perfect:** The agent reads the code, annotates every real issue at the correct line with a substantive, keyword-bearing comment, correctly labels each as `accidental_bug` or `intentional_backdoor`, escalates when required, and submits with steps to spare. Score approaches `1.0`.
+**Failing:** The agent spams generic comments on random lines, never co-locates a keyword with a real issue line, triggers false-positive penalties on every step, and submits the wrong terminal action. Score approaches `−1.0`.
+---
+## Environment Design
+### Difficulty Tiers
+| Tier | Tasks | Max Steps | Issues | What Changes |
+|---|---|---|---|---|
+| **Ultra-Easy** (bootstrap) | 0 | 6 | 2 | Category names spelled out in code comments above each bug; guarantees GRPO positive trajectories from step 1 |
+| **Easy** | 1 | 15 | 3 | Clean logic bugs, no hints, no distractors |
+| **Medium** | 2, 4 | 20 | 5 | Security and async-concurrency issues; wider surface area |
+| **Hard** | 3, 5, 6 | 30–35 | 6–7 | Mixed bug classes; task 6 adds a causal unlock chain |
+| **Adversarial** | 7, 8, 9 | 25–35 | 2–3 | Deliberate backdoors requiring `intentional_backdoor` classification and `escalate_to_security_review` |
+### Tasks (10 total)
+| ID | Difficulty | File | Issues | Domain |
+|---|---|---|---|---|
+| 0 | Ultra-easy | `bootstrap.py` | 2 | Off-by-one + hardcoded credential (hinted in comments) |
+| 1 | Easy | `utils.py` | 3 | Logic bugs, off-by-one, dead code |
+| 2 | Medium | `auth.py` | 5 | SQL injection, MD5, eval(), hardcoded creds |
+| 3 | Hard | `data_pipeline.py` | 7 | N+1, SSL bypass, thread leak, OOM cache |
+| 4 | Medium | `async_worker.py` | 5 | Race condition, missing await, resource leak |
+| 5 | Hard | `api_server.py` | 6 | Command injection, path traversal, pickle RCE |
+| 6 | Hard | `auth_service.py` | 6 | **Causal chain** — JWT forgery → privilege escalation |
+| 7 | Adversarial | `session_manager.py` | 2 | Planted auth bypass + hardcoded bypass credential |
+| 8 | Adversarial | `jwt_validator.py` | 2 | **Compound backdoor** — fixed-seed JWT secret + timing oracle |
+| 9 | Adversarial | `compat_shim.py` | 3 | **Supply chain** — import-time env-var exfiltration to attacker domain |
+### GRPO Curriculum (5 Phases in `train_grpo.py`)
+| Phase | Steps | Tasks in Pool |
+|---|---|---|
+| 0 | 0 – 40 | 0–1 (ultra-easy / easy) |
+| 1 | 40 – 80 | 0–3 (adds medium / hard) |
+| 2 | 80 – 120 | 0–6 (adds causal chain) |
+| 3 | 120 – 160 | 0–8 (adds adversarial) |
+| 4 | 160 – 200 | 0–9 (full curriculum) |
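A minimal sketch of the phase lookup implied by the table above. The table's step ranges share their endpoints, so this sketch assumes half-open intervals (step 40 belongs to phase 1); the names `PHASES` and `task_pool` are hypothetical and not taken from `train_grpo.py`.

```python
# (exclusive upper step bound, highest task id in the pool)
PHASES = [(40, 1), (80, 3), (120, 6), (160, 8), (200, 9)]


def task_pool(step: int) -> list[int]:
    """Return the task ids available at a given training step."""
    for upper_bound, last_task in PHASES:
        if step < upper_bound:
            return list(range(last_task + 1))
    # Past the final phase, keep sampling from the full curriculum.
    return list(range(PHASES[-1][1] + 1))
```

Keeping earlier tasks in every later pool, as the table does, means easy tasks keep producing positive trajectories while harder ones are introduced.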
+### Reward Components with Weights
+| Component | Weight | Trigger |
+|---|---|---|
+| `issue_credit` | up to **0.40** cumulative | `add_comment` matches a real issue (keyword + line + length) |
+| `classification_credit` | up to **0.20** cumulative | correct `accidental_bug` / `intentional_backdoor` label |
+| `misclassify_penalty` | **−0.05** per issue | issue found but wrong classification label |
+| `false_positive_penalty` | **−0.05** per comment | substantive comment, zero issues matched |
+| `coverage_bonus` | up to **0.15** terminal | `weighted_coverage × 0.15` |
+| `decision_score` | **±0.15** terminal | correct / wrong `request_changes` vs `escalate` decision |
+| `efficiency_bonus` | up to **0.10** terminal | `(1 − steps/max_steps) × 0.10` when coverage ≥ 60 % |
+| `format_bonus` | **+0.02** once | response contains a valid non-empty JSON array |
+### Dynamic World (Anti-Memorisation)
+Each episode `mutate_task()` applies three seed-controlled transforms:
+| Mutation | Example |
+|---|---|
+| Variable rename | `total` → `acc`, `data` → `payload`, `password` → `passwd` |
+| Line shift | Blank line inserted above first issue; all `line_range` values shift +1 |
+| Constant variance | `range(len(data) + 1)` → `range(len(data) + 2)` |
+Mutations are deterministic given the episode seed — reproducible runs, always fresh surfaces.
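Two of the three transforms can be sketched as follows. This is an illustration only, assuming a per-episode `random.Random(seed)` and a coin flip per rename; the real `mutate_task()` in `mutator.py` may choose differently, and the function name `mutate` is hypothetical.

```python
import random
import re

# Rename pairs taken from the examples in the table above.
RENAMES = {"total": "acc", "data": "payload", "password": "passwd"}


def mutate(source: str, line_range: tuple[int, int], seed: int) -> tuple[str, tuple[int, int]]:
    """Apply seed-controlled renames, then shift the issue down by one line."""
    rng = random.Random(seed)  # same seed, same mutation
    for old, new in RENAMES.items():
        if rng.random() < 0.5:
            # Whole-word match so 'data' does not rewrite 'metadata'.
            source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    lines = source.split("\n")
    start, end = line_range
    lines.insert(start - 1, "")  # blank line above the first issue line
    return "\n".join(lines), (start + 1, end + 1)


original = "def f(data):\n    for i in range(len(data) + 1):\n        pass"
mutated, new_range = mutate(original, (2, 2), seed=7)
```

Because the ground-truth `line_range` is shifted together with the code, the grader stays correct while a memorised line number stops matching.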
+### Scanner Noise Model (`scanner.py`)
+`run_scanner()` simulates a real lint/security tool:
+- **Recall: 70 %** — each real issue is reported with probability 0.70; ~30 % silently missed
+- **False-positive rate: 40 %** — up to 2 injected plausible-but-wrong findings per run
+- Scanner output is **not auto-graded** — the agent must still call `add_comment` with a correct line + keyword to earn reward
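The noise model can be sketched as below. The README does not say exactly how the 40 % false-positive rate combines with the cap of two, so this sketch assumes each of up to two decoy findings is injected independently with probability 0.40; the function name and the finding strings are hypothetical, not the API of `scanner.py`.

```python
import random


def simulate_scanner(true_issues: list[str], decoys: list[str], seed: int,
                     recall: float = 0.70, fp_rate: float = 0.40,
                     max_false_positives: int = 2) -> list[str]:
    """Report each real issue with probability `recall`, plus a few decoys."""
    rng = random.Random(seed)
    findings = [issue for issue in true_issues if rng.random() < recall]
    for decoy in decoys[:max_false_positives]:
        if rng.random() < fp_rate:  # assumption: independent draw per decoy
            findings.append(decoy)
    return findings


issues = ["sql_injection@12", "md5_hash@27", "eval_call@41"]
decoys = ["unused_import@1", "broad_except@33", "shadowed_name@50"]
report = simulate_scanner(issues, decoys, seed=3)
```

Since the output mixes missed issues with plausible decoys, an agent that copies scanner findings into comments without reading the code risks false-positive penalties.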
+### Causal Unlock Chain (Task 6)
+Finding certain issues appends new context hints to the observation, modelling real investigations where one discovery leads to a deeper one:
+```
+Find hardcoded JWT secret  →  DB schema revealed  →  agent can reason: forge token → privilege escalation
+Find missing rate-limit    →  nginx config shown   →  confirms /auth fully exposed with no IP filtering
+```
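One way to model the unlock chain is a lookup from issue id to the hint it reveals. The issue ids and hint strings below are hypothetical placeholders based on the two chains shown above; the real task definitions live in `tasks.py`.

```python
# Hypothetical issue ids mapped to the hint each one unlocks.
UNLOCKS = {
    "hardcoded_jwt_secret": "DB schema revealed",
    "missing_rate_limit": "nginx config shown",
}


def unlock_hints(found_issue_ids: list[str], context_hints: list[str]) -> list[str]:
    """Return context_hints extended with any newly unlocked hints."""
    hints = list(context_hints)  # do not mutate the caller's list
    for issue_id in found_issue_ids:
        hint = UNLOCKS.get(issue_id)
        if hint is not None and hint not in hints:
            hints.append(hint)
    return hints


hints = unlock_hints(["hardcoded_jwt_secret"], [])
hints = unlock_hints(["hardcoded_jwt_secret", "missing_rate_limit"], hints)
```

Checking for duplicates before appending keeps a hint from being shown twice when the same issue is matched by more than one comment.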
+### OpenEnv Interface
+| Method | Returns | Notes |
+|---|---|---|
+| `reset()` | `ProbeObservation` | Starts new episode; advances task cursor; applies mutation |
+| `step(action)` | `(ProbeObservation, RewardType, bool, dict)` | Executes action; returns obs, structured reward, done flag, info dict |
+| `state` (sync property) | `State(episode_id, step_count)` | Lightweight snapshot for `create_app` |
+| `async_state()` | `dict` | Full async snapshot with all episode fields |
+---
+## Quickstart
+```bash
+# Install
+uv sync
+# Run the environment server
+uv run uvicorn environment.app:app --host 0.0.0.0 --port 8000 --reload
+# Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
+export OPENAI_API_KEY=sk-...
+uv run python training/baseline.py
+# Smoke-test reward function (no GPU, no API key)
+uv run python training/train_grpo.py --test
+```
+---
+## Repo Structure
 ```
 .
 ├── agent/

docs/design.md CHANGED

@@ -17,7 +17,7 @@ repo-root/
 ## Environment entry point
-`environment/app.py` — FastAPI app mounted at `/web`.
 `openenv.yaml` → `app: environment.app:app`.
 ## Reward function

 ## Environment entry point
+`environment/app.py` is the FastAPI app; it serves the static frontend at `/ui/` and the API docs at `/docs`.
 `openenv.yaml` → `app: environment.app:app`.
 ## Reward function

environment/app.py CHANGED

@@ -21,12 +21,15 @@ from __future__ import annotations
 import json
 import logging
 from contextlib import asynccontextmanager
 from typing import Any
 import uvicorn
 from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
 from fastapi.responses import HTMLResponse
 try:
     from openenv.core.env_server.http_server import create_app as _create_openenv_app
@@ -37,7 +40,7 @@ except Exception:  # pragma: no cover
 try:
     from ..agent.models import ProbeAction, ProbeObservation, RewardType
     from .probe_environment import ProbeEnvironment
-except ModuleNotFoundError:
     from agent.models import ProbeAction, ProbeObservation, RewardType  # type: ignore
     from environment.probe_environment import ProbeEnvironment  # type: ignore
@@ -85,6 +88,11 @@ class StepResponse:
 # ── App factory ───────────────────────────────────────────────────────────────
 def _build_app() -> FastAPI:
     application = FastAPI(
         title="PRobe",
@@ -93,6 +101,15 @@ def _build_app() -> FastAPI:
         lifespan=lifespan,
     )
     # ── HTTP endpoints ────────────────────────────────────────────────────
     @application.post("/reset", summary="Start a new episode")
@@ -175,18 +192,25 @@ def _build_app() -> FastAPI:
             pass
     # ── Web UI ────────────────────────────────────────────────────────────
     @application.get("/web", response_class=HTMLResponse, include_in_schema=False)
-    async def web_ui() -> str:
-        return """
-        <!doctype html><html><head><title>PRobe</title></head>
-        <body style="font-family:sans-serif;padding:2rem">
-        <h2>PRobe Environment</h2>
-        <p>API docs: <a href="/docs">/docs</a></p>
-        <p>Health: <a href="/health">/health</a></p>
-        <p>Schema: <a href="/schema">/schema</a></p>
-        </body></html>
-        """
     return application

 import json
 import logging
+import pathlib
 from contextlib import asynccontextmanager
 from typing import Any
 import uvicorn
 from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
+from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import HTMLResponse
+from fastapi.staticfiles import StaticFiles
 try:
     from openenv.core.env_server.http_server import create_app as _create_openenv_app
 try:
     from ..agent.models import ProbeAction, ProbeObservation, RewardType
     from .probe_environment import ProbeEnvironment
+except ImportError:  # also covers ModuleNotFoundError, which subclasses ImportError
     from agent.models import ProbeAction, ProbeObservation, RewardType  # type: ignore
     from environment.probe_environment import ProbeEnvironment  # type: ignore
 # ── App factory ───────────────────────────────────────────────────────────────
+# Resolve the frontend directory relative to this file so the app works
+# regardless of the working directory it is launched from.
+_FRONTEND_DIR = pathlib.Path(__file__).parent.parent / "frontend"
 def _build_app() -> FastAPI:
     application = FastAPI(
         title="PRobe",
         lifespan=lifespan,
     )
+    # Allow the frontend (served on the same host, any port) to call the API.
+    # In production, restrict allow_origins to the exact frontend URL.
+    application.add_middleware(
+        CORSMiddleware,
+        allow_origins=["*"],
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
     # ── HTTP endpoints ────────────────────────────────────────────────────
     @application.post("/reset", summary="Start a new episode")
             pass
     # ── Web UI ────────────────────────────────────────────────────────────
+    # /web → redirect so old links still work
     @application.get("/web", response_class=HTMLResponse, include_in_schema=False)
+    async def web_redirect() -> HTMLResponse:
+        return HTMLResponse(
+            '<meta http-equiv="refresh" content="0;url=/ui/">',
+            status_code=200,
+        )
+    # Mount the static frontend (plain HTML/CSS/JS, no build step) at /ui.
+    # If the frontend directory is missing, log a warning and skip the mount.
+    if _FRONTEND_DIR.is_dir():
+        application.mount("/ui", StaticFiles(directory=str(_FRONTEND_DIR), html=True), name="ui")
+        log.info("Frontend mounted at /ui from %s", _FRONTEND_DIR)
+    else:
+        log.warning(
+            "Frontend directory not found at %s; /ui will not be available. "
+            "Check that the 'frontend/' directory exists at the repo root.",
+            _FRONTEND_DIR,
+        )
     return application

frontend/app.js ADDED

	@@ -0,0 +1,597 @@

+/**
+ * PRobe Frontend — WebSocket client & UI controller
+ *
+ * Connects to the backend WebSocket at /ws, drives a full episode
+ * lifecycle: reset → step* → terminal, and renders all state changes
+ * (code viewer, reward bars, history feed, episode-end modal) in real time.
+ *
+ * Architecture
+ * ------------
+ *   WsClient         — thin wrapper around native WebSocket with reconnect
+ *   RewardDashboard  — renders ring, component bars, issues progress
+ *   CodeViewer       — renders syntax-highlighted code with line decorations
+ *   HistoryFeed      — append-only action history list
+ *   ProbeController  — orchestrates all of the above; owns episode state
+ */
+"use strict";
+// ═══════════════════════════════════════════════════════════════════
+// CONFIG
+// ═══════════════════════════════════════════════════════════════════
+const CONFIG = {
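+  // WebSocket URL: derived from the page's own scheme, host, and port so the
+  // UI also works behind HTTPS or on a non-default port (e.g. --port 9000)
+  wsUrl: `${window.location.protocol === "https:" ? "wss" : "ws"}://${window.location.host}/ws`,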
+  reconnectDelayMs: 2000,
+  ringCircumference: 314,  // 2π × r=50
+};
+// ═══════════════════════════════════════════════════════════════════
+// WsClient — WebSocket with auto-reconnect
+// ═══════════════════════════════════════════════════════════════════
+class WsClient {
+  /**
+   * @param {string} url            WebSocket endpoint
+   * @param {function} onMessage    Called with parsed JSON message objects
+   * @param {function} onStatusChange Called with ('connected'|'disconnected')
+   */
+  constructor(url, onMessage, onStatusChange) {
+    this._url            = url;
+    this._onMessage      = onMessage;
+    this._onStatusChange = onStatusChange;
+    this._socket         = null;
+    this._connected      = false;
+  }
+  connect() {
+    if (this._socket) this._socket.close();
+    this._socket = new WebSocket(this._url);
+    this._socket.onopen = () => {
+      this._connected = true;
+      this._onStatusChange("connected");
+    };
+    this._socket.onclose = () => {
+      this._connected = false;
+      this._onStatusChange("disconnected");
+    };
+    this._socket.onerror = (err) => {
+      console.error("[WsClient] error:", err);
+      this._connected = false;
+      this._onStatusChange("disconnected");
+    };
+    this._socket.onmessage = (event) => {
+      try {
+        const msg = JSON.parse(event.data);
+        this._onMessage(msg);
+      } catch (e) {
+        console.warn("[WsClient] unparseable message:", event.data);
+      }
+    };
+  }
+  send(payload) {
+    if (!this._connected) {
+      console.warn("[WsClient] send called while disconnected");
+      return;
+    }
+    this._socket.send(JSON.stringify(payload));
+  }
+  get isConnected() { return this._connected; }
+}
+// ═══════════════════════════════════════════════════════════════════
+// CodeViewer — renders code with per-line decorations
+// ═══════════════════════════════════════════════════════════════════
+class CodeViewer {
+  constructor(preEl) {
+    this._pre = preEl;
+    this._lines = [];
+    // Track which lines have active highlights so we can clear them
+    this._decoratedLines = new Set();
+  }
+  /**
+   * Render source code as numbered, individually addressable lines.
+   * Clears any previous decorations.
+   */
+  render(sourceCode) {
+    this._lines = sourceCode.split("\n");
+    this._decoratedLines.clear();
+    this._pre.innerHTML = this._lines.map((text, idx) => {
+      const lineNum = idx + 1;
+      return `<span class="code-line" id="cl-${lineNum}">`
+           + `<span class="code-line-num">${lineNum}</span>`
+           + escapeHtml(text)
+           + `</span>`;
+    }).join("\n");
+  }
+  /**
+   * Apply a CSS class to a specific line.
+   * @param {number} lineNumber   1-based
+   * @param {string} cssClass     e.g. 'hl-comment'
+   */
+  decorateLine(lineNumber, cssClass) {
+    const el = document.getElementById(`cl-${lineNumber}`);
+    if (!el) return;
+    // Remove any previous highlight class on this line before adding the new one
+    el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
+    el.classList.add(cssClass);
+    this._decoratedLines.add(lineNumber);
+  }
+  /** Scroll the given 1-based line number into view. */
+  scrollToLine(lineNumber) {
+    const el = document.getElementById(`cl-${lineNumber}`);
+    if (el) el.scrollIntoView({ block: "center", behavior: "smooth" });
+  }
+  clearDecorations() {
+    for (const lineNum of this._decoratedLines) {
+      const el = document.getElementById(`cl-${lineNum}`);
+      if (el) el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
+    }
+    this._decoratedLines.clear();
+  }
+}
+// ═══════════════════════════════════════════════════════════════════
+// RewardDashboard — ring + bars + issues progress
+// ═══════════════════════════════════════════════════════════════════
+class RewardDashboard {
+  constructor() {
+    this._ringTrack     = document.getElementById("ring-track");
+    this._ringValue     = document.getElementById("ring-value");
+    this._issuesFill    = document.getElementById("issues-bar-fill");
+    this._issuesLabel   = document.getElementById("issues-found-label");
+    // Component bar element pairs { fill, val }
+    this._bars = {
+      issue_credit:          this._barPair("issue_credit"),
+      classification_credit: this._barPair("classification_credit"),
+      false_positive_penalty:this._barPair("false_positive_penalty"),
+      coverage_bonus:        this._barPair("coverage_bonus"),
+      decision_score:        this._barPair("decision_score"),
+      efficiency_bonus:      this._barPair("efficiency_bonus"),
+    };
+  }
+  _barPair(key) {
+    return {
+      fill: document.getElementById(`bar-${key}`),
+      val:  document.getElementById(`val-${key}`),
+    };
+  }
+  /**
+   * Update the cumulative reward ring.
+   * Clamps input to [-1, 1] and maps to ring arc.
+   */
+  updateRing(cumulativeReward) {
+    const clamped   = Math.max(-1, Math.min(1, cumulativeReward));
+    // Map [-1, 1] → [0, circumference]: negative reward still shows a partial arc
+    const fraction  = (clamped + 1) / 2;
+    const offset    = CONFIG.ringCircumference * (1 - fraction);
+    this._ringTrack.style.strokeDashoffset = offset;
+    // Colour: green above 0, red below
+    this._ringTrack.style.stroke = clamped >= 0 ? "var(--green)" : "var(--red)";
+    this._ringValue.textContent  = clamped.toFixed(2);
+    this._ringValue.style.color  = clamped >= 0 ? "var(--green)" : "var(--red)";
+  }
+  /**
+   * Render per-component score bars from a components dict.
+   * The bar width maps the absolute value to a 0-100% scale capped at 0.40.
+   */
+  updateBars(components) {
+    const MAX_BAR_VALUE = 0.40;
+    for (const [key, pair] of Object.entries(this._bars)) {
+      const rawValue = components[key] ?? 0;
+      const absWidth = Math.min(Math.abs(rawValue) / MAX_BAR_VALUE * 100, 100);
+      pair.fill.style.width   = `${absWidth}%`;
+      pair.val.textContent    = rawValue.toFixed(2);
+      // Positive/negative/neutral colouring
+      pair.fill.classList.remove("positive", "negative", "neutral");
+      if (rawValue > 0)       pair.fill.classList.add("positive");
+      else if (rawValue < 0)  pair.fill.classList.add("negative");
+      else                    pair.fill.classList.add("neutral");
+    }
+  }
+  /** Update the issues-found progress bar. */
+  updateIssues(found, total) {
+    const pct = total > 0 ? (found / total) * 100 : 0;
+    this._issuesFill.style.width  = `${pct}%`;
+    this._issuesLabel.textContent = `${found} / ${total}`;
+  }
+  reset() {
+    this.updateRing(0);
+    this.updateBars({});
+    this.updateIssues(0, 0);
+  }
+}
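Review note: the ring and bar arithmetic in `RewardDashboard` is easier to check in isolation. The sketch below pulls the two mappings out into pure functions. The names `ringOffset` and `barWidthPercent` are hypothetical, and the circumference of 314 is an assumption taken from the `stroke-dasharray` value in `style.css`; the real value comes from `CONFIG.ringCircumference`, which is not shown in this part of the diff.

```javascript
// Assumption: 314 approximates 2π × 50 and matches stroke-dasharray in style.css.
const RING_CIRCUMFERENCE = 314;
const MAX_BAR_VALUE = 0.40;

// Same mapping as updateRing(): clamp to [-1, 1], then map to a dash offset.
function ringOffset(cumulativeReward, circumference = RING_CIRCUMFERENCE) {
  const clamped = Math.max(-1, Math.min(1, cumulativeReward));
  const fraction = (clamped + 1) / 2;      // -1 -> 0, 0 -> 0.5, 1 -> 1
  return circumference * (1 - fraction);   // a larger offset means a shorter arc
}

// Same mapping as updateBars(): absolute value scaled to 0-100%, capped at 0.40.
function barWidthPercent(rawValue, maxValue = MAX_BAR_VALUE) {
  return Math.min((Math.abs(rawValue) / maxValue) * 100, 100);
}

console.log(ringOffset(-1)); // 314 (empty ring)
console.log(ringOffset(0));  // 157 (half ring)
console.log(ringOffset(1));  // 0   (full ring)
console.log(barWidthPercent(-0.8)); // 100 (capped)
```

One consequence worth keeping in mind: a cumulative reward of exactly 0 draws a half-filled ring, so `reset()` does not produce an empty ring.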
+// ═══════════════════════════════════════════════════════════════════
+// HistoryFeed — append-only episode action log
+// ═══════════════════════════════════════════════════════════════════
+class HistoryFeed {
+  constructor(containerEl) {
+    this._container = containerEl;
+    this._count = 0;
+  }
+  clear() {
+    this._container.innerHTML = '<div class="history-empty">No actions yet.</div>';
+    this._count = 0;
+  }
+  /**
+   * Append one step to the feed.
+   * @param {string} actionType   Human-readable action label
+   * @param {object} reward       RewardType object from server
+   */
+  append(actionType, reward) {
+    if (this._count === 0) {
+      this._container.innerHTML = "";
+    }
+    this._count++;
+    const total    = reward.total ?? 0;
+    const polarity = total > 0.001 ? "positive" : total < -0.001 ? "negative" : "neutral";
+    const rewardClass = total >= 0 ? "pos" : "neg";
+    const sign        = total >= 0 ? "+" : "";
+    const item = document.createElement("div");
+    item.className = `history-item ${polarity}`;
+    item.innerHTML = `
+      <div>
+        <span class="h-action">${escapeHtml(actionType)}</span>
+        &nbsp;→&nbsp;
+        <span class="h-reward ${rewardClass}">${sign}${total.toFixed(3)}</span>
+      </div>
+      <div class="h-explain">${escapeHtml(reward.explanation ?? "")}</div>
+    `;
+    this._container.prepend(item);   // newest at top
+  }
+}
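Review note: `HistoryFeed.append` derives three presentation values from `reward.total`, and they use different thresholds. A minimal sketch, with the hypothetical helper name `formatReward`, shows how they interact:

```javascript
// Hypothetical helper that mirrors the classification logic in HistoryFeed.append().
function formatReward(total) {
  // Row polarity uses a ±0.001 dead band around zero.
  const polarity = total > 0.001 ? "positive" : total < -0.001 ? "negative" : "neutral";
  // The reward text class and sign switch exactly at zero.
  const rewardClass = total >= 0 ? "pos" : "neg";
  const sign = total >= 0 ? "+" : "";
  return { polarity, rewardClass, text: `${sign}${total.toFixed(3)}` };
}

console.log(formatReward(0.25)); // positive row, "pos" text, "+0.250"
console.log(formatReward(-0.1)); // negative row, "neg" text, "-0.100"
console.log(formatReward(0));    // neutral row, but "pos" text, "+0.000"
```

A zero-reward step therefore renders as a neutral row whose reward text still carries the `pos` class and a leading `+`.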
+// ═══════════════════════════════════════════════════════════════════
+// ProbeController — owns all state, wires UI ↔ WsClient
+// ═══════════════════════════════════════════════════════════════════
+class ProbeController {
+  constructor() {
+    // Sub-components
+    this._ws        = null;
+    this._viewer    = new CodeViewer(document.getElementById("code-block"));
+    this._dashboard = new RewardDashboard();
+    this._feed      = new HistoryFeed(document.getElementById("history-feed"));
+    // Episode state
+    this._episodeActive   = false;
+    this._cumulativeReward = 0;
+    this._stepCount       = 0;
+    this._maxSteps        = 0;
+    this._totalIssues     = 0;
+    this._foundCount      = 0;
+    this._lastObs         = null;
+    this._bindStaticButtons();
+  }
+  // ── Initialisation ──────────────────────────────────────────────
+  _bindStaticButtons() {
+    document.getElementById("btn-connect").addEventListener("click", () => this._connect());
+    document.getElementById("btn-reset").addEventListener("click",   () => this._sendReset());
+    document.getElementById("btn-comment").addEventListener("click", () => this._sendComment());
+    document.getElementById("btn-get-context").addEventListener("click", () => this._sendGetContext());
+    document.getElementById("btn-run-scanner").addEventListener("click", () => this._sendAction("run_scanner"));
+    document.getElementById("btn-request-changes").addEventListener("click", () => this._sendAction("request_changes"));
+    document.getElementById("btn-approve").addEventListener("click", () => this._sendAction("approve"));
+    document.getElementById("btn-submit").addEventListener("click",  () => this._sendAction("submit_review"));
+    document.getElementById("btn-escalate").addEventListener("click",() => this._sendAction("escalate_to_security_review"));
+    document.getElementById("modal-close").addEventListener("click", () => {
+      document.getElementById("modal-overlay").style.display = "none";
+      this._sendReset();
+    });
+  }
+  // ── WebSocket lifecycle ──────────────────────────────────────────
+  _connect() {
+    this._ws = new WsClient(
+      CONFIG.wsUrl,
+      (msg) => this._handleMessage(msg),
+      (status) => this._handleConnectionStatus(status),
+    );
+    this._ws.connect();
+  }
+  _handleConnectionStatus(status) {
+    const badge  = document.getElementById("conn-badge");
+    const btnReset  = document.getElementById("btn-reset");
+    const btnConnect = document.getElementById("btn-connect");
+    if (status === "connected") {
+      badge.textContent = "🟢 Connected";
+      badge.className   = "badge connected";
+      btnConnect.textContent = "Reconnect";
+      btnReset.disabled = false;
+      // Auto-start first episode on successful connect
+      this._sendReset();
+    } else {
+      badge.textContent = "⚫ Disconnected";
+      badge.className   = "badge disconnected";
+      btnReset.disabled = true;   // a new episode cannot start without a connection
+      this._setActionButtonsEnabled(false);
+    }
+  }
+  // ── Message dispatch ─────────────────────────────────────────────
+  _handleMessage(msg) {
+    switch (msg.type) {
+      case "reset": this._applyObservation(msg.observation, null, false); break;
+      case "step":  this._applyStep(msg);  break;
+      case "error": this._showError(msg.detail); break;
+      default: console.warn("[ProbeController] unknown message type:", msg.type);
+    }
+  }
+  // ── Episode state application ────────────────────────────────────
+  /**
+   * Apply a fresh observation (after reset or step).
+   * Updates every UI component from the single observation object.
+   */
+  _applyObservation(obs, reward, done) {
+    this._lastObs     = obs;
+    this._stepCount   = obs.step_count;
+    this._maxSteps    = obs.max_steps;
+    this._totalIssues = obs.total_issues;
+    this._foundCount  = obs.issues_found_count;
+    // ── Task metadata ──
+    document.getElementById("task-label").textContent =
+      `Task ${obs.task_id} — ${obs.file_name}`;
+    document.getElementById("task-desc").textContent  = obs.task_description;
+    document.getElementById("steps-counter").textContent =
+      `Step ${obs.step_count} / ${obs.max_steps}`;
+    const diffBadge = document.getElementById("difficulty-badge");
+    diffBadge.textContent = obs.task_difficulty;
+    diffBadge.className   = `difficulty-badge ${obs.task_difficulty.replace(/\s+/g, "-")}`;
+    // ── Adversarial hint ──
+    const advEl = document.getElementById("adv-hint");
+    if (obs.adversarial_hint) {
+      advEl.textContent    = `⚠️ ${obs.adversarial_hint}`;
+      advEl.style.display  = "block";
+    } else {
+      advEl.style.display  = "none";
+    }
+    // ── Code viewer ── (only re-render if code changed, i.e. on reset)
+    if (!reward) {
+      this._viewer.render(obs.code_snippet);
+      this._viewer.clearDecorations();
+    }
+    // ── Highlight lines mentioned in review history ──
+    this._decorateHistoryLines(obs.review_history);
+    // ── Context hints ──
+    this._renderHints(obs.context_hints);
+    // ── Dashboard ──
+    this._cumulativeReward = obs.metadata?.cumulative_reward ?? 0;
+    this._dashboard.updateRing(this._cumulativeReward);
+    this._dashboard.updateIssues(this._foundCount, this._totalIssues);
+    if (reward) {
+      this._dashboard.updateBars(reward.components ?? {});
+      this._feed.append(this._lastActionLabel, reward);
+    }
+    // ── Terminal handling ──
+    if (done) {
+      this._episodeActive = false;
+      this._setActionButtonsEnabled(false);
+      this._showEpisodeEndModal(obs, reward);
+    } else {
+      this._episodeActive = true;
+      this._setActionButtonsEnabled(true);
+    }
+  }
+  _applyStep(msg) {
+    this._applyObservation(msg.observation, msg.reward, msg.done);
+  }
+  // ── Line decorations ─────────────────────────────────────────────
+  /**
+   * Walk review_history and apply colour-coded line highlights.
+   * Later entries overwrite earlier ones on the same line, so the most
+   * recent action's highlight takes priority.
+   */
+  _decorateHistoryLines(history) {
+    this._viewer.clearDecorations();
+    for (const entry of history ?? []) {
+      // Skip entries without a target line; scanner results never map to a single line
+      if (!entry.line || entry.type === "scanner_result") continue;
+      const cssClass = entry.type === "context_probe" ? "hl-context" : "hl-comment";
+      this._viewer.decorateLine(entry.line, cssClass);
+    }
+  }
+  // ── Hints ────────────────────────────────────────────────────────
+  _renderHints(hints) {
+    const container = document.getElementById("hints-container");
+    const list      = document.getElementById("hints-list");
+    if (!hints || hints.length === 0) {
+      container.style.display = "none";
+      return;
+    }
+    container.style.display = "block";
+    list.innerHTML = hints.map(h =>
+      `<div class="hint-item">${escapeHtml(h)}</div>`
+    ).join("");
+  }
+  // ── Action senders ───────────────────────────────────────────────
+  _sendReset() {
+    if (!this._ws?.isConnected) return;
+    this._episodeActive = false;
+    this._setActionButtonsEnabled(false);
+    this._dashboard.reset();
+    this._feed.clear();
+    this._viewer._pre.innerHTML = '<span class="placeholder-text">Loading…</span>';
+    document.getElementById("hints-container").style.display = "none";
+    document.getElementById("adv-hint").style.display = "none";
+    this._ws.send({ command: "reset" });
+  }
+  _sendComment() {
+    const line           = parseInt(document.getElementById("inp-line").value, 10) || null;
+    const comment        = document.getElementById("inp-comment").value.trim();
+    const severity       = document.getElementById("inp-severity").value     || null;
+    const category       = document.getElementById("inp-category").value     || null;
+    const classification = document.getElementById("inp-classification").value || null;
+    if (!comment) {
+      alert("Please enter a comment before submitting.");
+      return;
+    }
+    this._lastActionLabel = `ADD_COMMENT (L${line ?? "?"})`;
+    this._sendAction("add_comment", {
+      line_number: line,
+      comment,
+      severity,
+      category,
+      classification,
+    });
+    // Clear comment fields after send
+    document.getElementById("inp-comment").value = "";
+  }
+  _sendGetContext() {
+    const line = parseInt(document.getElementById("inp-probe-line").value, 10) || null;
+    if (!line) { alert("Enter a line number to probe."); return; }
+    this._lastActionLabel = `GET_CONTEXT (L${line})`;
+    this._sendAction("get_context", { line_number: line });
+  }
+  /**
+   * Send a step action to the server.
+   * @param {string} actionType   snake_case action type string
+   * @param {object} extra        Additional fields (line_number, comment, …)
+   */
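+  _sendAction(actionType, extra = {}) {
+    if (!this._ws?.isConnected || !this._episodeActive) return;
+    // Line-targeted actions get a label such as "ADD_COMMENT (L12)";
+    // all other actions become e.g. "RUN SCANNER".
+    const base = actionType.toUpperCase();
+    this._lastActionLabel = "line_number" in extra
+      ? `${base} (L${extra.line_number ?? "?"})`
+      : base.replace(/_/g, " ");
+    this._ws.send({
+      command: "step",
+      action: { action_type: actionType, ...extra },
+    });
+  }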
+  // ── UI helpers ───────────────────────────────────────────────────
+  _setActionButtonsEnabled(enabled) {
+    const ids = [
+      "btn-comment", "btn-get-context", "btn-run-scanner",
+      "btn-request-changes", "btn-approve", "btn-submit", "btn-escalate",
+    ];
+    for (const id of ids) {
+      document.getElementById(id).disabled = !enabled;
+    }
+  }
+  _showEpisodeEndModal(obs, reward) {
+    const totalReward = this._cumulativeReward;
+    const passed      = reward?.passed ?? false;
+    document.getElementById("modal-overlay").style.display = "flex";
+    document.getElementById("modal-icon").textContent =
+      totalReward >= 0.5 ? "🏆" : totalReward >= 0 ? "🏁" : "💔";
+    document.getElementById("modal-title").textContent =
+      passed ? "Episode Passed!" : "Episode Complete";
+    document.getElementById("modal-body").textContent =
+      reward?.explanation ?? "Episode ended.";
+    // Render a small stats grid inside the modal
+    const decision  = obs.metadata?.review_decision ?? "—";
+    const esc       = obs.metadata?.escalation_required ? "Yes" : "No";
+    document.getElementById("modal-stats").innerHTML = `
+      <span class="stat-label">Cumulative reward</span>
+      <span class="stat-value">${totalReward.toFixed(3)}</span>
+      <span class="stat-label">Issues found</span>
+      <span class="stat-value">${obs.issues_found_count} / ${obs.total_issues}</span>
+      <span class="stat-label">Steps used</span>
+      <span class="stat-value">${obs.step_count} / ${obs.max_steps}</span>
+      <span class="stat-label">Decision</span>
+      <span class="stat-value">${escapeHtml(decision)}</span>
+      <span class="stat-label">Escalation required</span>
+      <span class="stat-value">${esc}</span>
+    `;
+  }
+  _showError(detail) {
+    console.error("[ProbeController] server error:", detail);
+    // Non-intrusive: log and append to the feed as a zero-reward (neutral) entry
+    this._feed.append("ERROR", {
+      total: 0,
+      explanation: detail ?? "Unknown server error",
+    });
+  }
+}
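Review note: the wire format used by `ProbeController` can be read off `_sendReset`, `_sendAction` and `_handleMessage`. The sketch below restates it as two standalone functions. `buildStepMessage` and `describeServerMessage` are hypothetical names, the comment text is a made-up sample, and the server side is not part of this hunk, so the shapes are inferred only from what the client sends and reads.

```javascript
// Hypothetical helper mirroring the payload built in _sendAction().
function buildStepMessage(actionType, extra = {}) {
  return { command: "step", action: { action_type: actionType, ...extra } };
}

// Hypothetical helper mirroring the switch in _handleMessage().
function describeServerMessage(msg) {
  switch (msg.type) {
    case "reset": return "new episode: render observation";
    case "step":  return msg.done ? "terminal step: show end-of-episode modal"
                                  : "step: update dashboard";
    case "error": return `error: ${msg.detail ?? "Unknown server error"}`;
    default:      return `unknown message type: ${msg.type}`;
  }
}

// An add_comment action using option values offered by the form in index.html.
const commentMessage = buildStepMessage("add_comment", {
  line_number: 12,
  comment: "Sample comment text for illustration.",
  severity: "critical",
  category: "security",
  classification: "intentional_backdoor",
});
console.log(JSON.stringify(commentMessage, null, 2));

// Actions without extra fields carry only action_type.
console.log(JSON.stringify(buildStepMessage("run_scanner")));
console.log(describeServerMessage({ type: "step", done: true }));
```

A reset is simpler still: the client sends `{ command: "reset" }` with no action object.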
+// ═══════════════════════════════════════════════════════════════════
+// Utilities
+// ═══════════════════════════════════════════════════════════════════
+/** Escape HTML special chars to prevent XSS when inserting code/text. */
+function escapeHtml(str) {
+  return String(str)
+    .replace(/&/g, "&amp;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;")
+    .replace(/"/g, "&quot;");
+}
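Review note: a quick check of what `escapeHtml` does to typical inputs. The function body below is copied from the hunk above so the snippet runs on its own; the sample strings are illustrative.

```javascript
// Copied from app.js so this snippet is self-contained.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Markup inside a reviewed code line is rendered as text, not parsed as HTML.
console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// -> &lt;img src=x onerror=&quot;alert(1)&quot;&gt;

// "&" is replaced first, so entities produced by later steps are not double-escaped.
console.log(escapeHtml("a && b < c"));
// -> a &amp;&amp; b &lt; c

// Non-string input is coerced with String().
console.log(escapeHtml(42));
// -> 42
```

Single quotes pass through unchanged. That is fine for the element-content uses in this file, but it would matter if the output were ever placed inside a single-quoted attribute value.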
+// ═══════════════════════════════════════════════════════════════════
+// Bootstrap
+// ═══════════════════════════════════════════════════════════════════
+document.addEventListener("DOMContentLoaded", () => {
+  window._probe = new ProbeController();
+});

frontend/index.html ADDED

	@@ -0,0 +1,212 @@

+<!doctype html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>PRobe — AI Code Review Training Environment</title>
+  <link rel="stylesheet" href="style.css" />
+</head>
+<body>
+  <!-- ══════════════════════════════════════════════════════════
+       TOP BAR
+  ══════════════════════════════════════════════════════════ -->
+  <header class="topbar">
+    <div class="topbar-left">
+      <span class="logo">&#x1F50D; PRobe</span>
+      <span class="tagline">Adversarial Code Review — RL Training Environment</span>
+    </div>
+    <div class="topbar-right">
+      <span class="badge" id="conn-badge">⚫ Disconnected</span>
+      <button id="btn-connect" class="btn btn-primary">Connect</button>
+      <button id="btn-reset"   class="btn btn-secondary" disabled>New Episode</button>
+    </div>
+  </header>
+  <!-- ══════════════════════════════════════════════════════════
+       MAIN LAYOUT — three columns
+  ══════════════════════════════════════════════════════════ -->
+  <main class="layout">
+    <!-- ── LEFT: Task meta + code viewer ─────────────────────── -->
+    <section class="panel panel-code">
+      <div class="panel-header">
+        <span id="task-label">Task —</span>
+        <span class="difficulty-badge" id="difficulty-badge">—</span>
+        <span class="steps-counter" id="steps-counter">Step 0 / —</span>
+      </div>
+      <p class="task-desc" id="task-desc">Connect and start an episode to begin.</p>
+      <div class="adversarial-hint" id="adv-hint" style="display:none"></div>
+      <!-- Code block with line-number highlights -->
+      <div class="code-wrapper">
+        <pre id="code-block" class="code-block"><span class="placeholder-text">No code loaded.</span></pre>
+      </div>
+      <!-- Context hints revealed by finding key issues -->
+      <div id="hints-container" style="display:none">
+        <div class="section-title">🔓 Unlocked Context Hints</div>
+        <div id="hints-list" class="hints-list"></div>
+      </div>
+    </section>
+    <!-- ── CENTRE: Action panel ───────────────────────────────── -->
+    <section class="panel panel-action">
+      <div class="panel-header">Actions</div>
+      <!-- ADD_COMMENT form -->
+      <div class="action-card" id="card-comment">
+        <div class="action-title">💬 Add Comment</div>
+        <div class="form-row">
+          <label for="inp-line">Line</label>
+          <input type="number" id="inp-line" min="1" placeholder="e.g. 12" />
+        </div>
+        <div class="form-row">
+          <label for="inp-comment">Comment</label>
+          <textarea id="inp-comment" rows="3" placeholder="Describe the issue in detail…"></textarea>
+        </div>
+        <div class="form-row">
+          <label for="inp-severity">Severity</label>
+          <select id="inp-severity">
+            <option value="">— none —</option>
+            <option value="info">info</option>
+            <option value="warning">warning</option>
+            <option value="error">error</option>
+            <option value="critical">critical</option>
+          </select>
+        </div>
+        <div class="form-row">
+          <label for="inp-category">Category</label>
+          <select id="inp-category">
+            <option value="">— none —</option>
+            <option value="bug">bug</option>
+            <option value="security">security</option>
+            <option value="performance">performance</option>
+            <option value="style">style</option>
+            <option value="design">design</option>
+          </select>
+        </div>
+        <div class="form-row">
+          <label for="inp-classification">Classification</label>
+          <select id="inp-classification">
+            <option value="">— none —</option>
+            <option value="accidental_bug">accidental_bug</option>
+            <option value="intentional_backdoor">intentional_backdoor</option>
+          </select>
+        </div>
+        <button class="btn btn-action" id="btn-comment" disabled>Submit Comment</button>
+      </div>
+      <!-- Quick actions -->
+      <div class="quick-actions">
+        <div class="action-title">⚡ Quick Actions</div>
+        <div class="form-row">
+          <label for="inp-probe-line">Probe Line</label>
+          <input type="number" id="inp-probe-line" min="1" placeholder="e.g. 8" />
+        </div>
+        <button class="btn btn-action btn-info" id="btn-get-context" disabled>🔍 Get Context</button>
+        <button class="btn btn-action btn-info" id="btn-run-scanner"  disabled>🤖 Run Scanner</button>
+        <div class="separator"></div>
+        <button class="btn btn-action btn-warn"    id="btn-request-changes" disabled>🔄 Request Changes</button>
+        <button class="btn btn-action btn-success" id="btn-approve"         disabled>✅ Approve PR</button>
+        <button class="btn btn-action btn-danger"  id="btn-submit"          disabled>📤 Submit Review</button>
+        <button class="btn btn-action btn-escalate" id="btn-escalate"       disabled>🚨 Escalate to Security</button>
+      </div>
+    </section>
+    <!-- ── RIGHT: Reward dashboard + history ─────────────────── -->
+    <section class="panel panel-reward">
+      <div class="panel-header">Reward Dashboard</div>
+      <!-- Cumulative reward ring -->
+      <div class="reward-ring-wrap">
+        <svg class="reward-ring" viewBox="0 0 120 120">
+          <circle class="ring-bg"    cx="60" cy="60" r="50" />
+          <circle class="ring-track" cx="60" cy="60" r="50" id="ring-track" />
+        </svg>
+        <div class="ring-label">
+          <span id="ring-value">0.00</span>
+          <small>cumulative</small>
+        </div>
+      </div>
+      <!-- Per-step component bars -->
+      <div class="component-bars" id="component-bars">
+        <div class="section-title">Last Step Breakdown</div>
+        <div class="bar-row" id="bar-row-issue_credit">
+          <span class="bar-label">Issue credit</span>
+          <div class="bar-track"><div class="bar-fill positive" id="bar-issue_credit"></div></div>
+          <span class="bar-val" id="val-issue_credit">0.00</span>
+        </div>
+        <div class="bar-row" id="bar-row-classification_credit">
+          <span class="bar-label">Classification</span>
+          <div class="bar-track"><div class="bar-fill positive" id="bar-classification_credit"></div></div>
+          <span class="bar-val" id="val-classification_credit">0.00</span>
+        </div>
+        <div class="bar-row" id="bar-row-false_positive_penalty">
+          <span class="bar-label">FP penalty</span>
+          <div class="bar-track"><div class="bar-fill negative" id="bar-false_positive_penalty"></div></div>
+          <span class="bar-val" id="val-false_positive_penalty">0.00</span>
+        </div>
+        <div class="bar-row" id="bar-row-coverage_bonus">
+          <span class="bar-label">Coverage</span>
+          <div class="bar-track"><div class="bar-fill positive" id="bar-coverage_bonus"></div></div>
+          <span class="bar-val" id="val-coverage_bonus">0.00</span>
+        </div>
+        <div class="bar-row" id="bar-row-decision_score">
+          <span class="bar-label">Decision</span>
+          <div class="bar-track"><div class="bar-fill neutral" id="bar-decision_score"></div></div>
+          <span class="bar-val" id="val-decision_score">0.00</span>
+        </div>
+        <div class="bar-row" id="bar-row-efficiency_bonus">
+          <span class="bar-label">Efficiency</span>
+          <div class="bar-track"><div class="bar-fill positive" id="bar-efficiency_bonus"></div></div>
+          <span class="bar-val" id="val-efficiency_bonus">0.00</span>
+        </div>
+      </div>
+      <!-- Issues progress -->
+      <div class="section-title" style="margin-top:1rem">Issues Found</div>
+      <div class="issues-progress">
+        <div class="issues-bar-wrap">
+          <div class="issues-bar-fill" id="issues-bar-fill"></div>
+        </div>
+        <span id="issues-found-label">0 / 0</span>
+      </div>
+      <!-- Step-by-step history feed -->
+      <div class="section-title" style="margin-top:1rem">Episode History</div>
+      <div class="history-feed" id="history-feed">
+        <div class="history-empty">No actions yet.</div>
+      </div>
+    </section>
+  </main>
+  <!-- ══════════════════════════════════════════════════════════
+       EPISODE-END MODAL
+  ══════════════════════════════════════════════════════════ -->
+  <div id="modal-overlay" class="modal-overlay" style="display:none">
+    <div class="modal">
+      <div class="modal-icon" id="modal-icon">🏁</div>
+      <h2 id="modal-title">Episode Complete</h2>
+      <p  id="modal-body">—</p>
+      <div class="modal-stats" id="modal-stats"></div>
+      <button class="btn btn-primary" id="modal-close">Start New Episode</button>
+    </div>
+  </div>
+  <script src="app.js"></script>
+</body>
+</html>

frontend/style.css ADDED

	@@ -0,0 +1,391 @@

+/* ═══════════════════════════════════════════════════════════════
+   PRobe Dashboard — stylesheet
+   Design tokens: dark IDE theme, accent #4f9eff
+═══════════════════════════════════════════════════════════════ */
+/* ── Reset & base ─────────────────────────────────────────── */
+*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+:root {
+  --bg-0:       #0d1117;   /* deepest background                */
+  --bg-1:       #161b22;   /* panel background                  */
+  --bg-2:       #21262d;   /* card / input background           */
+  --bg-3:       #30363d;   /* hover / border                    */
+  --text-main:  #e6edf3;
+  --text-dim:   #8b949e;
+  --accent:     #4f9eff;
+  --green:      #3fb950;
+  --red:        #f85149;
+  --yellow:     #d29922;
+  --orange:     #db6d28;
+  --purple:     #a371f7;
+  --radius:     8px;
+  --font-mono:  'JetBrains Mono', 'Fira Code', 'Consolas', monospace;
+  --font-ui:    'Inter', system-ui, sans-serif;
+  --topbar-h:   52px;
+}
+html, body {
+  height: 100%;
+  background: var(--bg-0);
+  color: var(--text-main);
+  font-family: var(--font-ui);
+  font-size: 14px;
+  line-height: 1.5;
+}
+/* ── Top bar ──────────────────────────────────────────────── */
+.topbar {
+  position: fixed;
+  top: 0; left: 0; right: 0;
+  height: var(--topbar-h);
+  background: var(--bg-1);
+  border-bottom: 1px solid var(--bg-3);
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+  padding: 0 1.25rem;
+  z-index: 100;
+}
+.topbar-left { display: flex; align-items: center; gap: 1rem; }
+.logo { font-size: 1.15rem; font-weight: 700; color: var(--accent); }
+.tagline { color: var(--text-dim); font-size: 0.8rem; }
+.topbar-right { display: flex; align-items: center; gap: 0.75rem; }
+.badge {
+  font-size: 0.78rem;
+  padding: 3px 10px;
+  border-radius: 12px;
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  white-space: nowrap;
+}
+.badge.connected    { color: var(--green);  border-color: var(--green); }
+.badge.disconnected { color: var(--text-dim); }
+/* ── Buttons ──────────────────────────────────────────────── */
+.btn {
+  padding: 6px 16px;
+  border-radius: var(--radius);
+  border: 1px solid transparent;
+  font-size: 0.82rem;
+  font-weight: 600;
+  cursor: pointer;
+  transition: opacity 0.15s, background 0.15s;
+}
+.btn:disabled { opacity: 0.35; cursor: not-allowed; }
+.btn-primary  { background: var(--accent); color: #fff; border-color: var(--accent); }
+.btn-secondary{ background: var(--bg-2);   color: var(--text-main); border-color: var(--bg-3); }
+.btn-action   { width: 100%; margin-bottom: 0.4rem; background: var(--bg-2); color: var(--text-main); border-color: var(--bg-3); }
+.btn-info     { border-color: var(--accent);  color: var(--accent); }
+.btn-warn     { border-color: var(--yellow);  color: var(--yellow); }
+.btn-success  { border-color: var(--green);   color: var(--green);  }
+.btn-danger   { border-color: var(--red);     color: var(--red);    background: rgba(248,81,73,0.1); }
+.btn-escalate { border-color: var(--purple);  color: var(--purple); background: rgba(163,113,247,0.1); }
+.btn:not(:disabled):hover { opacity: 0.82; }
+/* ── Main three-column layout ─────────────────────────────── */
+.layout {
+  display: grid;
+  grid-template-columns: 1fr 310px 310px;
+  grid-template-rows: calc(100vh - var(--topbar-h));
+  gap: 0;
+  margin-top: var(--topbar-h);
+  overflow: hidden;
+}
+/* ── Generic panel ────────────────────────────────────────── */
+.panel {
+  background: var(--bg-1);
+  border-right: 1px solid var(--bg-3);
+  overflow-y: auto;
+  padding: 1rem;
+  display: flex;
+  flex-direction: column;
+  gap: 0.75rem;
+}
+.panel:last-child { border-right: none; }
+.panel-header {
+  font-weight: 700;
+  font-size: 0.85rem;
+  color: var(--text-dim);
+  text-transform: uppercase;
+  letter-spacing: 0.06em;
+  display: flex;
+  align-items: center;
+  gap: 0.75rem;
+  flex-wrap: wrap;
+}
+.section-title {
+  font-size: 0.78rem;
+  font-weight: 600;
+  color: var(--text-dim);
+  text-transform: uppercase;
+  letter-spacing: 0.05em;
+}
+/* ── Task metadata ────────────────────────────────────────── */
+#task-label { color: var(--accent); font-size: 0.9rem; }
+.difficulty-badge {
+  font-size: 0.72rem;
+  padding: 2px 8px;
+  border-radius: 10px;
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  text-transform: capitalize;
+}
+.difficulty-badge.ultra-easy { color: var(--green);  border-color: var(--green);  }
+.difficulty-badge.easy       { color: var(--accent); border-color: var(--accent); }
+.difficulty-badge.medium     { color: var(--yellow); border-color: var(--yellow); }
+.difficulty-badge.hard       { color: var(--orange); border-color: var(--orange); }
+.difficulty-badge.adversarial{ color: var(--red);    border-color: var(--red);    }
+.steps-counter { margin-left: auto; font-size: 0.8rem; color: var(--text-dim); }
+.task-desc {
+  font-size: 0.82rem;
+  color: var(--text-dim);
+  line-height: 1.6;
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  border-radius: var(--radius);
+  padding: 0.6rem 0.8rem;
+}
+.adversarial-hint {
+  font-size: 0.8rem;
+  background: rgba(163,113,247,0.1);
+  border: 1px solid var(--purple);
+  border-radius: var(--radius);
+  padding: 0.5rem 0.75rem;
+  color: var(--purple);
+}
+/* ── Code viewer ──────────────────────────────────────────── */
+.code-wrapper {
+  flex: 1;
+  overflow: auto;
+  border: 1px solid var(--bg-3);
+  border-radius: var(--radius);
+  background: var(--bg-0);
+}
+.code-block {
+  font-family: var(--font-mono);
+  font-size: 0.78rem;
+  line-height: 1.65;
+  padding: 0.75rem 1rem;
+  white-space: pre;
+  counter-reset: line-counter;
+}
+.code-line { display: block; }
+.code-line-num {
+  user-select: none;
+  display: inline-block;
+  width: 2.8em;
+  color: var(--text-dim);
+  text-align: right;
+  margin-right: 1em;
+  font-size: 0.72rem;
+}
+/* Highlighted lines (comment, issue, scanner finding, or context probe) */
+.code-line.hl-comment  { background: rgba(79,158,255,0.12); border-left: 3px solid var(--accent); }
+.code-line.hl-issue    { background: rgba(248,81,73,0.10);  border-left: 3px solid var(--red);    }
+.code-line.hl-scanner  { background: rgba(210,153,34,0.10); border-left: 3px solid var(--yellow); }
+.code-line.hl-context  { background: rgba(63,185,80,0.08);  border-left: 3px solid var(--green);  }
+.placeholder-text { color: var(--text-dim); font-style: italic; }
+/* ── Hints ────────────────────────────────────────────────── */
+.hints-list {
+  display: flex;
+  flex-direction: column;
+  gap: 0.4rem;
+}
+.hint-item {
+  font-size: 0.8rem;
+  background: rgba(63,185,80,0.08);
+  border: 1px solid var(--green);
+  border-radius: var(--radius);
+  padding: 0.5rem 0.75rem;
+  color: var(--text-main);
+  white-space: pre-wrap;
+}
+/* ── Action cards ─────────────────────────────────────────── */
+.action-card {
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  border-radius: var(--radius);
+  padding: 0.8rem;
+  display: flex;
+  flex-direction: column;
+  gap: 0.5rem;
+}
+.action-title {
+  font-size: 0.8rem;
+  font-weight: 700;
+  color: var(--text-dim);
+  text-transform: uppercase;
+  letter-spacing: 0.05em;
+  margin-bottom: 0.25rem;
+}
+.form-row {
+  display: flex;
+  flex-direction: column;
+  gap: 3px;
+}
+.form-row label { font-size: 0.75rem; color: var(--text-dim); }
+.form-row input,
+.form-row select,
+.form-row textarea {
+  background: var(--bg-0);
+  border: 1px solid var(--bg-3);
+  border-radius: 5px;
+  color: var(--text-main);
+  font-family: var(--font-ui);
+  font-size: 0.82rem;
+  padding: 5px 8px;
+  resize: vertical;
+}
+.form-row input:focus,
+.form-row select:focus,
+.form-row textarea:focus {
+  outline: none;
+  border-color: var(--accent);
+}
+.quick-actions {
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  border-radius: var(--radius);
+  padding: 0.8rem;
+  display: flex;
+  flex-direction: column;
+  gap: 0.4rem;
+}
+.separator { height: 1px; background: var(--bg-3); margin: 0.3rem 0; }
+/* ── Reward ring ──────────────────────────────────────────── */
+.reward-ring-wrap {
+  position: relative;
+  width: 120px;
+  margin: 0 auto;
+}
+.reward-ring { width: 120px; height: 120px; transform: rotate(-90deg); }
+.ring-bg    { fill: none; stroke: var(--bg-2); stroke-width: 10; }
+.ring-track {
+  fill: none;
+  stroke: var(--accent);
+  stroke-width: 10;
+  stroke-linecap: round;
+  stroke-dasharray: 314;   /* 2π × r=50 */
+  stroke-dashoffset: 314;
+  transition: stroke-dashoffset 0.5s ease, stroke 0.5s ease;
+}
+.ring-label {
+  position: absolute;
+  inset: 0;
+  display: flex;
+  flex-direction: column;
+  align-items: center;
+  justify-content: center;
+  font-weight: 700;
+  font-size: 1.1rem;
+}
+.ring-label small { font-size: 0.65rem; color: var(--text-dim); font-weight: 400; }
+/* ── Component bar chart ──────────────────────────────────── */
+.component-bars { display: flex; flex-direction: column; gap: 6px; }
+.bar-row { display: flex; align-items: center; gap: 6px; }
+.bar-label { font-size: 0.72rem; color: var(--text-dim); width: 90px; flex-shrink: 0; }
+.bar-track { flex: 1; height: 7px; background: var(--bg-2); border-radius: 4px; overflow: hidden; }
+.bar-fill  { height: 100%; border-radius: 4px; width: 0; transition: width 0.4s ease; }
+.bar-fill.positive { background: var(--green);  }
+.bar-fill.negative { background: var(--red);    }
+.bar-fill.neutral  { background: var(--yellow); }
+.bar-val { font-size: 0.72rem; width: 36px; text-align: right; color: var(--text-dim); }
+/* ── Issues progress ──────────────────────────────────────── */
+.issues-progress { display: flex; align-items: center; gap: 8px; }
+.issues-bar-wrap {
+  flex: 1; height: 8px;
+  background: var(--bg-2);
+  border-radius: 4px;
+  overflow: hidden;
+}
+.issues-bar-fill {
+  height: 100%;
+  background: var(--accent);
+  border-radius: 4px;
+  width: 0;
+  transition: width 0.4s ease;
+}
+/* ── History feed ─────────────────────────────────────────── */
+.history-feed {
+  display: flex;
+  flex-direction: column;
+  gap: 0.4rem;
+  max-height: 320px;
+  overflow-y: auto;
+}
+.history-empty { color: var(--text-dim); font-size: 0.8rem; font-style: italic; }
+.history-item {
+  background: var(--bg-2);
+  border: 1px solid var(--bg-3);
+  border-radius: 6px;
+  padding: 0.45rem 0.65rem;
+  font-size: 0.78rem;
+  border-left: 3px solid var(--bg-3);
+}
+.history-item.positive { border-left-color: var(--green);  }
+.history-item.negative { border-left-color: var(--red);    }
+.history-item.neutral  { border-left-color: var(--yellow); }
+.history-item .h-action { font-weight: 700; color: var(--accent); }
+.history-item .h-reward { font-weight: 700; }
+.history-item .h-reward.pos { color: var(--green); }
+.history-item .h-reward.neg { color: var(--red);   }
+.history-item .h-explain { color: var(--text-dim); margin-top: 2px; line-height: 1.4; }
+/* ── Episode-end modal ────────────────────────────────────── */
+.modal-overlay {
+  position: fixed; inset: 0;
+  background: rgba(0,0,0,0.7);
+  display: flex; align-items: center; justify-content: center;
+  z-index: 200;
+}
+.modal {
+  background: var(--bg-1);
+  border: 1px solid var(--bg-3);
+  border-radius: 12px;
+  padding: 2rem;
+  max-width: 440px;
+  width: 90%;
+  text-align: center;
+  display: flex;
+  flex-direction: column;
+  align-items: center;
+  gap: 0.75rem;
+}
+.modal-icon { font-size: 3rem; }
+.modal h2   { font-size: 1.3rem; }
+.modal p    { color: var(--text-dim); font-size: 0.88rem; line-height: 1.6; }
+.modal-stats {
+  width: 100%;
+  background: var(--bg-2);
+  border-radius: var(--radius);
+  padding: 0.75rem 1rem;
+  display: grid;
+  grid-template-columns: 1fr 1fr;
+  gap: 0.4rem 1rem;
+  text-align: left;
+  font-size: 0.82rem;
+}
+.modal-stats .stat-label { color: var(--text-dim); }
+.modal-stats .stat-value { font-weight: 700; }
+/* ── Scrollbar styling ────────────────────────────────────── */
+::-webkit-scrollbar { width: 6px; height: 6px; }
+::-webkit-scrollbar-track { background: var(--bg-1); }
+::-webkit-scrollbar-thumb { background: var(--bg-3); border-radius: 3px; }
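
Review note on `.ring-track` above: the `stroke-dasharray: 314` value is the circumference of a circle with r=50, and the ring starts fully hidden because `stroke-dashoffset` equals the same value. The sketch below shows the arithmetic that would reveal a given fraction of the ring. It is an illustration only: `ring_dashoffset` is a hypothetical helper, and how `frontend/app.js` actually computes the offset is not shown in this part of the diff.

```python
import math

RING_RADIUS = 50  # taken from the CSS comment "2π × r=50"
CIRCUMFERENCE = 2 * math.pi * RING_RADIUS  # about 314.16; the stylesheet rounds to 314


def ring_dashoffset(fraction: float) -> float:
    """Return the stroke-dashoffset that reveals `fraction` of the ring.

    Hypothetical helper: 0.0 leaves the ring empty (offset == circumference),
    1.0 draws the full ring (offset == 0). Out-of-range input is clamped.
    """
    clamped = max(0.0, min(1.0, fraction))
    return CIRCUMFERENCE * (1.0 - clamped)


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f"{f:.2f} -> {ring_dashoffset(f):.2f}")
```

Because the CSS transitions `stroke-dashoffset` over 0.5s, updating the offset to a value like this would animate the ring toward the new reward level.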

outputs/baseline_comparison.svg ADDED Viewed

outputs/reward_breakdown.svg ADDED Viewed

run.py ADDED Viewed

	@@ -0,0 +1,65 @@

+"""
+PRobe — unified launcher.
+Starts the FastAPI server which serves:
+  - The interactive frontend at  http://localhost:8000/ui/
+  - The REST API docs at         http://localhost:8000/docs
+  - The WebSocket at             ws://localhost:8000/ws
+Usage
+-----
+  uv run python run.py              # default: host=0.0.0.0, port=8000
+  uv run python run.py --port 9000
+  uv run python run.py --host 127.0.0.1 --port 8000
+"""
+from __future__ import annotations
+import argparse
+import pathlib
+import sys
+# ── Path bootstrap ────────────────────────────────────────────────────────────
+# Add the project root to sys.path so both `agent` and `environment` packages
+# are importable regardless of how or from where this script is invoked.
+PROJECT_ROOT = pathlib.Path(__file__).parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+# ── Now safe to import the app ────────────────────────────────────────────────
+from environment.app import app  # noqa: E402  (import after path setup)
+import uvicorn                   # noqa: E402
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Start the PRobe environment server + frontend",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument("--host", default="0.0.0.0", help="Bind host")
+    parser.add_argument("--port", type=int, default=8000, help="Bind port")
+    parser.add_argument("--reload", action="store_true",
+                        help="Enable auto-reload on code changes (dev mode)")
+    args = parser.parse_args()
+    frontend_url = f"http://{'localhost' if args.host == '0.0.0.0' else args.host}:{args.port}/ui/"
+    api_url      = f"http://{'localhost' if args.host == '0.0.0.0' else args.host}:{args.port}/docs"
+    print("\n" + "=" * 58)
+    print("  PRobe — AI Code Review Training Environment")
+    print("=" * 58)
+    print(f"  Frontend   →  {frontend_url}")
+    print(f"  API docs   →  {api_url}")
+    print(f"  WebSocket  →  ws://{'localhost' if args.host == '0.0.0.0' else args.host}:{args.port}/ws")
+    print("=" * 58 + "\n")
+    uvicorn.run(
+        "environment.app:app",
+        host=args.host,
+        port=args.port,
+        reload=args.reload,
+        # Keep uvicorn's own logging minimal so our banner stays visible
+        log_level="warning",
+    )
+if __name__ == "__main__":
+    main()
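
Review note on the banner in `main()`: the wildcard bind address `0.0.0.0` is not something a browser should be pointed at, so the script substitutes `localhost` when building the printed URLs. A minimal sketch of that mapping as a pure function, applied uniformly to all three endpoints, might look like the following. `banner_urls` is a hypothetical name and is not part of this commit.

```python
from __future__ import annotations


def banner_urls(host: str, port: int) -> dict[str, str]:
    """Build the URLs shown in the startup banner (hypothetical helper).

    The wildcard address 0.0.0.0 is replaced with localhost for display;
    any other host is passed through unchanged.
    """
    display_host = "localhost" if host == "0.0.0.0" else host
    base = f"{display_host}:{port}"
    return {
        "frontend": f"http://{base}/ui/",
        "docs": f"http://{base}/docs",
        "websocket": f"ws://{base}/ws",
    }


if __name__ == "__main__":
    for name, url in banner_urls("0.0.0.0", 8000).items():
        print(f"{name:<10} {url}")
```

Keeping the substitution in one place avoids the three URLs drifting apart when `--host` is set to something other than the default.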