Thakur, Mahipal committed on
Commit ·
44bd7bd
1
Parent(s): 4ec7361
UI Integration
- README.md +403 -4
- docs/design.md +1 -1
- environment/app.py +36 -12
- frontend/app.js +597 -0
- frontend/index.html +212 -0
- frontend/style.css +391 -0
- outputs/baseline_comparison.svg +98 -0
- outputs/reward_breakdown.svg +95 -0
- run.py +65 -0
README.md
CHANGED
|
@@ -6,7 +6,7 @@ colorTo: green
|
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
| 8 |
app_port: 8000
|
| 9 |
-
base_path: /
|
| 10 |
tags:
|
| 11 |
- openenv
|
| 12 |
- code-review
|
|
@@ -197,11 +197,23 @@ Find missing rate-limit → nginx config shown → confirms /auth fully e
|
|
| 197 |
## Quickstart
|
| 198 |
|
| 199 |
```bash
|
| 200 |
-
# Install
|
| 201 |
uv sync
|
| 202 |
|
| 203 |
-
#
|
| 204 |
-
uv run
|
| 205 |
|
| 206 |
# Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
|
| 207 |
export OPENAI_API_KEY=sk-...
|
|
@@ -213,6 +225,143 @@ uv run python training/train_grpo.py --test
|
|
| 213 |
|
| 214 |
---
|
| 215 |
|
| 216 |
## Training
|
| 217 |
|
| 218 |
| | |
|
|
@@ -283,6 +432,256 @@ Security code review is a high-stakes task performed by a small number of specia
|
|
| 283 |
|
| 284 |
## Repo Structure
|
| 285 |
|
| 286 |
```
|
| 287 |
.
|
| 288 |
├── agent/
|
|
|
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
| 8 |
app_port: 8000
|
| 9 |
+
base_path: /ui/
|
| 10 |
tags:
|
| 11 |
- openenv
|
| 12 |
- code-review
|
|
|
|
| 197 |
## Quickstart
|
| 198 |
|
| 199 |
```bash
|
| 200 |
+
# 1. Install all dependencies
|
| 201 |
uv sync
|
| 202 |
|
| 203 |
+
# 2. Start the server + frontend in one command
|
| 204 |
+
uv run python run.py
|
| 205 |
+
|
| 206 |
+
# The terminal will print:
|
| 207 |
+
# ==========================================================
|
| 208 |
+
# PRobe - AI Code Review Training Environment
|
| 209 |
+
# ==========================================================
|
| 210 |
+
# Frontend → http://localhost:8000/ui/
|
| 211 |
+
# API docs → http://localhost:8000/docs
|
| 212 |
+
# WebSocket → ws://localhost:8000/ws
|
| 213 |
+
# ==========================================================
|
| 214 |
+
|
| 215 |
+
# 3. Open your browser
|
| 216 |
+
open http://localhost:8000/ui/
|
| 217 |
|
| 218 |
# Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
|
| 219 |
export OPENAI_API_KEY=sk-...
|
|
|
|
| 225 |
|
| 226 |
---
|
| 227 |
|
| 228 |
+
## Interactive Frontend Dashboard
|
| 229 |
+
|
| 230 |
+
PRobe ships with a **zero-dependency browser UI** that turns the RL environment into a live, interactive demo.
|
| 231 |
+
No npm, no build step: just start the server and open your browser.
|
| 232 |
+
|
| 233 |
+
### What It Looks Like
|
| 234 |
+
|
| 235 |
+
```
|
| 236 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 237 |
+
β π PRobe Adversarial Code Review β RL Training Environment β
|
| 238 |
+
β π’ Connected [New Ep] β
|
| 239 |
+
ββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββ¬ββββββββββββββββββββββββββ€
|
| 240 |
+
β Task 2 β auth.py β Actions β Reward Dashboard β
|
| 241 |
+
β medium β’ Step 3 / 20 β β β
|
| 242 |
+
β β π¬ Add Comment β β― +0.24 β
|
| 243 |
+
β β οΈ External contributor, β ββββββββββββββββ β cumulative β
|
| 244 |
+
β no prior commit history β β Line: [12] β β β
|
| 245 |
+
β β β Comment: β β Issue credit ββββββ β
|
| 246 |
+
β Review this auth module. β β SQL inject.. β β Classification βββββ β
|
| 247 |
+
β Identify bugs and decide β β Severity: β β FP penalty βββββ β
|
| 248 |
+
β whether to escalate or β β [critical βΎ] β β Coverage βββββ β
|
| 249 |
+
β request changes. β β Category: β β Decision βββββ β
|
| 250 |
+
β β β [security βΎ] β β Efficiency βββββ β
|
| 251 |
+
β ββ auth.py βββββββββββββββ β ββββββββββββββββ β β
|
| 252 |
+
β β 1: import hashlib β β [Submit Comment] β Issues Found β
|
| 253 |
+
β β 2: β β β ββββββββββ 2 / 5 β
|
| 254 |
+
β β 3: DB_PASS = "s3cr" β β β‘ Quick Actions β β
|
| 255 |
+
β β 12: cursor.execute( ββββ€ [π Get Context] β Episode History β
|
| 256 |
+
β β f"SELECT * FROM β β [π€ Run Scanner] β βββββββββββββββββββββ β
|
| 257 |
+
β β users WHERE β β βββββββββββββββ β β ADD_COMMENT +0.12 β β
|
| 258 |
+
β β 13: username='{u}'" β β [π Req Changes] β β sql injection L12 β β
|
| 259 |
+
β β 14: ) β β [β
Approve PR] β βββββββββββββββββββββ€ β
|
| 260 |
+
β ββββββββββββββββββββββββββ β [π€ Submit] β β RUN_SCANNER +0.00 β β
|
| 261 |
+
β β [π¨ Escalate] β β 3 findings found β β
|
| 262 |
+
ββββββββββββββββββββββββββββββββ΄ββββββββββββββββββββ΄ββββββββββββββββββββββββββ
|
| 263 |
+
```
|
| 264 |
+
|
| 265 |
+
### Three-Column Layout
|
| 266 |
+
|
| 267 |
+
**Left: Code Viewer**
|
| 268 |
+
- Full source code with **line numbers** for every episode
|
| 269 |
+
- Lines are **colour-coded** as you act:
|
| 270 |
+
  - 🔵 Blue: line you just commented on
|
| 271 |
+
  - 🟡 Yellow: line flagged by the scanner
|
| 272 |
+
  - 🟢 Green: line you probed with Get Context
|
| 273 |
+
- **Unlocked hints** appear below the code as green panels whenever a key issue is found
|
| 274 |
+
- The **adversarial hint** banner tells you whether the PR is from a trusted team member or an unknown external contributor
|
| 275 |
+
|
| 276 |
+
**Centre: Action Panel**
|
| 277 |
+
- **Add Comment** form: line number, free-text comment, severity, category, and bug/backdoor classification
|
| 278 |
+
- **Quick Actions**: single-click buttons for the six action types other than `add_comment`
|
| 279 |
+
|
| 280 |
+
| Button | Action | What Happens |
|
| 281 |
+
|---|---|---|
|
| 282 |
+
| Get Context | `get_context` | Reveals ±5 lines around the probed line number |
|
| 283 |
+
| Run Scanner | `run_scanner` | Runs the simulated static-analysis tool |
|
| 284 |
+
| Request Changes | `request_changes` | Records your review decision |
|
| 285 |
+
| ✅ Approve PR | `approve` | Approves (−0.15 penalty if < 50 % issues found) |
|
| 286 |
+
| Submit Review | `submit_review` | Ends the episode; triggers terminal scoring |
|
| 287 |
+
| 🚨 Escalate to Security | `escalate_to_security_review` | Correct only on adversarial tasks 7–9 |
|
| 288 |
+
|
| 289 |
+
**Right: Reward Dashboard**
|
| 290 |
+
- **Animated ring** showing cumulative episode reward (green above zero, red below)
|
| 291 |
+
- **Six component bars** updating in real time after every action:
|
| 292 |
+
- Issue credit, Classification credit, FP penalty
|
| 293 |
+
- Coverage bonus, Decision score, Efficiency bonus
|
| 294 |
+
- **Issues progress bar** showing how many ground-truth issues you have found
|
| 295 |
+
- **Episode history feed**: every action with its reward delta and explanation
|
| 296 |
+
|
| 297 |
+
### Episode End Modal
|
| 298 |
+
|
| 299 |
+
When the episode terminates (via Submit Review or Escalate), a modal pops up showing:
|
| 300 |
+
|
| 301 |
+
```
|
| 302 |
+
Episode Passed!
|
| 303 |
+
|
| 304 |
+
"Found 5/5 issues (weighted coverage 100%).
|
| 305 |
+
Decision 'escalate_to_security_review' was correct."
|
| 306 |
+
|
| 307 |
+
┌───────────────────────────────────┐
|
| 308 |
+
│ Cumulative reward     +0.874      │
|
| 309 |
+
│ Issues found          5 / 5       │
|
| 310 |
+
│ Steps used            18 / 25     │
|
| 311 |
+
│ Decision              escalate    │
|
| 312 |
+
│ Escalation required   Yes         │
|
| 313 |
+
└───────────────────────────────────┘
|
| 314 |
+
|
| 315 |
+
[Start New Episode]
|
| 316 |
+
```
|
| 317 |
+
|
| 318 |
+
Clicking **Start New Episode** automatically loads the next task in the difficulty ladder.
|
| 319 |
+
|
| 320 |
+
### How to Run
|
| 321 |
+
|
| 322 |
+
```bash
|
| 323 |
+
# Install dependencies (one-time)
|
| 324 |
+
uv sync
|
| 325 |
+
|
| 326 |
+
# Start the server (this also serves the frontend)
|
| 327 |
+
uv run python run.py
|
| 328 |
+
```
|
| 329 |
+
|
| 330 |
+
Then open **`http://localhost:8000/ui/`** in any browser. No additional setup, no separate frontend server.
|
| 331 |
+
|
| 332 |
+
**Optional flags:**
|
| 333 |
+
|
| 334 |
+
```bash
|
| 335 |
+
# Different port
|
| 336 |
+
uv run python run.py --port 9000
|
| 337 |
+
|
| 338 |
+
# Bind to localhost only (do not expose on the network)
|
| 339 |
+
uv run python run.py --host 127.0.0.1
|
| 340 |
+
|
| 341 |
+
# Dev mode: auto-reload Python files on save
|
| 342 |
+
uv run python run.py --reload
|
| 343 |
+
```
|
| 344 |
+
|
| 345 |
+
### How the Frontend Connects
|
| 346 |
+
|
| 347 |
+
The browser communicates with the backend over a **persistent WebSocket** at `ws://localhost:8000/ws`.
|
| 348 |
+
Each browser tab gets its own isolated environment instance, so concurrent sessions do not share state.
|
| 349 |
+
The WebSocket URL is auto-detected from `window.location.hostname` so the UI works on any host or port without editing any file.
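For a non-browser client, the same derivation can be sketched in Python. This is a minimal illustration, assuming the socket lives at `/ws` on the same host and port as the page (as the Quickstart banner suggests); `websocket_url_for` is a hypothetical helper name and is not part of the repository.

```python
from urllib.parse import urlsplit


def websocket_url_for(page_url: str, path: str = "/ws") -> str:
    """Map an http(s) page URL to the ws(s) URL on the same host and port."""
    parts = urlsplit(page_url)
    # Secure pages should use a secure socket.
    scheme = "wss" if parts.scheme == "https" else "ws"
    # netloc keeps an explicit port (e.g. localhost:9000), unlike hostname alone.
    return f"{scheme}://{parts.netloc}{path}"
```

Using `netloc` rather than the bare hostname is a design choice of this sketch: it keeps the derived URL correct when the server runs on a non-default port such as `--port 9000`.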
|
| 350 |
+
|
| 351 |
+
### Why a Frontend Helps the Story
|
| 352 |
+
|
| 353 |
+
| Without Frontend | With Frontend |
|
| 354 |
+
|---|---|
|
| 355 |
+
| `total=0.345` in a log file | Animated reward ring filling green in real time |
|
| 356 |
+
| `issues_found: ['sql_injection']` | Line 12 highlighted blue in the code viewer |
|
| 357 |
+
| `decision: escalate_to_security_review` | 🚨 Escalate button, modal with final score and stats |
|
| 358 |
+
| Understanding the anti-exploit rule | Watching a keyword-spam comment score −0.05 FP penalty |
|
| 359 |
+
| Explaining the causal chain mechanic | Green hint panel appearing after finding the JWT issue |
|
| 360 |
+
|
| 361 |
+
The dashboard makes the reward signal **tangible**: a visitor can play one episode in two minutes and immediately understand what makes PRobe different from a linter.
|
| 362 |
+
|
| 363 |
+
---
|
| 364 |
+
|
| 365 |
## Training
|
| 366 |
|
| 367 |
| | |
|
|
|
|
| 432 |
|
| 433 |
## Repo Structure
|
| 434 |
|
| 435 |
+
```
|
| 436 |
+
.
|
| 437 |
+
├── agent/
|
| 438 |
+
│   ├── client.py  # HTTP client for interacting with the environment server
|
| 439 |
+
│   ├── models.py  # Pydantic models: ProbeAction, ProbeObservation, RewardType
|
| 440 |
+
│   └── __init__.py
|
| 441 |
+
├── environment/
|
| 442 |
+
│   ├── app.py  # FastAPI server (HTTP + WebSocket + static frontend at /ui/)
|
| 443 |
+
│   ├── Dockerfile  # Container definition for HuggingFace Spaces
|
| 444 |
+
│   ├── episode_memory.py  # Cross-episode JSON memory (injects prior-finding hints)
|
| 445 |
+
│   ├── graders.py  # Deterministic reward grader (keyword+line+length verifier)
|
| 446 |
+
│   ├── mutator.py  # Code mutation engine (rename / shift / nudge)
|
| 447 |
+
│   ├── probe_environment.py  # Core environment: reset / step / state / action handlers
|
| 448 |
+
│   ├── requirements.txt  # Server-side Python dependencies
|
| 449 |
+
│   ├── scanner.py  # Simulated static-analysis tool (70% recall, FP injection)
|
| 450 |
+
│   ├── tasks.py  # 10 task definitions with ground-truth issue lists
|
| 451 |
+
│   ├── _import_compat.py  # Import shim for package / script / test contexts
|
| 452 |
+
│   └── __init__.py
|
| 453 |
+
├── frontend/
|
| 454 |
+
│   ├── index.html  # Three-column dashboard layout
|
| 455 |
+
│   ├── style.css  # Dark IDE theme (no build step required)
|
| 456 |
+
│   └── app.js  # WebSocket client, code viewer, reward ring, history feed
|
| 457 |
+
├── training/
|
| 458 |
+
│   ├── baseline.py  # Zero-shot GPT-4o-mini baseline agent + plotting
|
| 459 |
+
│   ├── scripted_baseline.py  # Deterministic oracle and spammer stress-tests
|
| 460 |
+
│   ├── train_grpo.py  # GRPO training script (TRL + optional Unsloth, 5-phase curriculum)
|
| 461 |
+
│   └── __init__.py
|
| 462 |
+
├── tests/
|
| 463 |
+
│   ├── test_dynamic_world.py  # Tests for mutation engine and scanner noise model
|
| 464 |
+
│   ├── test_grader.py  # Tests for reward grader correctness
|
| 465 |
+
│   └── __init__.py
|
| 466 |
+
├── docs/
|
| 467 |
+
│   └── design.md  # Architecture notes
|
| 468 |
+
├── outputs/
|
| 469 |
+
│   └── scripted_baseline.jsonl  # Sample baseline results
|
| 470 |
+
├── run.py  # One-command launcher: starts server + serves frontend
|
| 471 |
+
├── openenv.yaml  # OpenEnv manifest (10 tasks, full schema)
|
| 472 |
+
├── pyproject.toml  # Project metadata and dependencies
|
| 473 |
+
└── pytest.ini  # Test configuration
|
| 474 |
+
```
|
| 475 |
+
|
| 476 |
+
---
|
| 477 |
+
|
| 478 |
+
## OpenEnv Compliance Checklist
|
| 479 |
+
|
| 480 |
+
- [x] Built on `Environment` base class (`ProbeEnvironment(Environment)` in `environment/probe_environment.py`)
|
| 481 |
+
- [x] `reset()`, `step()`, `state()` all implemented (async-native via `async_reset` / `async_step` / `async_state`; sync wrappers delegate safely via `asyncio.run`)
|
| 482 |
+
- [x] `step()` returns `tuple[ObservationType, RewardType, bool, dict]` (see `async_step` in `probe_environment.py`)
|
| 483 |
+
- [x] Dedicated `RewardType` Pydantic v2 model with `model_config = ConfigDict(frozen=True)` (`agent/models.py`)
|
| 484 |
+
- [x] Valid `openenv.yaml` manifest (spec_version, name, type, runtime, app, port, 10 tasks, observation schema)
|
| 485 |
+
- [x] Client/server separation enforced (`agent/` = client models + HTTP client; `environment/` = server logic)
|
| 486 |
+
- [x] No reserved MCP tool names used
|
| 487 |
+
- [ ] Hosted on HuggingFace Spaces ([FILL: deploy and add URL to links table above])
|
| 488 |
+
|
| 489 |
+
|
| 490 |
+
---
|
| 491 |
+
|
| 492 |
+
## The Problem
|
| 493 |
+
|
| 494 |
+
The XZ Utils backdoor (CVE-2024-3094) slipped through two years of open-source review. The SolarWinds attack reached roughly 18,000 organisations via a tampered build pipeline. In both cases the malicious change *looked* like a legitimate contribution, the kind of PR that lands in a code-review queue every day.
|
| 495 |
+
|
| 496 |
+
Today's LLMs scan code like a linter. They find style issues, flag known CVE patterns, and produce plausible-sounding comments. What they don't do is *investigate*: reason about intent, distinguish an honest off-by-one from a planted authentication bypass, or know when to escalate rather than request changes. Reward signals for code generation are everywhere; reward signals for critical code *evaluation* barely exist.
|
| 497 |
+
|
| 498 |
+
PRobe closes that gap. Its fully deterministic grader (keyword + line-range matching, no LLM judge) separates investigation quality from keyword spam. An agent that dumps every security term at random lines scores *negative*. One that reads carefully, probes for context, finds the right lines, and correctly labels each flaw as an honest bug or a deliberate backdoor scores close to `+1.0`.
|
| 499 |
+
|
| 500 |
+
---
|
| 501 |
+
|
| 502 |
+
## What the Agent Sees, Does, and Gets Rewarded For
|
| 503 |
+
|
| 504 |
+
### Plain English
|
| 505 |
+
|
| 506 |
+
The agent is handed a Python source file and asked to review it like a senior security engineer. It can annotate suspicious lines, probe specific regions for more context, run a simulated scanner (which, like real tools, misses things and occasionally lies), and finally submit a verdict. On adversarial tasks it must also decide whether the code contains a deliberate backdoor and escalate to a security team if so. Every episode the code surface changes (variable names, line numbers, constants), so the agent cannot memorise answers; it has to read.
|
| 507 |
+
|
| 508 |
+
### What the Agent Observes (`ProbeObservation`)
|
| 509 |
+
|
| 510 |
+
| Field | Description |
|
| 511 |
+
|---|---|
|
| 512 |
+
| `code_snippet` | Mutated Python source for this episode |
|
| 513 |
+
| `task_description` | Review instructions and goals |
|
| 514 |
+
| `file_name` | Name of the file being reviewed |
|
| 515 |
+
| `task_id` / `task_difficulty` | Current task index (0–9) and difficulty label |
|
| 516 |
+
| `review_history` | All actions taken so far this episode |
|
| 517 |
+
| `step_count` / `max_steps` | Steps used vs. budget |
|
| 518 |
+
| `issues_found_count` / `total_issues` | Progress tracker |
|
| 519 |
+
| `context_hints` | Causal hints unlocked by finding key issues |
|
| 520 |
+
| `reward` | Most recent step reward in `[-1.0, 1.0]` |
|
| 521 |
+
| `done` | Whether the episode has ended |
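The observation fields above can be pictured as a plain record. The sketch below is illustrative only: the README states that the real `ProbeObservation` is a Pydantic model in `agent/models.py`, whereas this stand-in uses a standard-library dataclass, and every field type, default, and the `steps_remaining` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ObservationSketch:
    """Hypothetical stand-in for ProbeObservation; field types are assumed."""

    code_snippet: str
    task_description: str
    file_name: str
    task_id: int
    task_difficulty: str
    max_steps: int
    total_issues: int
    step_count: int = 0
    issues_found_count: int = 0
    review_history: list = field(default_factory=list)
    context_hints: list = field(default_factory=list)
    reward: float = 0.0
    done: bool = False

    def steps_remaining(self) -> int:
        """Steps left in the budget, never negative."""
        return max(0, self.max_steps - self.step_count)
```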
|
| 522 |
+
|
| 523 |
+
### What Actions the Agent Can Take (`ProbeAction`)
|
| 524 |
+
|
| 525 |
+
| Action | Effect |
|
| 526 |
+
|---|---|
|
| 527 |
+
| `add_comment` | Annotate a line with text, severity, category, and optional backdoor classification |
|
| 528 |
+
| `get_context` | Reveal ±5 lines of context around a chosen line number |
|
| 529 |
+
| `run_scanner` | Invoke simulated static-analysis tool (70 % recall, up to 2 false positives injected) |
|
| 530 |
+
| `request_changes` | Mark PR as requiring fixes (correct terminal action for tasks 0–6) |
|
| 531 |
+
| `approve` | Approve the PR (penalised if issues remain) |
|
| 532 |
+
| `submit_review` | Finalise the review and end the episode |
|
| 533 |
+
| `escalate_to_security_review` | Flag PR as containing a deliberate attack (required for tasks 7–9) |
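As a rough sketch, the seven action names in the table map naturally onto an enumeration, and the task-to-terminal-action rule (tasks 0–6 expect `request_changes`, tasks 7–9 expect escalation) fits in one function. The class and function names here are hypothetical; the real `ProbeAction` model may be structured differently.

```python
from enum import Enum


class ActionType(str, Enum):
    """The seven action names listed in the table above."""

    ADD_COMMENT = "add_comment"
    GET_CONTEXT = "get_context"
    RUN_SCANNER = "run_scanner"
    REQUEST_CHANGES = "request_changes"
    APPROVE = "approve"
    SUBMIT_REVIEW = "submit_review"
    ESCALATE = "escalate_to_security_review"


def expected_terminal_action(task_id: int) -> ActionType:
    """Return the decision the table marks as correct for a task."""
    if not 0 <= task_id <= 9:
        raise ValueError(f"task_id must be in 0..9, got {task_id}")
    # Tasks 7-9 are the adversarial tier and require escalation.
    return ActionType.ESCALATE if task_id >= 7 else ActionType.REQUEST_CHANGES
```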
|
| 534 |
+
|
| 535 |
+
### Reward Formula
|
| 536 |
+
|
| 537 |
+
Reward accumulates across steps and is finalised at submission:
|
| 538 |
+
|
| 539 |
+
```
|
| 540 |
+
Episode reward =
|
| 541 |
+
|
| 542 |
+
  Σ per-comment (ADD_COMMENT):
|
| 543 |
+
    issue_credit = (weight_i / total_weight) × 0.40 ← found a real issue
|
| 544 |
+
    classification_credit = (weight_i / total_weight) × 0.20 ← correct bug/backdoor label
|
| 545 |
+
    misclassify_penalty = −0.05 ← found it but labelled it wrong
|
| 546 |
+
    false_positive_penalty = −0.05 ← substantive comment, no issue matched
|
| 547 |
+
|
| 548 |
+
  + on terminal (SUBMIT_REVIEW or ESCALATE):
|
| 549 |
+
    coverage_bonus = weighted_coverage × 0.15 ← proportional to issues found
|
| 550 |
+
    decision_score = +0.15 / −0.15 ← correct / wrong final action
|
| 551 |
+
      (bonus gated: requires coverage ≥ 30 %)
|
| 552 |
+
    efficiency_bonus = (1 − steps_used/max_steps) × 0.10 ← unlocked only if coverage ≥ 60 %
|
| 553 |
+
|
| 554 |
+
Maximum achievable: ~1.0 Minimum: −1.0
|
| 555 |
+
```
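To make the arithmetic concrete, here is a minimal Python sketch of the formula. It is not the code in `graders.py`: the `Finding` record, the function name, the clamp to `[-1.0, 1.0]`, and the choice that a wrong decision always costs 0.15 while a correct one pays out only at coverage ≥ 30 % are assumptions drawn from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    weight: float        # ground-truth weight of the matched issue
    label_correct: bool  # whether the bug/backdoor label was right


def episode_reward(findings, total_weight, false_positives,
                   decision_correct, steps_used, max_steps):
    """Combine per-comment and terminal components (illustrative only)."""
    if total_weight <= 0:
        raise ValueError("total_weight must be positive")
    per_comment = 0.0
    found_weight = 0.0
    for f in findings:
        share = f.weight / total_weight
        per_comment += share * 0.40                              # issue_credit
        per_comment += share * 0.20 if f.label_correct else -0.05
        found_weight += f.weight
    per_comment -= 0.05 * false_positives                        # FP penalty

    coverage = found_weight / total_weight
    terminal = coverage * 0.15                                   # coverage_bonus
    if decision_correct:
        terminal += 0.15 if coverage >= 0.30 else 0.0            # gated bonus
    else:
        terminal -= 0.15
    if coverage >= 0.60:
        terminal += (1 - steps_used / max_steps) * 0.10          # efficiency_bonus
    return max(-1.0, min(1.0, per_comment + terminal))
```

Under these assumptions, two equally weighted issues found and labelled correctly, a correct decision, and half the step budget used give 0.60 + 0.15 + 0.15 + 0.05 = 0.95.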
|
| 556 |
+
|
| 557 |
+
### Anti-Exploit Verifier
|
| 558 |
+
|
| 559 |
+
A comment earns `issue_credit` only when **all three** conditions hold simultaneously:
|
| 560 |
+
|
| 561 |
+
1. **`keyword_hit`**: at least one issue keyword appears in the comment text
|
| 562 |
+
2. **`line_hit`**: `line_number` is within ±2 lines of the declared issue range
|
| 563 |
+
3. **`substantive`**: comment body is longer than 15 characters
|
| 564 |
+
|
| 565 |
+
This closes three common reward-hacking paths: keyword spam (fails `line_hit`), wide-net line fishing (fails `keyword_hit`), and one-word dumps (fails `substantive`). The decision bonus additionally requires weighted coverage ≥ 30 % before it can be earned, so an agent that never reads code and always guesses `request_changes` earns zero, not a bonus.
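The three conditions can be sketched as a single predicate. This is a hedged illustration rather than the grader's actual code: the `Issue` record, the case-insensitive substring match, and the function name are assumptions. Only the ±2 line tolerance and the 15-character threshold come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    keywords: tuple[str, ...]
    line_range: tuple[int, int]  # inclusive (start, end)


def earns_issue_credit(comment: str, line_number: int, issue: Issue,
                       tolerance: int = 2, min_chars: int = 15) -> bool:
    """True only when keyword, line, and length checks all pass."""
    text = comment.lower()
    keyword_hit = any(k.lower() in text for k in issue.keywords)
    start, end = issue.line_range
    line_hit = start - tolerance <= line_number <= end + tolerance
    substantive = len(comment.strip()) > min_chars
    return keyword_hit and line_hit and substantive
```

Requiring all three together is what makes each shortcut fail: a keyword on the wrong line, the right line without a keyword, or a bare keyword with no explanation earns nothing.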
|
| 566 |
+
|
| 567 |
+
### Perfect Episode vs. Failing Episode
|
| 568 |
+
|
| 569 |
+
**Perfect:** The agent reads the code, annotates every real issue at the correct line with a substantive, keyword-bearing comment, correctly labels each as `accidental_bug` or `intentional_backdoor`, escalates when required, and submits with steps to spare. Score approaches `1.0`.
|
| 570 |
+
|
| 571 |
+
**Failing:** The agent spams generic comments on random lines, never co-locates a keyword with a real issue line, triggers false-positive penalties on every step, and submits the wrong terminal action. Score approaches `−1.0`.
|
| 572 |
+
|
| 573 |
+
---
|
| 574 |
+
|
| 575 |
+
## Environment Design
|
| 576 |
+
|
| 577 |
+
### Difficulty Tiers
|
| 578 |
+
|
| 579 |
+
| Tier | Tasks | Max Steps | Issues | What Changes |
|
| 580 |
+
|---|---|---|---|---|
|
| 581 |
+
| **Ultra-Easy** (bootstrap) | 0 | 6 | 2 | Category names spelled out in code comments above each bug; guarantees GRPO positive trajectories from step 1 |
|
| 582 |
+
| **Easy** | 1 | 15 | 3 | Clean logic bugs, no hints, no distractors |
|
| 583 |
+
| **Medium** | 2, 4 | 20 | 5 | Security and async-concurrency issues; wider surface area |
|
| 584 |
+
| **Hard** | 3, 5, 6 | 30–35 | 6–7 | Mixed bug classes; task 6 adds a causal unlock chain |
|
| 585 |
+
| **Adversarial** | 7, 8, 9 | 25–35 | 2–3 | Deliberate backdoors requiring `intentional_backdoor` classification and `escalate_to_security_review` |
|
| 586 |
+
|
| 587 |
+
### Tasks (10 total)
|
| 588 |
+
|
| 589 |
+
| ID | Difficulty | File | Issues | Domain |
|
| 590 |
+
|---|---|---|---|---|
|
| 591 |
+
| 0 | Ultra-easy | `bootstrap.py` | 2 | Off-by-one + hardcoded credential (hinted in comments) |
|
| 592 |
+
| 1 | Easy | `utils.py` | 3 | Logic bugs, off-by-one, dead code |
|
| 593 |
+
| 2 | Medium | `auth.py` | 5 | SQL injection, MD5, eval(), hardcoded creds |
|
| 594 |
+
| 3 | Hard | `data_pipeline.py` | 7 | N+1, SSL bypass, thread leak, OOM cache |
|
| 595 |
+
| 4 | Medium | `async_worker.py` | 5 | Race condition, missing await, resource leak |
|
| 596 |
+
| 5 | Hard | `api_server.py` | 6 | Command injection, path traversal, pickle RCE |
|
| 597 |
+
| 6 | Hard | `auth_service.py` | 6 | **Causal chain**: JWT forgery → privilege escalation |
|
| 598 |
+
| 7 | Adversarial | `session_manager.py` | 2 | Planted auth bypass + hardcoded bypass credential |
|
| 599 |
+
| 8 | Adversarial | `jwt_validator.py` | 2 | **Compound backdoor**: fixed-seed JWT secret + timing oracle |
|
| 600 |
+
| 9 | Adversarial | `compat_shim.py` | 3 | **Supply chain**: import-time env-var exfiltration to attacker domain |
|
| 601 |
+
|
| 602 |
+
### GRPO Curriculum (5 Phases in `train_grpo.py`)
|
| 603 |
+
|
| 604 |
+
| Phase | Steps | Tasks in Pool |
|
| 605 |
+
|---|---|---|
|
| 606 |
+
| 0 | 0 → 40 | 0–1 (ultra-easy / easy) |
|
| 607 |
+
| 1 | 40 → 80 | 0–3 (adds medium / hard) |
|
| 608 |
+
| 2 | 80 → 120 | 0–6 (adds causal chain) |
|
| 609 |
+
| 3 | 120 → 160 | 0–8 (adds adversarial) |
|
| 610 |
+
| 4 | 160 → 200 | 0–9 (full curriculum) |
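The phase table amounts to a lookup from training step to task pool. The sketch below is one possible reading, not the logic in `train_grpo.py`: it assumes half-open step intervals (step 40 already belongs to phase 1) and that steps past 200 stay in the final phase. `PHASES` and `task_pool` are hypothetical names.

```python
# (exclusive upper step bound, highest task id in the pool), per the table above
PHASES = [(40, 1), (80, 3), (120, 6), (160, 8), (200, 9)]


def task_pool(step: int) -> list[int]:
    """Return the task ids available at a given training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    for upper_bound, max_task_id in PHASES:
        if step < upper_bound:
            return list(range(max_task_id + 1))
    # Past the last boundary: keep sampling from the full curriculum.
    return list(range(PHASES[-1][1] + 1))
```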
|
| 611 |
+
|
| 612 |
+
### Reward Components with Weights
|
| 613 |
+
|
| 614 |
+
| Component | Weight | Trigger |
|
| 615 |
+
|---|---|---|
|
| 616 |
+
| `issue_credit` | up to **0.40** cumulative | `add_comment` matches a real issue (keyword + line + length) |
|
| 617 |
+
| `classification_credit` | up to **0.20** cumulative | correct `accidental_bug` / `intentional_backdoor` label |
|
| 618 |
+
| `misclassify_penalty` | **−0.05** per issue | issue found but wrong classification label |
|
| 619 |
+
| `false_positive_penalty` | **−0.05** per comment | substantive comment, zero issues matched |
|
| 620 |
+
| `coverage_bonus` | up to **0.15** terminal | `weighted_coverage × 0.15` |
|
| 621 |
+
| `decision_score` | **±0.15** terminal | correct / wrong `request_changes` vs `escalate` decision |
|
| 622 |
+
| `efficiency_bonus` | up to **0.10** terminal | `(1 − steps/max_steps) × 0.10` when coverage ≥ 60 % |
|
| 623 |
+
| `format_bonus` | **+0.02** once | response contains a valid non-empty JSON array |
|
| 624 |
+
|
| 625 |
+
### Dynamic World (Anti-Memorisation)
|
| 626 |
+
|
| 627 |
+
Each episode `mutate_task()` applies three seed-controlled transforms:
|
| 628 |
+
|
| 629 |
+
| Mutation | Example |
|
| 630 |
+
|---|---|
|
| 631 |
+
| Variable rename | `total` → `acc`, `data` → `payload`, `password` → `passwd` |
|
| 632 |
+
| Line shift | Blank line inserted above first issue; all `line_range` values shift +1 |
|
| 633 |
+
| Constant variance | `range(len(data) + 1)` → `range(len(data) + 2)` |
|
| 634 |
+
|
| 635 |
+
Mutations are deterministic given the episode seed: runs are reproducible, yet every seed presents a fresh surface.
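The line-shift mutation illustrates why the seed matters. The following sketch is an assumption-laden illustration, not the code in `mutator.py`: it inserts one blank line at a seed-chosen position at or above the first issue and moves every `line_range` down by one. The function name and signature are hypothetical.

```python
import random


def line_shift(source: str, line_ranges: list[tuple[int, int]], seed: int):
    """Insert a blank line above the first issue; shift all ranges by +1."""
    rng = random.Random(seed)  # local RNG: same seed, same mutation
    lines = source.splitlines()
    first_issue_line = min(start for start, _ in line_ranges)
    # 1-indexed position of the new blank line, at or above the first issue.
    insert_at = rng.randint(1, first_issue_line)
    lines.insert(insert_at - 1, "")
    shifted = [(start + 1, end + 1) for start, end in line_ranges]
    return "\n".join(lines), shifted
```

Because the random generator is local and seeded, replaying an episode with the same seed reproduces the same mutated file and the same ground-truth ranges.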
|
| 636 |
+
|
| 637 |
+
### Scanner Noise Model (`scanner.py`)
|
| 638 |
+
|
| 639 |
+
`run_scanner()` simulates a real lint/security tool:
|
| 640 |
+
- **Recall: 70 %**: each real issue is reported with probability 0.70; ~30 % silently missed
|
| 641 |
+
- **False-positive rate: 40 %**: up to 2 injected plausible-but-wrong findings per run
|
| 642 |
+
- Scanner output is **not auto-graded**: the agent must still call `add_comment` with a correct line + keyword to earn reward
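A noise model of this shape can be sketched in a few lines. This is not the implementation in `scanner.py`: in particular, reading "40 % false-positive rate, up to 2 injected" as "each of at most two decoy findings is injected with probability 0.40" is an assumption, as are the function name and parameters.

```python
import random


def simulate_scanner(true_issues, decoys, seed,
                     recall=0.70, fp_rate=0.40, max_fp=2):
    """Report each real issue with probability `recall`, then maybe add decoys."""
    rng = random.Random(seed)
    # Each real issue is independently reported or silently missed.
    findings = [issue for issue in true_issues if rng.random() < recall]
    # At most `max_fp` plausible-but-wrong findings can be injected.
    for decoy in decoys[:max_fp]:
        if rng.random() < fp_rate:
            findings.append(decoy)
    return findings
```

The output deliberately mixes true and false findings with no marker, so an agent that simply echoes the scanner risks a false-positive penalty for every decoy it repeats.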
|
| 643 |
+
|
| 644 |
+
### Causal Unlock Chain (Task 6)
|
| 645 |
+
|
| 646 |
+
Finding certain issues appends new context hints to the observation, modelling real investigations where one discovery leads to a deeper one:
|
| 647 |
+
|
| 648 |
+
```
|
| 649 |
+
Find hardcoded JWT secret → DB schema revealed → agent can reason: forge token → privilege escalation
|
| 650 |
+
Find missing rate-limit → nginx config shown → confirms /auth fully exposed with no IP filtering
|
| 651 |
+
```
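One simple way to model such a chain is a mapping from a found issue to the hint it unlocks. The sketch below is illustrative only: the issue identifiers, hint strings, and function name are hypothetical, and the real environment may store or trigger hints differently.

```python
# Hypothetical issue ids mapped to the hint each one unlocks.
UNLOCKS = {
    "hardcoded_jwt_secret": "DB schema revealed",
    "missing_rate_limit": "nginx config shown",
}


def unlock_hints(found_issue_id: str, context_hints: list[str]) -> list[str]:
    """Return the hint list after a finding, without duplicating hints."""
    hint = UNLOCKS.get(found_issue_id)
    if hint is None or hint in context_hints:
        return list(context_hints)
    return [*context_hints, hint]
```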
|
| 652 |
+
|
| 653 |
+
### OpenEnv Interface
|
| 654 |
+
|
| 655 |
+
| Method | Returns | Notes |
|
| 656 |
+
|---|---|---|
|
| 657 |
+
| `reset()` | `ProbeObservation` | Starts new episode; advances task cursor; applies mutation |
|
| 658 |
+
| `step(action)` | `(ProbeObservation, RewardType, bool, dict)` | Executes action; returns obs, structured reward, done flag, info dict |
|
| 659 |
+
| `state` (sync property) | `State(episode_id, step_count)` | Lightweight snapshot for `create_app` |
|
| 660 |
+
| `async_state()` | `dict` | Full async snapshot with all episode fields |
|
| 661 |
+
|
| 662 |
+
---
|
| 663 |
+
|
| 664 |
+
## Quickstart
|
| 665 |
+
|
| 666 |
+
```bash
|
| 667 |
+
# Install
|
| 668 |
+
uv sync
|
| 669 |
+
|
| 670 |
+
# Run the environment server
|
| 671 |
+
uv run uvicorn environment.app:app --host 0.0.0.0 --port 8000 --reload
|
| 672 |
+
|
| 673 |
+
# Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
|
| 674 |
+
export OPENAI_API_KEY=sk-...
|
| 675 |
+
uv run python training/baseline.py
|
| 676 |
+
|
| 677 |
+
# Smoke-test reward function (no GPU, no API key)
|
| 678 |
+
uv run python training/train_grpo.py --test
|
| 679 |
+
```
|
| 680 |
+
|
| 681 |
+
---
|
| 682 |
+
|
| 683 |
+
## Repo Structure
|
| 684 |
+
|
| 685 |
```
|
| 686 |
.
|
| 687 |
├── agent/
|
docs/design.md
CHANGED
|
@@ -17,7 +17,7 @@ repo-root/
|
|
| 17 |
|
| 18 |
## Environment entry point
|
| 19 |
|
| 20 |
-
`environment/app.py` → FastAPI app mounted at `/
|
| 21 |
`openenv.yaml` → `app: environment.app:app`.
|
| 22 |
|
| 23 |
## Reward function
|
|
|
|
| 17 |
|
| 18 |
## Environment entry point
|
| 19 |
|
| 20 |
+
`environment/app.py` → FastAPI app; serves the static frontend at `/ui/` and the API docs at `/docs`.
|
| 21 |
`openenv.yaml` → `app: environment.app:app`.
|
| 22 |
|
| 23 |
## Reward function
|
environment/app.py
CHANGED
|
@@ -21,12 +21,15 @@ from __future__ import annotations
|
|
| 21 |
|
| 22 |
import json
|
| 23 |
import logging
|
|
|
|
| 24 |
from contextlib import asynccontextmanager
|
| 25 |
from typing import Any
|
| 26 |
|
| 27 |
import uvicorn
|
| 28 |
from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
|
|
|
|
| 29 |
from fastapi.responses import HTMLResponse
|
|
|
|
| 30 |
|
| 31 |
try:
|
| 32 |
from openenv.core.env_server.http_server import create_app as _create_openenv_app
|
|
@@ -37,7 +40,7 @@ except Exception: # pragma: no cover
|
|
| 37 |
try:
|
| 38 |
from ..agent.models import ProbeAction, ProbeObservation, RewardType
|
| 39 |
from .probe_environment import ProbeEnvironment
|
| 40 |
-
except ModuleNotFoundError:
|
| 41 |
from agent.models import ProbeAction, ProbeObservation, RewardType # type: ignore
|
| 42 |
from environment.probe_environment import ProbeEnvironment # type: ignore
|
| 43 |
|
|
@@ -85,6 +88,11 @@ class StepResponse:
|
|
| 85 |
|
| 86 |
# ── App factory ───────────────────────────────────────────────────────────────
|
| 87 |
|
| 88 |
def _build_app() -> FastAPI:
|
| 89 |
application = FastAPI(
|
| 90 |
title="PRobe",
|
|
@@ -93,6 +101,15 @@ def _build_app() -> FastAPI:
|
|
| 93 |
lifespan=lifespan,
|
| 94 |
)
|
| 95 |
|
| 96 |
# ── HTTP endpoints ────────────────────────────────────────────────────
|
| 97 |
|
| 98 |
@application.post("/reset", summary="Start a new episode")
|
|
@@ -175,18 +192,25 @@ def _build_app() -> FastAPI:
|
|
| 175 |
pass
|
| 176 |
|
| 177 |
# ── Web UI ────────────────────────────────────────────────────────────
|
| 178 |
-
|
| 179 |
@application.get("/web", response_class=HTMLResponse, include_in_schema=False)
|
| 180 |
-
async def
|
| 181 |
-
return
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
"""
|
| 190 |
|
| 191 |
return application
|
| 192 |
|
|
|
|
| 21 |
|
| 22 |
import json
|
| 23 |
import logging
|
| 24 |
+
import pathlib
|
| 25 |
from contextlib import asynccontextmanager
|
| 26 |
from typing import Any
|
| 27 |
|
| 28 |
import uvicorn
|
| 29 |
from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
|
| 30 |
+
from fastapi.middleware.cors import CORSMiddleware
|
| 31 |
from fastapi.responses import HTMLResponse
|
| 32 |
+
from fastapi.staticfiles import StaticFiles
|
| 33 |
|
| 34 |
try:
|
| 35 |
from openenv.core.env_server.http_server import create_app as _create_openenv_app
|
|
|
|
| 40 |
try:
|
| 41 |
from ..agent.models import ProbeAction, ProbeObservation, RewardType
|
| 42 |
from .probe_environment import ProbeEnvironment
|
| 43 |
+
except (ImportError, ModuleNotFoundError):
|
| 44 |
from agent.models import ProbeAction, ProbeObservation, RewardType # type: ignore
|
| 45 |
from environment.probe_environment import ProbeEnvironment # type: ignore
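A note on the widened `except` clause: in Python, `ModuleNotFoundError` is a subclass of `ImportError`, so catching `ImportError` alone already covers both cases. The sketch below shows the same package-then-script fallback in a generic form. `import_with_fallback` is a hypothetical helper written for illustration and is not part of `environment/app.py`.

```python
import importlib
from types import ModuleType


def import_with_fallback(primary: str, fallback: str) -> ModuleType:
    """Try one module name, then fall back to another if the import fails."""
    try:
        return importlib.import_module(primary)
    except ImportError:
        # Also catches ModuleNotFoundError, which subclasses ImportError.
        return importlib.import_module(fallback)
```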
|
| 46 |
|
|
|
|
| 88 |
|
| 89 |
# ── App factory ───────────────────────────────────────────────────────────────
|
| 90 |
|
| 91 |
+
# Resolve the frontend directory relative to this file so the app works
|
| 92 |
+
# regardless of the working directory it is launched from.
|
| 93 |
+
_FRONTEND_DIR = pathlib.Path(__file__).parent.parent / "frontend"
|
| 94 |
+
|
| 95 |
+
|
| 96 |
def _build_app() -> FastAPI:
|
| 97 |
application = FastAPI(
|
| 98 |
title="PRobe",
|
|
|
|
| 101 |
lifespan=lifespan,
|
| 102 |
)
|
| 103 |
|
| 104 |
+
# Allow the frontend (served on the same host, any port) to call the API.
|
| 105 |
+
# In production, restrict allow_origins to the exact frontend URL.
|
| 106 |
+
application.add_middleware(
|
| 107 |
+
CORSMiddleware,
|
| 108 |
+
allow_origins=["*"],
|
| 109 |
+
allow_methods=["*"],
|
| 110 |
+
allow_headers=["*"],
|
| 111 |
+
)
|
| 112 |
+
|
| 113 |
# ── HTTP endpoints ────────────────────────────────────────────────────
|
| 114 |
|
| 115 |
@application.post("/reset", summary="Start a new episode")
|
|
|
|
| 192 |
pass
|
| 193 |
|
| 194 |
# ── Web UI ────────────────────────────────────────────────────────────
|
| 195 |
+
# /web → redirect so old links still work
|
| 196 |
@application.get("/web", response_class=HTMLResponse, include_in_schema=False)
|
| 197 |
+
async def web_redirect() -> HTMLResponse:
|
| 198 |
+
return HTMLResponse(
|
| 199 |
+
'<meta http-equiv="refresh" content="0;url=/ui/">',
|
| 200 |
+
status_code=200,
|
| 201 |
+
)
|
| 202 |
+
|
| 203 |
+
# Mount the compiled frontend as a static site at /ui.
|
| 204 |
+
# Falls back gracefully if the frontend directory has not been built yet.
|
| 205 |
+
if _FRONTEND_DIR.is_dir():
|
| 206 |
+
application.mount("/ui", StaticFiles(directory=str(_FRONTEND_DIR), html=True), name="ui")
|
| 207 |
+
log.info("Frontend mounted at /ui from %s", _FRONTEND_DIR)
|
| 208 |
+
else:
|
| 209 |
+
log.warning(
|
| 210 |
+
"Frontend directory not found at %s - /ui will not be available. "
|
| 211 |
+
"Run the frontend build or create the 'frontend/' directory.",
|
| 212 |
+
_FRONTEND_DIR,
|
| 213 |
+
)
|
| 214 |
|
| 215 |
return application
|
| 216 |
|
frontend/app.js
ADDED
|
@@ -0,0 +1,597 @@
|
| 1 |
+
/**
|
| 2 |
+
* PRobe Frontend - WebSocket client & UI controller
|
| 3 |
+
*
|
| 4 |
+
* Connects to the backend WebSocket at /ws, drives a full episode
|
| 5 |
+
* lifecycle: reset → step* → terminal, and renders all state changes
|
| 6 |
+
* (code viewer, reward bars, history feed, episode-end modal) in real time.
|
| 7 |
+
*
|
| 8 |
+
* Architecture
|
| 9 |
+
* ------------
|
| 10 |
+
* WsClient - thin wrapper around native WebSocket with reconnect
|
| 11 |
+
* RewardDashboard - renders ring, component bars, issues progress
|
| 12 |
+
* CodeViewer - renders syntax-highlighted code with line decorations
|
| 13 |
+
* HistoryFeed - append-only action history list
|
| 14 |
+
* ProbeController - orchestrates all of the above; owns episode state
|
| 15 |
+
*/
|
| 16 |
+
|
| 17 |
+
"use strict";
|
| 18 |
+
|
| 19 |
+
// ───────────────────────────────────────────────────────────────────
|
| 20 |
+
// CONFIG
|
| 21 |
+
// ───────────────────────────────────────────────────────────────────
|
| 22 |
+
|
| 23 |
+
const CONFIG = {
|
| 24 |
+
// WebSocket URL: built from the page's hostname; the ws:// scheme and port 8000 are fixed
|
| 25 |
+
wsUrl: `ws://${window.location.hostname}:8000/ws`,
|
| 26 |
+
reconnectDelayMs: 2000,
|
| 27 |
+
ringCircumference: 314, // 2π × r, with r = 50
|
| 28 |
+
};
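The `wsUrl` value above fixes both the `ws://` scheme and port 8000. A minimal sketch of one alternative follows, assuming the page and the `/ws` endpoint are served from the same origin. `buildWsUrl` is a hypothetical helper, not part of this commit; it takes a location-like object so it can be exercised outside a browser.

```javascript
// Hypothetical helper (not in the commit): derive the WebSocket URL from a
// location-like object so an https page gets wss:// and the page's own port.
function buildWsUrl(loc, path = "/ws") {
  const scheme = loc.protocol === "https:" ? "wss" : "ws";
  // loc.host already carries the port when it is non-default, e.g. "localhost:8000"
  return `${scheme}://${loc.host}${path}`;
}

// In the browser this would be called as buildWsUrl(window.location).
console.log(buildWsUrl({ protocol: "http:", host: "localhost:8000" }));
console.log(buildWsUrl({ protocol: "https:", host: "example.org" }));
```

Browsers generally block `ws://` connections from pages loaded over `https://`, which is the main reason to switch the scheme along with the page protocol.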
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
// ───────────────────────────────────────────────────────────────────
|
| 32 |
+
// WsClient - WebSocket with auto-reconnect
|
| 33 |
+
// ───────────────────────────────────────────────────────────────────
|
| 34 |
+
|
| 35 |
+
class WsClient {
|
| 36 |
+
/**
|
| 37 |
+
* @param {string} url WebSocket endpoint
|
| 38 |
+
* @param {function} onMessage Called with parsed JSON message objects
|
| 39 |
+
* @param {function} onStatusChange Called with ('connected'|'disconnected')
|
| 40 |
+
*/
|
| 41 |
+
constructor(url, onMessage, onStatusChange) {
|
| 42 |
+
this._url = url;
|
| 43 |
+
this._onMessage = onMessage;
|
| 44 |
+
this._onStatusChange = onStatusChange;
|
| 45 |
+
this._socket = null;
|
| 46 |
+
this._connected = false;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
connect() {
|
| 50 |
+
if (this._socket) this._socket.close();
|
| 51 |
+
|
| 52 |
+
this._socket = new WebSocket(this._url);
|
| 53 |
+
|
| 54 |
+
this._socket.onopen = () => {
|
| 55 |
+
this._connected = true;
|
| 56 |
+
this._onStatusChange("connected");
|
| 57 |
+
};
|
| 58 |
+
|
| 59 |
+
this._socket.onclose = () => {
|
| 60 |
+
this._connected = false;
|
| 61 |
+
this._onStatusChange("disconnected");
|
| 62 |
+
};
|
| 63 |
+
|
| 64 |
+
this._socket.onerror = (err) => {
|
| 65 |
+
console.error("[WsClient] error:", err);
|
| 66 |
+
this._connected = false;
|
| 67 |
+
this._onStatusChange("disconnected");
|
| 68 |
+
};
|
| 69 |
+
|
| 70 |
+
this._socket.onmessage = (event) => {
|
| 71 |
+
try {
|
| 72 |
+
const msg = JSON.parse(event.data);
|
| 73 |
+
this._onMessage(msg);
|
| 74 |
+
} catch (e) {
|
| 75 |
+
console.warn("[WsClient] unparseable message:", event.data);
|
| 76 |
+
}
|
| 77 |
+
};
|
| 78 |
+
}
|
| 79 |
+
|
| 80 |
+
send(payload) {
|
| 81 |
+
if (!this._connected) {
|
| 82 |
+
console.warn("[WsClient] send called while disconnected");
|
| 83 |
+
return;
|
| 84 |
+
}
|
| 85 |
+
this._socket.send(JSON.stringify(payload));
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
get isConnected() { return this._connected; }
|
| 89 |
+
}
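`CONFIG.reconnectDelayMs` is defined, but the `WsClient` listing above never schedules a reconnect after `onclose`. The sketch below shows one way the delay could be wired in, using capped exponential backoff. `reconnectDelay` and `makeReconnector` are hypothetical names, and the 30-second cap is an assumption rather than a value from the commit.

```javascript
// Hypothetical helpers (not in the commit): capped exponential backoff.
function reconnectDelay(attempt, baseMs = 2000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// `schedule` is injected so the logic can run without real timers.
function makeReconnector(connect, schedule, baseMs = 2000, maxMs = 30000) {
  let attempt = 0;
  return {
    // Call from socket.onopen: a successful connection resets the backoff.
    onOpen() { attempt = 0; },
    // Call from socket.onclose: schedule the next connection attempt.
    onClose() {
      const delay = reconnectDelay(attempt, baseMs, maxMs);
      attempt += 1;
      schedule(connect, delay);
      return delay;
    },
  };
}

// Demo with a recording scheduler instead of a real timer.
const scheduled = [];
const reconnector = makeReconnector(() => {}, (fn, ms) => scheduled.push(ms));
reconnector.onClose();
reconnector.onClose();
reconnector.onOpen();
reconnector.onClose();
console.log(scheduled); // [ 2000, 4000, 2000 ]
```

In a browser the scheduler would be `(fn, ms) => setTimeout(fn, ms)`. Because `connect()` in the listing closes any existing socket first, a real integration would probably also need a flag to tell deliberate closes apart from dropped connections.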
|
| 90 |
+
|
| 91 |
+
|
| 92 |
+
// ───────────────────────────────────────────────────────────────────
|
| 93 |
+
// CodeViewer - renders code with per-line decorations
|
| 94 |
+
// ───────────────────────────────────────────────────────────────────
|
| 95 |
+
|
| 96 |
+
class CodeViewer {
|
| 97 |
+
constructor(preEl) {
|
| 98 |
+
this._pre = preEl;
|
| 99 |
+
this._lines = [];
|
| 100 |
+
// Track which lines have active highlights so we can clear them
|
| 101 |
+
this._decoratedLines = new Set();
|
| 102 |
+
}
|
| 103 |
+
|
| 104 |
+
/**
|
| 105 |
+
* Render source code as numbered, individually addressable lines.
|
| 106 |
+
* Clears any previous decorations.
|
| 107 |
+
*/
|
| 108 |
+
render(sourceCode) {
|
| 109 |
+
this._lines = sourceCode.split("\n");
|
| 110 |
+
this._decoratedLines.clear();
|
| 111 |
+
this._pre.innerHTML = this._lines.map((text, idx) => {
|
| 112 |
+
const lineNum = idx + 1;
|
| 113 |
+
return `<span class="code-line" id="cl-${lineNum}">`
|
| 114 |
+
+ `<span class="code-line-num">${lineNum}</span>`
|
| 115 |
+
+ escapeHtml(text)
|
| 116 |
+
+ `</span>`;
|
| 117 |
+
}).join("\n");
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
/**
|
| 121 |
+
* Apply a CSS class to a specific line.
|
| 122 |
+
* @param {number} lineNumber 1-based
|
| 123 |
+
* @param {string} cssClass e.g. 'hl-comment'
|
| 124 |
+
*/
|
| 125 |
+
decorateLine(lineNumber, cssClass) {
|
| 126 |
+
const el = document.getElementById(`cl-${lineNumber}`);
|
| 127 |
+
if (!el) return;
|
| 128 |
+
// Remove any previous highlight class on this line before adding the new one
|
| 129 |
+
el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
|
| 130 |
+
el.classList.add(cssClass);
|
| 131 |
+
this._decoratedLines.add(lineNumber);
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
/** Scroll the given 1-based line number into view. */
|
| 135 |
+
scrollToLine(lineNumber) {
|
| 136 |
+
const el = document.getElementById(`cl-${lineNumber}`);
|
| 137 |
+
if (el) el.scrollIntoView({ block: "center", behavior: "smooth" });
|
| 138 |
+
}
|
| 139 |
+
|
| 140 |
+
clearDecorations() {
|
| 141 |
+
for (const lineNum of this._decoratedLines) {
|
| 142 |
+
const el = document.getElementById(`cl-${lineNum}`);
|
| 143 |
+
if (el) el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
|
| 144 |
+
}
|
| 145 |
+
this._decoratedLines.clear();
|
| 146 |
+
}
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
// ───────────────────────────────────────────────────────────────────
|
| 151 |
+
// RewardDashboard - ring + bars + issues progress
|
| 152 |
+
// ───────────────────────────────────────────────────────────────────
|
| 153 |
+
|
| 154 |
+
class RewardDashboard {
|
| 155 |
+
constructor() {
|
| 156 |
+
this._ringTrack = document.getElementById("ring-track");
|
| 157 |
+
this._ringValue = document.getElementById("ring-value");
|
| 158 |
+
this._issuesFill = document.getElementById("issues-bar-fill");
|
| 159 |
+
this._issuesLabel = document.getElementById("issues-found-label");
|
| 160 |
+
|
| 161 |
+
// Component bar element pairs { fill, val }
|
| 162 |
+
this._bars = {
|
| 163 |
+
issue_credit: this._barPair("issue_credit"),
|
| 164 |
+
classification_credit: this._barPair("classification_credit"),
|
| 165 |
+
false_positive_penalty:this._barPair("false_positive_penalty"),
|
| 166 |
+
coverage_bonus: this._barPair("coverage_bonus"),
|
| 167 |
+
decision_score: this._barPair("decision_score"),
|
| 168 |
+
efficiency_bonus: this._barPair("efficiency_bonus"),
|
| 169 |
+
};
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
_barPair(key) {
|
| 173 |
+
return {
|
| 174 |
+
fill: document.getElementById(`bar-${key}`),
|
| 175 |
+
val: document.getElementById(`val-${key}`),
|
| 176 |
+
};
|
| 177 |
+
}
|
| 178 |
+
|
| 179 |
+
/**
|
| 180 |
+
* Update the cumulative reward ring.
|
| 181 |
+
* Clamps input to [-1, 1] and maps to ring arc.
|
| 182 |
+
*/
|
| 183 |
+
updateRing(cumulativeReward) {
|
| 184 |
+
const clamped = Math.max(-1, Math.min(1, cumulativeReward));
|
| 185 |
+
// Map [-1, 1] → [0, circumference]: negative reward still shows a partial arc
|
| 186 |
+
const fraction = (clamped + 1) / 2;
|
| 187 |
+
const offset = CONFIG.ringCircumference * (1 - fraction);
|
| 188 |
+
|
| 189 |
+
this._ringTrack.style.strokeDashoffset = offset;
|
| 190 |
+
// Colour: green above 0, red below
|
| 191 |
+
this._ringTrack.style.stroke = clamped >= 0 ? "var(--green)" : "var(--red)";
|
| 192 |
+
this._ringValue.textContent = clamped.toFixed(2);
|
| 193 |
+
this._ringValue.style.color = clamped >= 0 ? "var(--green)" : "var(--red)";
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
/**
|
| 197 |
+
* Render per-component score bars from a components dict.
|
| 198 |
+
* The bar width maps the absolute value to a 0-100% scale capped at 0.40.
|
| 199 |
+
*/
|
| 200 |
+
updateBars(components) {
|
| 201 |
+
const MAX_BAR_VALUE = 0.40;
|
| 202 |
+
|
| 203 |
+
for (const [key, pair] of Object.entries(this._bars)) {
|
| 204 |
+
const rawValue = components[key] ?? 0;
|
| 205 |
+
const absWidth = Math.min(Math.abs(rawValue) / MAX_BAR_VALUE * 100, 100);
|
| 206 |
+
|
| 207 |
+
pair.fill.style.width = `${absWidth}%`;
|
| 208 |
+
pair.val.textContent = rawValue.toFixed(2);
|
| 209 |
+
|
| 210 |
+
// Positive/negative/neutral colouring
|
| 211 |
+
pair.fill.classList.remove("positive", "negative", "neutral");
|
| 212 |
+
if (rawValue > 0) pair.fill.classList.add("positive");
|
| 213 |
+
else if (rawValue < 0) pair.fill.classList.add("negative");
|
| 214 |
+
else pair.fill.classList.add("neutral");
|
| 215 |
+
}
|
| 216 |
+
}
|
| 217 |
+
|
| 218 |
+
/** Update the issues-found progress bar. */
|
| 219 |
+
updateIssues(found, total) {
|
| 220 |
+
const pct = total > 0 ? (found / total) * 100 : 0;
|
| 221 |
+
this._issuesFill.style.width = `${pct}%`;
|
| 222 |
+
this._issuesLabel.textContent = `${found} / ${total}`;
|
| 223 |
+
}
|
| 224 |
+
|
| 225 |
+
reset() {
|
| 226 |
+
this.updateRing(0);
|
| 227 |
+
this.updateBars({});
|
| 228 |
+
this.updateIssues(0, 0);
|
| 229 |
+
}
|
| 230 |
+
}
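The two mappings used by `RewardDashboard` can be checked in isolation. The sketch below restates the arithmetic from `updateRing` and `updateBars` as pure functions; `ringOffset` and `barWidthPercent` are illustrative names, and the constants mirror `CONFIG.ringCircumference` and `MAX_BAR_VALUE` from the listing.

```javascript
const RING_CIRCUMFERENCE = 314; // same value as CONFIG.ringCircumference
const MAX_BAR_VALUE = 0.40;     // same cap as in updateBars

// Clamp the reward to [-1, 1], map it to a fraction in [0, 1], and convert
// that to an SVG stroke-dashoffset (0 = full ring, 314 = empty ring).
function ringOffset(cumulativeReward) {
  const clamped = Math.max(-1, Math.min(1, cumulativeReward));
  const fraction = (clamped + 1) / 2;
  return RING_CIRCUMFERENCE * (1 - fraction);
}

// Bar width depends only on the magnitude; the sign is shown through colour.
function barWidthPercent(rawValue) {
  return Math.min(Math.abs(rawValue) / MAX_BAR_VALUE * 100, 100);
}

console.log(ringOffset(-1), ringOffset(0), ringOffset(1)); // 314 157 0
console.log(barWidthPercent(0.8)); // 100
```

One consequence of the linear mapping is that a cumulative reward of exactly 0 draws a half-filled ring, not an empty one.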
|
| 231 |
+
|
| 232 |
+
|
| 233 |
+
// ───────────────────────────────────────────────────────────────────
|
| 234 |
+
// HistoryFeed - append-only episode action log
|
| 235 |
+
// ───────────────────────────────────────────────────────────────────
|
| 236 |
+
|
| 237 |
+
class HistoryFeed {
|
| 238 |
+
constructor(containerEl) {
|
| 239 |
+
this._container = containerEl;
|
| 240 |
+
this._count = 0;
|
| 241 |
+
}
|
| 242 |
+
|
| 243 |
+
clear() {
|
| 244 |
+
this._container.innerHTML = '<div class="history-empty">No actions yet.</div>';
|
| 245 |
+
this._count = 0;
|
| 246 |
+
}
|
| 247 |
+
|
| 248 |
+
/**
|
| 249 |
+
* Append one step to the feed.
|
| 250 |
+
* @param {string} actionType Human-readable action label
|
| 251 |
+
* @param {object} reward RewardType object from server
|
| 252 |
+
*/
|
| 253 |
+
append(actionType, reward) {
|
| 254 |
+
if (this._count === 0) {
|
| 255 |
+
this._container.innerHTML = "";
|
| 256 |
+
}
|
| 257 |
+
this._count++;
|
| 258 |
+
|
| 259 |
+
const total = reward.total ?? 0;
|
| 260 |
+
const polarity = total > 0.001 ? "positive" : total < -0.001 ? "negative" : "neutral";
|
| 261 |
+
const rewardClass = total >= 0 ? "pos" : "neg";
|
| 262 |
+
const sign = total >= 0 ? "+" : "";
|
| 263 |
+
|
| 264 |
+
const item = document.createElement("div");
|
| 265 |
+
item.className = `history-item ${polarity}`;
|
| 266 |
+
item.innerHTML = `
|
| 267 |
+
<div>
|
| 268 |
+
<span class="h-action">${escapeHtml(actionType)}</span>
|
| 269 |
+
→
|
| 270 |
+
<span class="h-reward ${rewardClass}">${sign}${total.toFixed(3)}</span>
|
| 271 |
+
</div>
|
| 272 |
+
<div class="h-explain">${escapeHtml(reward.explanation ?? "")}</div>
|
| 273 |
+
`;
|
| 274 |
+
this._container.prepend(item); // newest at top
|
| 275 |
+
}
|
| 276 |
+
}
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
// ───────────────────────────────────────────────────────────────────
|
| 280 |
+
// ProbeController - owns all state, wires UI ↔ WsClient
|
| 281 |
+
// ───────────────────────────────────────────────────────────────────
|
| 282 |
+
|
| 283 |
+
class ProbeController {
|
| 284 |
+
constructor() {
|
| 285 |
+
// Sub-components
|
| 286 |
+
this._ws = null;
|
| 287 |
+
this._viewer = new CodeViewer(document.getElementById("code-block"));
|
| 288 |
+
this._dashboard = new RewardDashboard();
|
| 289 |
+
this._feed = new HistoryFeed(document.getElementById("history-feed"));
|
| 290 |
+
|
| 291 |
+
// Episode state
|
| 292 |
+
this._episodeActive = false;
|
| 293 |
+
this._cumulativeReward = 0;
|
| 294 |
+
this._stepCount = 0;
|
| 295 |
+
this._maxSteps = 0;
|
| 296 |
+
this._totalIssues = 0;
|
| 297 |
+
this._foundCount = 0;
|
| 298 |
+
this._lastObs = null;
|
| 299 |
+
|
| 300 |
+
this._bindStaticButtons();
|
| 301 |
+
}
|
| 302 |
+
|
| 303 |
+
// ── Initialisation ──────────────────────────────────────────────
|
| 304 |
+
|
| 305 |
+
_bindStaticButtons() {
|
| 306 |
+
document.getElementById("btn-connect").addEventListener("click", () => this._connect());
|
| 307 |
+
document.getElementById("btn-reset").addEventListener("click", () => this._sendReset());
|
| 308 |
+
document.getElementById("btn-comment").addEventListener("click", () => this._sendComment());
|
| 309 |
+
document.getElementById("btn-get-context").addEventListener("click", () => this._sendGetContext());
|
| 310 |
+
document.getElementById("btn-run-scanner").addEventListener("click", () => this._sendAction("run_scanner"));
|
| 311 |
+
document.getElementById("btn-request-changes").addEventListener("click", () => this._sendAction("request_changes"));
|
| 312 |
+
document.getElementById("btn-approve").addEventListener("click", () => this._sendAction("approve"));
|
| 313 |
+
document.getElementById("btn-submit").addEventListener("click", () => this._sendAction("submit_review"));
|
| 314 |
+
document.getElementById("btn-escalate").addEventListener("click",() => this._sendAction("escalate_to_security_review"));
|
| 315 |
+
document.getElementById("modal-close").addEventListener("click", () => {
|
| 316 |
+
document.getElementById("modal-overlay").style.display = "none";
|
| 317 |
+
this._sendReset();
|
| 318 |
+
});
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
// ── WebSocket lifecycle ──────────────────────────────────────────
|
| 322 |
+
|
| 323 |
+
_connect() {
|
| 324 |
+
this._ws = new WsClient(
|
| 325 |
+
CONFIG.wsUrl,
|
| 326 |
+
(msg) => this._handleMessage(msg),
|
| 327 |
+
(status) => this._handleConnectionStatus(status),
|
| 328 |
+
);
|
| 329 |
+
this._ws.connect();
|
| 330 |
+
}
|
| 331 |
+
|
| 332 |
+
_handleConnectionStatus(status) {
|
| 333 |
+
const badge = document.getElementById("conn-badge");
|
| 334 |
+
const btnReset = document.getElementById("btn-reset");
|
| 335 |
+
const btnConnect = document.getElementById("btn-connect");
|
| 336 |
+
|
| 337 |
+
if (status === "connected") {
|
| 338 |
+
badge.textContent = "🟢 Connected";
|
| 339 |
+
badge.className = "badge connected";
|
| 340 |
+
btnConnect.textContent = "Reconnect";
|
| 341 |
+
btnReset.disabled = false;
|
| 342 |
+
// Auto-start first episode on successful connect
|
| 343 |
+
this._sendReset();
|
| 344 |
+
} else {
|
| 345 |
+
badge.textContent = "⚫ Disconnected";
|
| 346 |
+
badge.className = "badge disconnected";
|
| 347 |
+
this._setActionButtonsEnabled(false);
|
| 348 |
+
}
|
| 349 |
+
}
|
| 350 |
+
|
| 351 |
+
// ── Message dispatch ─────────────────────────────────────────────
|
| 352 |
+
|
| 353 |
+
_handleMessage(msg) {
|
| 354 |
+
switch (msg.type) {
|
| 355 |
+
case "reset": this._applyObservation(msg.observation, null, false); break;
|
| 356 |
+
case "step": this._applyStep(msg); break;
|
| 357 |
+
case "error": this._showError(msg.detail); break;
|
| 358 |
+
default: console.warn("[ProbeController] unknown message type:", msg.type);
|
| 359 |
+
}
|
| 360 |
+
}
|
| 361 |
+
|
| 362 |
+
// ── Episode state application ────────────────────────────────────
|
| 363 |
+
|
| 364 |
+
/**
|
| 365 |
+
* Apply a fresh observation (after reset or step).
|
| 366 |
+
* Updates every UI component from the single observation object.
|
| 367 |
+
*/
|
| 368 |
+
_applyObservation(obs, reward, done) {
|
| 369 |
+
this._lastObs = obs;
|
| 370 |
+
this._stepCount = obs.step_count;
|
| 371 |
+
this._maxSteps = obs.max_steps;
|
| 372 |
+
this._totalIssues = obs.total_issues;
|
| 373 |
+
this._foundCount = obs.issues_found_count;
|
| 374 |
+
|
| 375 |
+
// ── Task metadata ──
|
| 376 |
+
document.getElementById("task-label").textContent =
|
| 377 |
+
`Task ${obs.task_id} - ${obs.file_name}`;
|
| 378 |
+
document.getElementById("task-desc").textContent = obs.task_description;
|
| 379 |
+
document.getElementById("steps-counter").textContent =
|
| 380 |
+
`Step ${obs.step_count} / ${obs.max_steps}`;
|
| 381 |
+
|
| 382 |
+
const diffBadge = document.getElementById("difficulty-badge");
|
| 383 |
+
diffBadge.textContent = obs.task_difficulty;
|
| 384 |
+
diffBadge.className = `difficulty-badge ${obs.task_difficulty.replace(/\s+/g, "-")}`;
|
| 385 |
+
|
| 386 |
+
// ── Adversarial hint ──
|
| 387 |
+
const advEl = document.getElementById("adv-hint");
|
| 388 |
+
if (obs.adversarial_hint) {
|
| 389 |
+
advEl.textContent = `⚠️ ${obs.adversarial_hint}`;
|
| 390 |
+
advEl.style.display = "block";
|
| 391 |
+
} else {
|
| 392 |
+
advEl.style.display = "none";
|
| 393 |
+
}
|
| 394 |
+
|
| 395 |
+
// ── Code viewer ── (only re-render if code changed, i.e. on reset)
|
| 396 |
+
if (!reward) {
|
| 397 |
+
this._viewer.render(obs.code_snippet);
|
| 398 |
+
this._viewer.clearDecorations();
|
| 399 |
+
}
|
| 400 |
+
|
| 401 |
+
// ── Highlight lines mentioned in review history ──
|
| 402 |
+
this._decorateHistoryLines(obs.review_history);
|
| 403 |
+
|
| 404 |
+
// ── Context hints ──
|
| 405 |
+
this._renderHints(obs.context_hints);
|
| 406 |
+
|
| 407 |
+
// ── Dashboard ──
|
| 408 |
+
this._cumulativeReward = obs.metadata?.cumulative_reward ?? 0;
|
| 409 |
+
this._dashboard.updateRing(this._cumulativeReward);
|
| 410 |
+
this._dashboard.updateIssues(this._foundCount, this._totalIssues);
|
| 411 |
+
|
| 412 |
+
if (reward) {
|
| 413 |
+
this._dashboard.updateBars(reward.components ?? {});
|
| 414 |
+
this._feed.append(this._lastActionLabel, reward);
|
| 415 |
+
}
|
| 416 |
+
|
| 417 |
+
// ── Terminal handling ──
|
| 418 |
+
if (done) {
|
| 419 |
+
this._episodeActive = false;
|
| 420 |
+
this._setActionButtonsEnabled(false);
|
| 421 |
+
this._showEpisodeEndModal(obs, reward);
|
| 422 |
+
} else {
|
| 423 |
+
this._episodeActive = true;
|
| 424 |
+
this._setActionButtonsEnabled(true);
|
| 425 |
+
}
|
| 426 |
+
}
|
| 427 |
+
|
| 428 |
+
_applyStep(msg) {
|
| 429 |
+
this._applyObservation(msg.observation, msg.reward, msg.done);
|
| 430 |
+
}
|
| 431 |
+
|
| 432 |
+
// ── Line decorations ─────────────────────────────────────────────
|
| 433 |
+
|
| 434 |
+
/**
|
| 435 |
+
* Walk review_history and apply colour-coded line highlights.
|
| 436 |
+
* Later entries overwrite earlier ones on the same line, so the most
|
| 437 |
+
* recent action's highlight takes priority.
|
| 438 |
+
*/
|
| 439 |
+
_decorateHistoryLines(history) {
|
| 440 |
+
this._viewer.clearDecorations();
|
| 441 |
+
for (const entry of history) {
|
| 442 |
+
if (!entry.line) continue;
|
| 443 |
+
let cssClass = "hl-comment";
|
| 444 |
+
if (entry.type === "scanner_result") continue; // no single line
|
| 445 |
+
if (entry.type === "context_probe") cssClass = "hl-context";
|
| 446 |
+
if (entry.type === "comment") cssClass = "hl-comment";
|
| 447 |
+
this._viewer.decorateLine(entry.line, cssClass);
|
| 448 |
+
}
|
| 449 |
+
}
|
| 450 |
+
|
| 451 |
+
// ── Hints ────────────────────────────────────────────────────────
|
| 452 |
+
|
| 453 |
+
_renderHints(hints) {
|
| 454 |
+
const container = document.getElementById("hints-container");
|
| 455 |
+
const list = document.getElementById("hints-list");
|
| 456 |
+
|
| 457 |
+
if (!hints || hints.length === 0) {
|
| 458 |
+
container.style.display = "none";
|
| 459 |
+
return;
|
| 460 |
+
}
|
| 461 |
+
container.style.display = "block";
|
| 462 |
+
list.innerHTML = hints.map(h =>
|
| 463 |
+
`<div class="hint-item">${escapeHtml(h)}</div>`
|
| 464 |
+
).join("");
|
| 465 |
+
}
|
| 466 |
+
|
| 467 |
+
// ── Action senders ───────────────────────────────────────────────
|
| 468 |
+
|
| 469 |
+
_sendReset() {
|
| 470 |
+
if (!this._ws?.isConnected) return;
|
| 471 |
+
this._episodeActive = false;
|
| 472 |
+
this._setActionButtonsEnabled(false);
|
| 473 |
+
this._dashboard.reset();
|
| 474 |
+
this._feed.clear();
|
| 475 |
+
this._viewer._pre.innerHTML = '<span class="placeholder-text">Loading…</span>';
|
| 476 |
+
document.getElementById("hints-container").style.display = "none";
|
| 477 |
+
document.getElementById("adv-hint").style.display = "none";
|
| 478 |
+
this._ws.send({ command: "reset" });
|
| 479 |
+
}
|
| 480 |
+
|
| 481 |
+
_sendComment() {
|
| 482 |
+
const line = parseInt(document.getElementById("inp-line").value, 10) || null;
|
| 483 |
+
const comment = document.getElementById("inp-comment").value.trim();
|
| 484 |
+
const severity = document.getElementById("inp-severity").value || null;
|
| 485 |
+
const category = document.getElementById("inp-category").value || null;
|
| 486 |
+
const classification = document.getElementById("inp-classification").value || null;
|
| 487 |
+
|
| 488 |
+
if (!comment) {
|
| 489 |
+
alert("Please enter a comment before submitting.");
|
| 490 |
+
return;
|
| 491 |
+
}
|
| 492 |
+
this._lastActionLabel = `ADD_COMMENT (L${line ?? "?"})`;
|
| 493 |
+
this._sendAction("add_comment", {
|
| 494 |
+
line_number: line,
|
| 495 |
+
comment,
|
| 496 |
+
severity,
|
| 497 |
+
category,
|
| 498 |
+
classification,
|
| 499 |
+
});
|
| 500 |
+
// Clear comment fields after send
|
| 501 |
+
document.getElementById("inp-comment").value = "";
|
| 502 |
+
}
|
| 503 |
+
|
| 504 |
+
_sendGetContext() {
|
| 505 |
+
const line = parseInt(document.getElementById("inp-probe-line").value, 10) || null;
|
| 506 |
+
if (!line) { alert("Enter a line number to probe."); return; }
|
| 507 |
+
this._lastActionLabel = `GET_CONTEXT (L${line})`;
|
| 508 |
+
this._sendAction("get_context", { line_number: line });
|
| 509 |
+
}
|
| 510 |
+
|
| 511 |
+
/**
|
| 512 |
+
* Send a step action to the server.
|
| 513 |
+
* @param {string} actionType snake_case action type string
|
| 514 |
+
* @param {object} extra Additional fields (line_number, comment, …)
|
| 515 |
+
*/
|
| 516 |
+
_sendAction(actionType, extra = {}) {
|
| 517 |
+
if (!this._ws?.isConnected || !this._episodeActive) return;
|
| 518 |
+
this._lastActionLabel = actionType.toUpperCase().replace(/_/g, " ");
|
| 519 |
+
this._ws.send({
|
| 520 |
+
command: "step",
|
| 521 |
+
action: { action_type: actionType, ...extra },
|
| 522 |
+
});
|
| 523 |
+
}
|
| 524 |
+
|
| 525 |
+
// ── UI helpers ───────────────────────────────────────────────────
|
| 526 |
+
|
| 527 |
+
_setActionButtonsEnabled(enabled) {
|
| 528 |
+
const ids = [
|
| 529 |
+
"btn-comment", "btn-get-context", "btn-run-scanner",
|
| 530 |
+
"btn-request-changes", "btn-approve", "btn-submit", "btn-escalate",
|
| 531 |
+
];
|
| 532 |
+
for (const id of ids) {
|
| 533 |
+
document.getElementById(id).disabled = !enabled;
|
| 534 |
+
}
|
| 535 |
+
}
|
| 536 |
+
|
| 537 |
+
_showEpisodeEndModal(obs, reward) {
|
| 538 |
+
const totalReward = this._cumulativeReward;
|
| 539 |
+
const passed = reward?.passed ?? false;
|
| 540 |
+
|
| 541 |
+
document.getElementById("modal-overlay").style.display = "flex";
|
| 542 |
+
document.getElementById("modal-icon").textContent =
|
| 543 |
+
totalReward >= 0.5 ? "π" : totalReward >= 0 ? "π" : "π";
|
| 544 |
+
document.getElementById("modal-title").textContent =
|
| 545 |
+
passed ? "Episode Passed!" : "Episode Complete";
|
| 546 |
+
document.getElementById("modal-body").textContent =
|
| 547 |
+
reward?.explanation ?? "Episode ended.";
|
| 548 |
+
|
| 549 |
+
// Render a small stats grid inside the modal
|
| 550 |
+
const decision = obs.metadata?.review_decision ?? "-";
|
| 551 |
+
const esc = obs.metadata?.escalation_required ? "Yes" : "No";
|
| 552 |
+
document.getElementById("modal-stats").innerHTML = `
|
| 553 |
+
<span class="stat-label">Cumulative reward</span>
|
| 554 |
+
<span class="stat-value">${totalReward.toFixed(3)}</span>
|
| 555 |
+
<span class="stat-label">Issues found</span>
|
| 556 |
+
<span class="stat-value">${obs.issues_found_count} / ${obs.total_issues}</span>
|
| 557 |
+
<span class="stat-label">Steps used</span>
|
| 558 |
+
<span class="stat-value">${obs.step_count} / ${obs.max_steps}</span>
|
| 559 |
+
<span class="stat-label">Decision</span>
|
| 560 |
+
<span class="stat-value">${decision}</span>
|
| 561 |
+
<span class="stat-label">Escalation required</span>
|
| 562 |
+
<span class="stat-value">${esc}</span>
|
| 563 |
+
`;
|
| 564 |
+
}
|
| 565 |
+
|
| 566 |
+
_showError(detail) {
|
| 567 |
+
console.error("[ProbeController] server error:", detail);
|
| 568 |
+
// Non-intrusive: just log and append to feed as a red entry
|
| 569 |
+
this._feed.append("ERROR", {
|
| 570 |
+
total: 0,
|
| 571 |
+
explanation: detail ?? "Unknown server error",
|
| 572 |
+
});
|
| 573 |
+
}
|
| 574 |
+
}
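In the controller above, `_sendComment` and `_sendGetContext` set `_lastActionLabel` to a line-specific label and then call `_sendAction`, which assigns `_lastActionLabel` again from the action type. As listed, the history feed would therefore show `ADD COMMENT` and not `ADD_COMMENT (L12)`. The sketch below shows one possible fix with an optional label argument. `buildStepMessage` is a hypothetical helper; the message shape (`command`, `action.action_type`) is copied from `_sendAction`.

```javascript
// Hypothetical helper (not in the commit): build the outgoing step message
// and the history label together, so a caller-supplied label is not lost.
function buildStepMessage(actionType, extra = {}, label = null) {
  return {
    // Fall back to the generic label only when the caller gave none.
    label: label ?? actionType.toUpperCase().replace(/_/g, " "),
    message: {
      command: "step",
      action: { action_type: actionType, ...extra },
    },
  };
}

const generic = buildStepMessage("run_scanner");
const comment = buildStepMessage(
  "add_comment",
  { line_number: 12, comment: "Unsanitised input" },
  "ADD_COMMENT (L12)",
);

console.log(generic.label);                   // RUN SCANNER
console.log(comment.label);                   // ADD_COMMENT (L12)
console.log(JSON.stringify(comment.message));
```

With a helper of this kind, `_sendAction` could store the returned `label` in `_lastActionLabel` and pass `message` to `WsClient.send`.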
|
| 575 |
+
|
| 576 |
+
|
| 577 |
+
// ───────────────────────────────────────────────────────────────────
|
| 578 |
+
// Utilities
|
| 579 |
+
// ───────────────────────────────────────────────────────────────────
|
| 580 |
+
|
| 581 |
+
/** Escape HTML special chars to prevent XSS when inserting code/text. */
|
| 582 |
+
function escapeHtml(str) {
|
| 583 |
+
return String(str)
|
| 584 |
+
.replace(/&/g, "&amp;")
|
| 585 |
+
.replace(/</g, "&lt;")
|
| 586 |
+
.replace(/>/g, "&gt;")
|
| 587 |
+
.replace(/"/g, "&quot;");
|
| 588 |
+
}
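A quick check of what `escapeHtml` is expected to produce. The replacement strings in this sketch are written as the HTML entities `&amp;`, `&lt;`, `&gt;` and `&quot;`; that is an assumption about the committed source, since the rendered diff shows them decoded.

```javascript
// Standalone copy of the helper, with the entity strings assumed as above.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")   // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

console.log(escapeHtml('<img src="x" onerror="alert(1)"> & more'));
// &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt; &amp; more
```

Single quotes are left unescaped, which only matters if a value is ever placed inside a single-quoted attribute. Separately, the `decision` value in `_showEpisodeEndModal` is interpolated into `innerHTML` without passing through this helper.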
|
| 589 |
+
|
| 590 |
+
|
| 591 |
+
// ───────────────────────────────────────────────────────────────────
|
| 592 |
+
// Bootstrap
|
| 593 |
+
// ───────────────────────────────────────────────────────────────────
|
| 594 |
+
|
| 595 |
+
document.addEventListener("DOMContentLoaded", () => {
|
| 596 |
+
window._probe = new ProbeController();
|
| 597 |
+
});
|
frontend/index.html
ADDED
|
@@ -0,0 +1,212 @@
|
| 1 |
+
<!doctype html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8" />
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
| 6 |
+
<title>PRobe – AI Code Review Training Environment</title>
|
| 7 |
+
<link rel="stylesheet" href="style.css" />
|
| 8 |
+
</head>
|
| 9 |
+
<body>
|
| 10 |
+
|
| 11 |
+
<!-- ──────────────────────────────────────────────────────────
|
| 12 |
+
TOP BAR
|
| 13 |
+
────────────────────────────────────────────────────────── -->
|
| 14 |
+
<header class="topbar">
|
| 15 |
+
<div class="topbar-left">
|
| 16 |
+
<span class="logo">🔍 PRobe</span>
|
| 17 |
+
<span class="tagline">Adversarial Code Review – RL Training Environment</span>
|
| 18 |
+
</div>
|
| 19 |
+
<div class="topbar-right">
|
| 20 |
+
<span class="badge" id="conn-badge">⚫ Disconnected</span>
|
| 21 |
+
<button id="btn-connect" class="btn btn-primary">Connect</button>
|
| 22 |
+
<button id="btn-reset" class="btn btn-secondary" disabled>New Episode</button>
|
| 23 |
+
</div>
|
| 24 |
+
</header>
|
| 25 |
+
|
| 26 |
+
<!-- ──────────────────────────────────────────────────────────
|
| 27 |
+
MAIN LAYOUT – three columns
|
| 28 |
+
────────────────────────────────────────────────────────── -->
|
| 29 |
+
<main class="layout">
|
| 30 |
+
|
| 31 |
+
<!-- ── LEFT: Task meta + code viewer ─────────────────────── -->
|
| 32 |
+
<section class="panel panel-code">
|
| 33 |
+
|
| 34 |
+
<div class="panel-header">
|
| 35 |
+
<span id="task-label">Task –</span>
|
| 36 |
+
<span class="difficulty-badge" id="difficulty-badge">–</span>
|
| 37 |
+
<span class="steps-counter" id="steps-counter">Step 0 / –</span>
|
| 38 |
+
</div>
|
| 39 |
+
|
| 40 |
+
<p class="task-desc" id="task-desc">Connect and start an episode to begin.</p>
|
| 41 |
+
|
| 42 |
+
<div class="adversarial-hint" id="adv-hint" style="display:none"></div>
|
| 43 |
+
|
| 44 |
+
<!-- Code block with line-number highlights -->
|
| 45 |
+
<div class="code-wrapper">
|
| 46 |
+
<pre id="code-block" class="code-block"><span class="placeholder-text">No code loaded.</span></pre>
|
| 47 |
+
</div>
|
| 48 |
+
|
| 49 |
+
<!-- Context hints revealed by finding key issues -->
|
| 50 |
+
<div id="hints-container" style="display:none">
|
| 51 |
+
<div class="section-title">Unlocked Context Hints</div>
|
| 52 |
+
<div id="hints-list" class="hints-list"></div>
|
| 53 |
+
</div>
|
| 54 |
+
|
| 55 |
+
</section>
|
| 56 |
+
|
| 57 |
+
<!-- ── CENTRE: Action panel ───────────────────────────────── -->
|
| 58 |
+
<section class="panel panel-action">
|
| 59 |
+
|
| 60 |
+
<div class="panel-header">Actions</div>
|
| 61 |
+
|
| 62 |
+
<!-- ADD_COMMENT form -->
|
| 63 |
+
<div class="action-card" id="card-comment">
|
| 64 |
+
<div class="action-title">💬 Add Comment</div>
|
| 65 |
+
<div class="form-row">
|
| 66 |
+
<label>Line</label>
|
| 67 |
+
<input type="number" id="inp-line" min="1" placeholder="e.g. 12" />
|
| 68 |
+
</div>
|
| 69 |
+
<div class="form-row">
|
| 70 |
+
<label>Comment</label>
|
| 71 |
+
<textarea id="inp-comment" rows="3" placeholder="Describe the issue in detail…"></textarea>
|
| 72 |
+
</div>
|
| 73 |
+
<div class="form-row">
|
| 74 |
+
<label>Severity</label>
|
| 75 |
+
<select id="inp-severity">
|
| 76 |
+
<option value="">– none –</option>
|
| 77 |
+
<option value="info">info</option>
|
| 78 |
+
<option value="warning">warning</option>
|
| 79 |
+
<option value="error">error</option>
|
| 80 |
+
<option value="critical">critical</option>
|
| 81 |
+
</select>
|
| 82 |
+
</div>
|
| 83 |
+
<div class="form-row">
|
| 84 |
+
<label>Category</label>
|
| 85 |
+
<select id="inp-category">
|
| 86 |
+
<option value="">– none –</option>
|
| 87 |
+
<option value="bug">bug</option>
|
| 88 |
+
<option value="security">security</option>
|
| 89 |
+
<option value="performance">performance</option>
|
| 90 |
+
<option value="style">style</option>
|
| 91 |
+
<option value="design">design</option>
|
| 92 |
+
</select>
|
| 93 |
+
</div>
|
| 94 |
+
<div class="form-row">
|
| 95 |
+
<label>Classification</label>
|
| 96 |
+
<select id="inp-classification">
|
| 97 |
+
<option value="">– none –</option>
|
| 98 |
+
<option value="accidental_bug">accidental_bug</option>
|
| 99 |
+
<option value="intentional_backdoor">intentional_backdoor</option>
|
| 100 |
+
</select>
|
| 101 |
+
</div>
|
| 102 |
+
<button class="btn btn-action" id="btn-comment" disabled>Submit Comment</button>
|
| 103 |
+
</div>
|
| 104 |
+
|
| 105 |
+
<!-- Quick actions -->
|
| 106 |
+
<div class="quick-actions">
|
| 107 |
+
<div class="action-title">⚡ Quick Actions</div>
|
| 108 |
+
|
| 109 |
+
<div class="form-row">
|
| 110 |
+
<label>Probe Line</label>
|
| 111 |
+
<input type="number" id="inp-probe-line" min="1" placeholder="e.g. 8" />
|
| 112 |
+
</div>
|
| 113 |
+
<button class="btn btn-action btn-info" id="btn-get-context" disabled>Get Context</button>
|
| 114 |
+
<button class="btn btn-action btn-info" id="btn-run-scanner" disabled>🤖 Run Scanner</button>
|
| 115 |
+
|
| 116 |
+
<div class="separator"></div>
|
| 117 |
+
|
| 118 |
+
<button class="btn btn-action btn-warn" id="btn-request-changes" disabled>Request Changes</button>
|
| 119 |
+
<button class="btn btn-action btn-success" id="btn-approve" disabled>✅ Approve PR</button>
|
| 120 |
+
<button class="btn btn-action btn-danger" id="btn-submit" disabled>📤 Submit Review</button>
|
| 121 |
+
<button class="btn btn-action btn-escalate" id="btn-escalate" disabled>🚨 Escalate to Security</button>
|
| 122 |
+
</div>
|
| 123 |
+
|
| 124 |
+
</section>
|
| 125 |
+
|
| 126 |
+
<!-- ── RIGHT: Reward dashboard + history ─────────────────── -->
|
| 127 |
+
<section class="panel panel-reward">
|
| 128 |
+
|
| 129 |
+
<div class="panel-header">Reward Dashboard</div>
|
| 130 |
+
|
| 131 |
+
<!-- Cumulative reward ring -->
|
| 132 |
+
<div class="reward-ring-wrap">
|
| 133 |
+
<svg class="reward-ring" viewBox="0 0 120 120">
|
| 134 |
+
<circle class="ring-bg" cx="60" cy="60" r="50" />
|
| 135 |
+
<circle class="ring-track" cx="60" cy="60" r="50" id="ring-track" />
|
| 136 |
+
</svg>
|
| 137 |
+
<div class="ring-label">
|
| 138 |
+
<span id="ring-value">0.00</span>
|
| 139 |
+
<small>cumulative</small>
|
| 140 |
+
</div>
|
| 141 |
+
</div>
|
| 142 |
+
|
| 143 |
+
<!-- Per-step component bars -->
|
| 144 |
+
<div class="component-bars" id="component-bars">
|
| 145 |
+
<div class="section-title">Last Step Breakdown</div>
|
| 146 |
+
<div class="bar-row" id="bar-row-issue_credit">
|
| 147 |
+
<span class="bar-label">Issue credit</span>
|
| 148 |
+
<div class="bar-track"><div class="bar-fill positive" id="bar-issue_credit"></div></div>
|
| 149 |
+
<span class="bar-val" id="val-issue_credit">0.00</span>
|
| 150 |
+
</div>
|
| 151 |
+
<div class="bar-row" id="bar-row-classification_credit">
|
| 152 |
+
<span class="bar-label">Classification</span>
|
| 153 |
+
<div class="bar-track"><div class="bar-fill positive" id="bar-classification_credit"></div></div>
|
| 154 |
+
<span class="bar-val" id="val-classification_credit">0.00</span>
|
| 155 |
+
</div>
|
| 156 |
+
<div class="bar-row" id="bar-row-false_positive_penalty">
|
| 157 |
+
<span class="bar-label">FP penalty</span>
|
| 158 |
+
<div class="bar-track"><div class="bar-fill negative" id="bar-false_positive_penalty"></div></div>
|
| 159 |
+
<span class="bar-val" id="val-false_positive_penalty">0.00</span>
|
| 160 |
+
</div>
|
| 161 |
+
<div class="bar-row" id="bar-row-coverage_bonus">
|
| 162 |
+
<span class="bar-label">Coverage</span>
|
| 163 |
+
<div class="bar-track"><div class="bar-fill positive" id="bar-coverage_bonus"></div></div>
|
| 164 |
+
<span class="bar-val" id="val-coverage_bonus">0.00</span>
|
| 165 |
+
</div>
|
| 166 |
+
<div class="bar-row" id="bar-row-decision_score">
|
| 167 |
+
<span class="bar-label">Decision</span>
|
| 168 |
+
<div class="bar-track"><div class="bar-fill neutral" id="bar-decision_score"></div></div>
|
| 169 |
+
<span class="bar-val" id="val-decision_score">0.00</span>
|
| 170 |
+
</div>
|
| 171 |
+
<div class="bar-row" id="bar-row-efficiency_bonus">
|
| 172 |
+
<span class="bar-label">Efficiency</span>
|
| 173 |
+
<div class="bar-track"><div class="bar-fill positive" id="bar-efficiency_bonus"></div></div>
|
| 174 |
+
<span class="bar-val" id="val-efficiency_bonus">0.00</span>
|
| 175 |
+
</div>
|
| 176 |
+
</div>
|
| 177 |
+
|
| 178 |
+
<!-- Issues progress -->
|
| 179 |
+
<div class="section-title" style="margin-top:1rem">Issues Found</div>
|
| 180 |
+
<div class="issues-progress">
|
| 181 |
+
<div class="issues-bar-wrap">
|
| 182 |
+
<div class="issues-bar-fill" id="issues-bar-fill"></div>
|
| 183 |
+
</div>
|
| 184 |
+
<span id="issues-found-label">0 / 0</span>
|
| 185 |
+
</div>
|
| 186 |
+
|
| 187 |
+
<!-- Step-by-step history feed -->
|
| 188 |
+
<div class="section-title" style="margin-top:1rem">Episode History</div>
|
| 189 |
+
<div class="history-feed" id="history-feed">
|
| 190 |
+
<div class="history-empty">No actions yet.</div>
|
| 191 |
+
</div>
|
| 192 |
+
|
| 193 |
+
</section>
|
| 194 |
+
|
| 195 |
+
</main>
|
| 196 |
+
|
| 197 |
+
<!-- ──────────────────────────────────────────────────────────
|
| 198 |
+
EPISODE-END MODAL
|
| 199 |
+
────────────────────────────────────────────────────────── -->
|
| 200 |
+
<div id="modal-overlay" class="modal-overlay" style="display:none">
|
| 201 |
+
<div class="modal">
|
| 202 |
+
<div class="modal-icon" id="modal-icon">π</div>
|
| 203 |
+
<h2 id="modal-title">Episode Complete</h2>
|
| 204 |
+
<p id="modal-body">–</p>
|
| 205 |
+
<div class="modal-stats" id="modal-stats"></div>
|
| 206 |
+
<button class="btn btn-primary" id="modal-close">Start New Episode</button>
|
| 207 |
+
</div>
|
| 208 |
+
</div>
|
| 209 |
+
|
| 210 |
+
<script src="app.js"></script>
|
| 211 |
+
</body>
|
| 212 |
+
</html>
|
frontend/style.css
ADDED
|
@@ -0,0 +1,391 @@
| 1 |
+
/* ───────────────────────────────────────────────────────────────
|
| 2 |
+
PRobe Dashboard – stylesheet
|
| 3 |
+
Design tokens: dark IDE theme, accent #4f9eff
|
| 4 |
+
─────────────────────────────────────────────────────────────── */
|
| 5 |
+
|
| 6 |
+
/* ── Reset & base ─────────────────────────────────────────── */
|
| 7 |
+
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
|
| 8 |
+
|
| 9 |
+
:root {
|
| 10 |
+
--bg-0: #0d1117; /* deepest background */
|
| 11 |
+
--bg-1: #161b22; /* panel background */
|
| 12 |
+
--bg-2: #21262d; /* card / input background */
|
| 13 |
+
--bg-3: #30363d; /* hover / border */
|
| 14 |
+
--text-main: #e6edf3;
|
| 15 |
+
--text-dim: #8b949e;
|
| 16 |
+
--accent: #4f9eff;
|
| 17 |
+
--green: #3fb950;
|
| 18 |
+
--red: #f85149;
|
| 19 |
+
--yellow: #d29922;
|
| 20 |
+
--orange: #db6d28;
|
| 21 |
+
--purple: #a371f7;
|
| 22 |
+
--radius: 8px;
|
| 23 |
+
--font-mono: 'JetBrains Mono', 'Fira Code', 'Consolas', monospace;
|
| 24 |
+
--font-ui: 'Inter', system-ui, sans-serif;
|
| 25 |
+
--topbar-h: 52px;
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
html, body {
|
| 29 |
+
height: 100%;
|
| 30 |
+
background: var(--bg-0);
|
| 31 |
+
color: var(--text-main);
|
| 32 |
+
font-family: var(--font-ui);
|
| 33 |
+
font-size: 14px;
|
| 34 |
+
line-height: 1.5;
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
+
/* ── Top bar ──────────────────────────────────────────────── */
|
| 38 |
+
.topbar {
|
| 39 |
+
position: fixed;
|
| 40 |
+
top: 0; left: 0; right: 0;
|
| 41 |
+
height: var(--topbar-h);
|
| 42 |
+
background: var(--bg-1);
|
| 43 |
+
border-bottom: 1px solid var(--bg-3);
|
| 44 |
+
display: flex;
|
| 45 |
+
align-items: center;
|
| 46 |
+
justify-content: space-between;
|
| 47 |
+
padding: 0 1.25rem;
|
| 48 |
+
z-index: 100;
|
| 49 |
+
}
|
| 50 |
+
.topbar-left { display: flex; align-items: center; gap: 1rem; }
|
| 51 |
+
.logo { font-size: 1.15rem; font-weight: 700; color: var(--accent); }
|
| 52 |
+
.tagline { color: var(--text-dim); font-size: 0.8rem; }
|
| 53 |
+
.topbar-right { display: flex; align-items: center; gap: 0.75rem; }
|
| 54 |
+
|
| 55 |
+
.badge {
|
| 56 |
+
font-size: 0.78rem;
|
| 57 |
+
padding: 3px 10px;
|
| 58 |
+
border-radius: 12px;
|
| 59 |
+
background: var(--bg-2);
|
| 60 |
+
border: 1px solid var(--bg-3);
|
| 61 |
+
white-space: nowrap;
|
| 62 |
+
}
|
| 63 |
+
.badge.connected { color: var(--green); border-color: var(--green); }
|
| 64 |
+
.badge.disconnected { color: var(--text-dim); }
|
| 65 |
+
|
| 66 |
+
/* ── Buttons ──────────────────────────────────────────────── */
|
| 67 |
+
.btn {
|
| 68 |
+
padding: 6px 16px;
|
| 69 |
+
border-radius: var(--radius);
|
| 70 |
+
border: 1px solid transparent;
|
| 71 |
+
font-size: 0.82rem;
|
| 72 |
+
font-weight: 600;
|
| 73 |
+
cursor: pointer;
|
| 74 |
+
transition: opacity 0.15s, background 0.15s;
|
| 75 |
+
}
|
| 76 |
+
.btn:disabled { opacity: 0.35; cursor: not-allowed; }
|
| 77 |
+
.btn-primary { background: var(--accent); color: #fff; border-color: var(--accent); }
|
| 78 |
+
.btn-secondary{ background: var(--bg-2); color: var(--text-main); border-color: var(--bg-3); }
|
| 79 |
+
.btn-action { width: 100%; margin-bottom: 0.4rem; background: var(--bg-2); color: var(--text-main); border-color: var(--bg-3); }
|
| 80 |
+
.btn-info { border-color: var(--accent); color: var(--accent); }
|
| 81 |
+
.btn-warn { border-color: var(--yellow); color: var(--yellow); }
|
| 82 |
+
.btn-success { border-color: var(--green); color: var(--green); }
|
| 83 |
+
.btn-danger { border-color: var(--red); color: var(--red); background: rgba(248,81,73,0.1); }
|
| 84 |
+
.btn-escalate { border-color: var(--purple); color: var(--purple); background: rgba(163,113,247,0.1); }
|
| 85 |
+
.btn:not(:disabled):hover { opacity: 0.82; }
|
| 86 |
+
|
| 87 |
+
/* ── Main three-column layout ─────────────────────────────── */
|
| 88 |
+
.layout {
|
| 89 |
+
display: grid;
|
| 90 |
+
grid-template-columns: 1fr 310px 310px;
|
| 91 |
+
grid-template-rows: calc(100vh - var(--topbar-h));
|
| 92 |
+
gap: 0;
|
| 93 |
+
margin-top: var(--topbar-h);
|
| 94 |
+
overflow: hidden;
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
/* ── Generic panel ────────────────────────────────────────── */
|
| 98 |
+
.panel {
|
| 99 |
+
background: var(--bg-1);
|
| 100 |
+
border-right: 1px solid var(--bg-3);
|
| 101 |
+
overflow-y: auto;
|
| 102 |
+
padding: 1rem;
|
| 103 |
+
display: flex;
|
| 104 |
+
flex-direction: column;
|
| 105 |
+
gap: 0.75rem;
|
| 106 |
+
}
|
| 107 |
+
.panel:last-child { border-right: none; }
|
| 108 |
+
.panel-header {
|
| 109 |
+
font-weight: 700;
|
| 110 |
+
font-size: 0.85rem;
|
| 111 |
+
color: var(--text-dim);
|
| 112 |
+
text-transform: uppercase;
|
| 113 |
+
letter-spacing: 0.06em;
|
| 114 |
+
display: flex;
|
| 115 |
+
align-items: center;
|
| 116 |
+
gap: 0.75rem;
|
| 117 |
+
flex-wrap: wrap;
|
| 118 |
+
}
|
| 119 |
+
.section-title {
|
| 120 |
+
font-size: 0.78rem;
|
| 121 |
+
font-weight: 600;
|
| 122 |
+
color: var(--text-dim);
|
| 123 |
+
text-transform: uppercase;
|
| 124 |
+
letter-spacing: 0.05em;
|
| 125 |
+
}
|
| 126 |
+
|
| 127 |
+
/* ── Task metadata ────────────────────────────────────────── */
|
| 128 |
+
#task-label { color: var(--accent); font-size: 0.9rem; }
|
| 129 |
+
|
| 130 |
+
.difficulty-badge {
|
| 131 |
+
font-size: 0.72rem;
|
| 132 |
+
padding: 2px 8px;
|
| 133 |
+
border-radius: 10px;
|
| 134 |
+
background: var(--bg-2);
|
| 135 |
+
border: 1px solid var(--bg-3);
|
| 136 |
+
text-transform: capitalize;
|
| 137 |
+
}
|
| 138 |
+
.difficulty-badge.ultra-easy { color: var(--green); border-color: var(--green); }
|
| 139 |
+
.difficulty-badge.easy { color: var(--accent); border-color: var(--accent); }
|
| 140 |
+
.difficulty-badge.medium { color: var(--yellow); border-color: var(--yellow); }
|
| 141 |
+
.difficulty-badge.hard { color: var(--orange); border-color: var(--orange); }
|
| 142 |
+
.difficulty-badge.adversarial{ color: var(--red); border-color: var(--red); }
|
| 143 |
+
|
| 144 |
+
.steps-counter { margin-left: auto; font-size: 0.8rem; color: var(--text-dim); }
|
| 145 |
+
|
| 146 |
+
.task-desc {
|
| 147 |
+
font-size: 0.82rem;
|
| 148 |
+
color: var(--text-dim);
|
| 149 |
+
line-height: 1.6;
|
| 150 |
+
background: var(--bg-2);
|
| 151 |
+
border: 1px solid var(--bg-3);
|
| 152 |
+
border-radius: var(--radius);
|
| 153 |
+
padding: 0.6rem 0.8rem;
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
.adversarial-hint {
|
| 157 |
+
font-size: 0.8rem;
|
| 158 |
+
background: rgba(163,113,247,0.1);
|
| 159 |
+
border: 1px solid var(--purple);
|
| 160 |
+
border-radius: var(--radius);
|
| 161 |
+
padding: 0.5rem 0.75rem;
|
| 162 |
+
color: var(--purple);
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
/* ── Code viewer ──────────────────────────────────────────── */
|
| 166 |
+
.code-wrapper {
|
| 167 |
+
flex: 1;
|
| 168 |
+
overflow: auto;
|
| 169 |
+
border: 1px solid var(--bg-3);
|
| 170 |
+
border-radius: var(--radius);
|
| 171 |
+
background: var(--bg-0);
|
| 172 |
+
}
|
| 173 |
+
.code-block {
|
| 174 |
+
font-family: var(--font-mono);
|
| 175 |
+
font-size: 0.78rem;
|
| 176 |
+
line-height: 1.65;
|
| 177 |
+
padding: 0.75rem 1rem;
|
| 178 |
+
white-space: pre;
|
| 179 |
+
counter-reset: line-counter;
|
| 180 |
+
}
|
| 181 |
+
.code-line { display: block; }
|
| 182 |
+
.code-line-num {
|
| 183 |
+
user-select: none;
|
| 184 |
+
display: inline-block;
|
| 185 |
+
width: 2.8em;
|
| 186 |
+
color: var(--text-dim);
|
| 187 |
+
text-align: right;
|
| 188 |
+
margin-right: 1em;
|
| 189 |
+
font-size: 0.72rem;
|
| 190 |
+
}
|
| 191 |
+
/* Highlighted lines (comment target or scanner finding) */
|
| 192 |
+
.code-line.hl-comment { background: rgba(79,158,255,0.12); border-left: 3px solid var(--accent); }
|
| 193 |
+
.code-line.hl-issue { background: rgba(248,81,73,0.10); border-left: 3px solid var(--red); }
|
| 194 |
+
.code-line.hl-scanner { background: rgba(210,153,34,0.10); border-left: 3px solid var(--yellow); }
|
| 195 |
+
.code-line.hl-context { background: rgba(63,185,80,0.08); border-left: 3px solid var(--green); }
|
| 196 |
+
|
| 197 |
+
.placeholder-text { color: var(--text-dim); font-style: italic; }
|
| 198 |
+
|
| 199 |
+
/* ── Hints ────────────────────────────────────────────────── */
|
| 200 |
+
.hints-list {
|
| 201 |
+
display: flex;
|
| 202 |
+
flex-direction: column;
|
| 203 |
+
gap: 0.4rem;
|
| 204 |
+
}
|
| 205 |
+
.hint-item {
|
| 206 |
+
font-size: 0.8rem;
|
| 207 |
+
background: rgba(63,185,80,0.08);
|
| 208 |
+
border: 1px solid var(--green);
|
| 209 |
+
border-radius: var(--radius);
|
| 210 |
+
padding: 0.5rem 0.75rem;
|
| 211 |
+
color: var(--text-main);
|
| 212 |
+
white-space: pre-wrap;
|
| 213 |
+
}
|
| 214 |
+
|
| 215 |
+
/* ── Action cards ─────────────────────────────────────────── */
|
| 216 |
+
.action-card {
|
| 217 |
+
background: var(--bg-2);
|
| 218 |
+
border: 1px solid var(--bg-3);
|
| 219 |
+
border-radius: var(--radius);
|
| 220 |
+
padding: 0.8rem;
|
| 221 |
+
display: flex;
|
| 222 |
+
flex-direction: column;
|
| 223 |
+
gap: 0.5rem;
|
| 224 |
+
}
|
| 225 |
+
.action-title {
|
| 226 |
+
font-size: 0.8rem;
|
| 227 |
+
font-weight: 700;
|
| 228 |
+
color: var(--text-dim);
|
| 229 |
+
text-transform: uppercase;
|
| 230 |
+
letter-spacing: 0.05em;
|
| 231 |
+
margin-bottom: 0.25rem;
|
| 232 |
+
}
|
| 233 |
+
.form-row {
|
| 234 |
+
display: flex;
|
| 235 |
+
flex-direction: column;
|
| 236 |
+
gap: 3px;
|
| 237 |
+
}
|
| 238 |
+
.form-row label { font-size: 0.75rem; color: var(--text-dim); }
|
| 239 |
+
.form-row input,
|
| 240 |
+
.form-row select,
|
| 241 |
+
.form-row textarea {
|
| 242 |
+
background: var(--bg-0);
|
| 243 |
+
border: 1px solid var(--bg-3);
|
| 244 |
+
border-radius: 5px;
|
| 245 |
+
color: var(--text-main);
|
| 246 |
+
font-family: var(--font-ui);
|
| 247 |
+
font-size: 0.82rem;
|
| 248 |
+
padding: 5px 8px;
|
| 249 |
+
resize: vertical;
|
| 250 |
+
}
|
| 251 |
+
.form-row input:focus,
|
| 252 |
+
.form-row select:focus,
|
| 253 |
+
.form-row textarea:focus {
|
| 254 |
+
outline: none;
|
| 255 |
+
border-color: var(--accent);
|
| 256 |
+
}
|
| 257 |
+
|
| 258 |
+
.quick-actions {
|
| 259 |
+
background: var(--bg-2);
|
| 260 |
+
border: 1px solid var(--bg-3);
|
| 261 |
+
border-radius: var(--radius);
|
| 262 |
+
padding: 0.8rem;
|
| 263 |
+
display: flex;
|
| 264 |
+
flex-direction: column;
|
| 265 |
+
gap: 0.4rem;
|
| 266 |
+
}
|
| 267 |
+
.separator { height: 1px; background: var(--bg-3); margin: 0.3rem 0; }
|
| 268 |
+
|
| 269 |
+
/* ── Reward ring ──────────────────────────────────────────── */
|
| 270 |
+
.reward-ring-wrap {
|
| 271 |
+
position: relative;
|
| 272 |
+
width: 120px;
|
| 273 |
+
margin: 0 auto;
|
| 274 |
+
}
|
| 275 |
+
.reward-ring { width: 120px; height: 120px; transform: rotate(-90deg); }
|
| 276 |
+
.ring-bg { fill: none; stroke: var(--bg-2); stroke-width: 10; }
|
| 277 |
+
.ring-track {
|
| 278 |
+
fill: none;
|
| 279 |
+
stroke: var(--accent);
|
| 280 |
+
stroke-width: 10;
|
| 281 |
+
stroke-linecap: round;
|
| 282 |
+
stroke-dasharray: 314; /* 2π × r=50 */
|
| 283 |
+
stroke-dashoffset: 314;
|
| 284 |
+
transition: stroke-dashoffset 0.5s ease, stroke 0.5s ease;
|
| 285 |
+
}
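The ring works by setting `stroke-dasharray` to the full circumference (2π × 50 ≈ 314) and then shrinking `stroke-dashoffset` as the value grows. The sketch below shows that arithmetic only. It assumes the displayed reward is normalised to the range 0 to `maxReward`; `ringDashOffset` and `maxReward` are hypothetical names, not taken from app.js.

```javascript
// Circumference of the ring: 2π × r, with r = 50 as in index.html.
const RING_CIRCUMFERENCE = 2 * Math.PI * 50; // about 314.16; the CSS rounds it to 314

// Hypothetical helper (assumption): map a reward to a stroke-dashoffset.
// An offset equal to the circumference hides the stroke; an offset of 0 shows a full ring.
function ringDashOffset(reward, maxReward = 1, circumference = 314) {
  // Clamp so out-of-range rewards neither overdraw nor reverse the ring.
  const fraction = Math.min(1, Math.max(0, reward / maxReward));
  return circumference * (1 - fraction);
}
```

In a browser, the result could be applied with something like `document.getElementById("ring-track").style.strokeDashoffset = ringDashOffset(0.25)`; the existing CSS transition on `stroke-dashoffset` would then animate the change.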
|
| 286 |
+
.ring-label {
|
| 287 |
+
position: absolute;
|
| 288 |
+
inset: 0;
|
| 289 |
+
display: flex;
|
| 290 |
+
flex-direction: column;
|
| 291 |
+
align-items: center;
|
| 292 |
+
justify-content: center;
|
| 293 |
+
font-weight: 700;
|
| 294 |
+
font-size: 1.1rem;
|
| 295 |
+
}
|
| 296 |
+
.ring-label small { font-size: 0.65rem; color: var(--text-dim); font-weight: 400; }
|
| 297 |
+
|
| 298 |
+
/* ── Component bar chart ──────────────────────────────────── */
|
| 299 |
+
.component-bars { display: flex; flex-direction: column; gap: 6px; }
|
| 300 |
+
.bar-row { display: flex; align-items: center; gap: 6px; }
|
| 301 |
+
.bar-label { font-size: 0.72rem; color: var(--text-dim); width: 90px; flex-shrink: 0; }
|
| 302 |
+
.bar-track { flex: 1; height: 7px; background: var(--bg-2); border-radius: 4px; overflow: hidden; }
|
| 303 |
+
.bar-fill { height: 100%; border-radius: 4px; width: 0; transition: width 0.4s ease; }
|
| 304 |
+
.bar-fill.positive { background: var(--green); }
|
| 305 |
+
.bar-fill.negative { background: var(--red); }
|
| 306 |
+
.bar-fill.neutral { background: var(--yellow); }
|
| 307 |
+
.bar-val { font-size: 0.72rem; width: 36px; text-align: right; color: var(--text-dim); }
|
| 308 |
+
|
| 309 |
+
/* ── Issues progress ──────────────────────────────────────── */
|
| 310 |
+
.issues-progress { display: flex; align-items: center; gap: 8px; }
|
| 311 |
+
.issues-bar-wrap {
|
| 312 |
+
flex: 1; height: 8px;
|
| 313 |
+
background: var(--bg-2);
|
| 314 |
+
border-radius: 4px;
|
| 315 |
+
overflow: hidden;
|
| 316 |
+
}
|
| 317 |
+
.issues-bar-fill {
|
| 318 |
+
height: 100%;
|
| 319 |
+
background: var(--accent);
|
| 320 |
+
border-radius: 4px;
|
| 321 |
+
width: 0;
|
| 322 |
+
transition: width 0.4s ease;
|
| 323 |
+
}
|
| 324 |
+
|
| 325 |
+
/* ── History feed ─────────────────────────────────────────── */
|
| 326 |
+
.history-feed {
|
| 327 |
+
display: flex;
|
| 328 |
+
flex-direction: column;
|
| 329 |
+
gap: 0.4rem;
|
| 330 |
+
max-height: 320px;
|
| 331 |
+
overflow-y: auto;
|
| 332 |
+
}
|
| 333 |
+
.history-empty { color: var(--text-dim); font-size: 0.8rem; font-style: italic; }
|
| 334 |
+
.history-item {
|
| 335 |
+
background: var(--bg-2);
|
| 336 |
+
border: 1px solid var(--bg-3);
|
| 337 |
+
border-radius: 6px;
|
| 338 |
+
padding: 0.45rem 0.65rem;
|
| 339 |
+
font-size: 0.78rem;
|
| 340 |
+
border-left: 3px solid var(--bg-3);
|
| 341 |
+
}
|
| 342 |
+
.history-item.positive { border-left-color: var(--green); }
|
| 343 |
+
.history-item.negative { border-left-color: var(--red); }
|
| 344 |
+
.history-item.neutral { border-left-color: var(--yellow); }
|
| 345 |
+
.history-item .h-action { font-weight: 700; color: var(--accent); }
|
| 346 |
+
.history-item .h-reward { font-weight: 700; }
|
| 347 |
+
.history-item .h-reward.pos { color: var(--green); }
|
| 348 |
+
.history-item .h-reward.neg { color: var(--red); }
|
| 349 |
+
.history-item .h-explain { color: var(--text-dim); margin-top: 2px; line-height: 1.4; }
|
| 350 |
+
|
| 351 |
+
/* ── Episode-end modal ────────────────────────────────────── */
|
| 352 |
+
.modal-overlay {
|
| 353 |
+
position: fixed; inset: 0;
|
| 354 |
+
background: rgba(0,0,0,0.7);
|
| 355 |
+
display: flex; align-items: center; justify-content: center;
|
| 356 |
+
z-index: 200;
|
| 357 |
+
}
|
| 358 |
+
.modal {
|
| 359 |
+
background: var(--bg-1);
|
| 360 |
+
border: 1px solid var(--bg-3);
|
| 361 |
+
border-radius: 12px;
|
| 362 |
+
padding: 2rem;
|
| 363 |
+
max-width: 440px;
|
| 364 |
+
width: 90%;
|
| 365 |
+
text-align: center;
|
| 366 |
+
display: flex;
|
| 367 |
+
flex-direction: column;
|
| 368 |
+
align-items: center;
|
| 369 |
+
gap: 0.75rem;
|
| 370 |
+
}
|
| 371 |
+
.modal-icon { font-size: 3rem; }
|
| 372 |
+
.modal h2 { font-size: 1.3rem; }
|
| 373 |
+
.modal p { color: var(--text-dim); font-size: 0.88rem; line-height: 1.6; }
|
| 374 |
+
.modal-stats {
|
| 375 |
+
width: 100%;
|
| 376 |
+
background: var(--bg-2);
|
| 377 |
+
border-radius: var(--radius);
|
| 378 |
+
padding: 0.75rem 1rem;
|
| 379 |
+
display: grid;
|
| 380 |
+
grid-template-columns: 1fr 1fr;
|
| 381 |
+
gap: 0.4rem 1rem;
|
| 382 |
+
text-align: left;
|
| 383 |
+
font-size: 0.82rem;
|
| 384 |
+
}
|
| 385 |
+
.modal-stats .stat-label { color: var(--text-dim); }
|
| 386 |
+
.modal-stats .stat-value { font-weight: 700; }
|
| 387 |
+
|
| 388 |
+
/* ── Scrollbar styling ────────────────────────────────────── */
|
| 389 |
+
::-webkit-scrollbar { width: 6px; height: 6px; }
|
| 390 |
+
::-webkit-scrollbar-track { background: var(--bg-1); }
|
| 391 |
+
::-webkit-scrollbar-thumb { background: var(--bg-3); border-radius: 3px; }
|
outputs/baseline_comparison.svg
ADDED
|
|
outputs/reward_breakdown.svg
ADDED
|
|
run.py
ADDED
|
@@ -0,0 +1,65 @@
+"""
+PRobe – unified launcher.
+
+Starts the FastAPI server which serves:
+  - The interactive frontend at  http://localhost:8000/ui/
+  - The REST API at              http://localhost:8000/docs
+  - The WebSocket at             ws://localhost:8000/ws
+
+Usage
+-----
+    uv run python run.py                    # default: host=0.0.0.0, port=8000
+    uv run python run.py --port 9000
+    uv run python run.py --host 127.0.0.1 --port 8000
+"""
+from __future__ import annotations
+
+import argparse
+import pathlib
+import sys
+
+# ── Path bootstrap ────────────────────────────────────────────────────────────
+# Add the project root to sys.path so both `agent` and `environment` packages
+# are importable regardless of how or from where this script is invoked.
+PROJECT_ROOT = pathlib.Path(__file__).parent.resolve()
+sys.path.insert(0, str(PROJECT_ROOT))
+
+# ── Now safe to import the app ────────────────────────────────────────────────
+from environment.app import app  # noqa: E402  (import after path setup)
+import uvicorn  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Start the PRobe environment server + frontend",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser.add_argument("--host", default="0.0.0.0", help="Bind host")
+    parser.add_argument("--port", type=int, default=8000, help="Bind port")
+    parser.add_argument("--reload", action="store_true",
+                        help="Enable auto-reload on code changes (dev mode)")
+    args = parser.parse_args()
+
+    frontend_url = f"http://{'localhost' if args.host == '0.0.0.0' else args.host}:{args.port}/ui/"
+    api_url = f"http://{'localhost' if args.host == '0.0.0.0' else args.host}:{args.port}/docs"
+
+    print("\n" + "=" * 58)
+    print("  PRobe – AI Code Review Training Environment")
+    print("=" * 58)
+    print(f"  Frontend  →  {frontend_url}")
+    print(f"  API docs  →  {api_url}")
+    print(f"  WebSocket →  ws://localhost:{args.port}/ws")
+    print("=" * 58 + "\n")
+
+    uvicorn.run(
+        "environment.app:app",
+        host=args.host,
+        port=args.port,
+        reload=args.reload,
+        # Keep uvicorn's own logging minimal so our banner stays visible
+        log_level="warning",
+    )
+
+
+if __name__ == "__main__":
+    main()