Thakur, Mahipal committed on
Commit 44bd7bd · 1 Parent(s): 4ec7361

UI Integration

README.md CHANGED
@@ -6,7 +6,7 @@ colorTo: green
  sdk: docker
  pinned: false
  app_port: 8000
- base_path: /web
  tags:
  - openenv
  - code-review
@@ -197,11 +197,23 @@ Find missing rate-limit → nginx config shown → confirms /auth fully e
  ## Quickstart

  ```bash
- # Install
  uv sync

- # Run the environment server
- uv run uvicorn environment.app:app --host 0.0.0.0 --port 8000 --reload

  # Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
  export OPENAI_API_KEY=sk-...
@@ -213,6 +225,143 @@ uv run python training/train_grpo.py --test

  ---

  ## Training

  | | |
@@ -283,6 +432,256 @@ Security code review is a high-stakes task performed by a small number of specia

  ## Repo Structure

  ```
  .
  ├── agent/
 
  sdk: docker
  pinned: false
  app_port: 8000
+ base_path: /ui/
  tags:
  - openenv
  - code-review
 
  ## Quickstart

  ```bash
+ # 1. Install all dependencies
  uv sync

+ # 2. Start the server + frontend in one command
+ uv run python run.py
+
+ # The terminal will print:
+ # ==========================================================
+ # PRobe - AI Code Review Training Environment
+ # ==========================================================
+ # Frontend  → http://localhost:8000/ui/
+ # API docs  → http://localhost:8000/docs
+ # WebSocket → ws://localhost:8000/ws
+ # ==========================================================
+
+ # 3. Open your browser (`open` is macOS; use `xdg-open` on Linux)
+ open http://localhost:8000/ui/

  # Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
  export OPENAI_API_KEY=sk-...
 
  ---

+ ## Interactive Frontend Dashboard
+
+ PRobe ships with a **zero-dependency browser UI** that turns the RL environment into a live, interactive demo.
+ No npm, no build step: just start the server and open your browser.
+
+ ### What It Looks Like
+
+ ```
+ ┌──────────────────────────────────────────────────────────────────────────────┐
+ │ 🔍 PRobe Adversarial Code Review - RL Training Environment                   │
+ │                                                      🟢 Connected   [New Ep] │
+ ├──────────────────────────────┬────────────────────┬──────────────────────────┤
+ │ Task 2 - auth.py             │ Actions            │ Reward Dashboard         │
+ │ medium • Step 3 / 20         │                    │                          │
+ │                              │ 💬 Add Comment     │      ◯  +0.24            │
+ │ ⚠️ External contributor,     │ ┌──────────────┐   │      cumulative          │
+ │    no prior commit history   │ │ Line: [12]   │   │                          │
+ │                              │ │ Comment:     │   │ Issue credit   ████░░    │
+ │ Review this auth module.     │ │ SQL inject.. │   │ Classification ██░░░     │
+ │ Identify bugs and decide     │ │ Severity:    │   │ FP penalty     ░░░░░     │
+ │ whether to escalate or       │ │ [critical ▾] │   │ Coverage       ███░░     │
+ │ request changes.             │ │ Category:    │   │ Decision       ████░     │
+ │                              │ │ [security ▾] │   │ Efficiency     ██░░░     │
+ │ ┌─ auth.py ─────────────┐    │ └──────────────┘   │                          │
+ │ │ 1:  import hashlib    │    │ [Submit Comment]   │ Issues Found             │
+ │ │ 2:                    │    │                    │ ██████░░░░  2 / 5        │
+ │ │ 3:  DB_PASS = "s3cr"  │    │ ⚡ Quick Actions    │                          │
+ │ │ 12: cursor.execute(   │◄── │ [🔍 Get Context]   │ Episode History          │
+ │ │       f"SELECT * FROM │    │ [🤖 Run Scanner]   │ ┌───────────────────┐    │
+ │ │       users WHERE     │    │ ───────────────    │ │ ADD_COMMENT +0.12 │    │
+ │ │ 13: username='{u}'"   │    │ [🔄 Req Changes]   │ │ sql injection L12 │    │
+ │ │ 14: )                 │    │ [✅ Approve PR]    │ ├───────────────────┤    │
+ │ └───────────────────────┘    │ [Submit]           │ │ RUN_SCANNER +0.00 │    │
+ │                              │ [🚨 Escalate]      │ │ 3 findings found  │    │
+ └──────────────────────────────┴────────────────────┴──────────────────────────┘
+ ```
+
+ ### Three-Column Layout
+
+ **Left: Code Viewer**
+ - Full source code with **line numbers** for every episode
+ - Lines are **colour-coded** as you act:
+   - 🔵 Blue: line you just commented on
+   - 🟡 Yellow: line flagged by the scanner
+   - 🟢 Green: line you probed with Get Context
+ - **Unlocked hints** appear below the code as green panels whenever a key issue is found
+ - The **adversarial hint** banner tells you whether the PR is from a trusted team member or an unknown external contributor
+
+ **Centre: Action Panel**
+ - **Add Comment** form: line number, free-text comment, severity, category, and bug/backdoor classification
+ - **Quick Actions**: single-click buttons for the six remaining action types (`add_comment` goes through the form above)
+
+ | Button | Action | What Happens |
+ |---|---|---|
+ | 🔍 Get Context | `get_context` | Reveals ±5 lines around the probed line number |
+ | 🤖 Run Scanner | `run_scanner` | Runs the simulated static-analysis tool |
+ | 🔄 Request Changes | `request_changes` | Records your review decision |
+ | ✅ Approve PR | `approve` | Approves (−0.15 penalty if < 50 % of issues found) |
+ | Submit Review | `submit_review` | Ends the episode; triggers terminal scoring |
+ | 🚨 Escalate to Security | `escalate_to_security_review` | Correct only on adversarial tasks 7–9 |
+
+ **Right: Reward Dashboard**
+ - **Animated ring** showing cumulative episode reward (green above zero, red below)
+ - **Six component bars** updating in real time after every action:
+   - Issue credit, Classification credit, FP penalty
+   - Coverage bonus, Decision score, Efficiency bonus
+ - **Issues progress bar** showing how many ground-truth issues you have found
+ - **Episode history feed**: every action with its reward delta and explanation
+
+ ### Episode End Modal
+
+ When the episode terminates (via Submit Review or Escalate), a modal pops up showing:
+
+ ```
+ 🏆 Episode Passed!
+
+ "Found 3/3 issues (weighted coverage 100%).
+  Decision 'escalate_to_security_review' was correct."
+
+ ┌───────────────────────────────────┐
+ │ Cumulative reward     +0.874      │
+ │ Issues found          3 / 3       │
+ │ Steps used            18 / 25     │
+ │ Decision              escalate    │
+ │ Escalation required   Yes         │
+ └───────────────────────────────────┘
+
+ [Start New Episode]
+ ```
+
+ Clicking **Start New Episode** automatically loads the next task in the difficulty ladder.
+
+ ### How to Run
+
+ ```bash
+ # Install dependencies (one-time)
+ uv sync
+
+ # Start the server (this also serves the frontend)
+ uv run python run.py
+ ```
+
+ Then open **`http://localhost:8000/ui/`** in any browser. No additional setup, no separate frontend server.
+
+ **Optional flags:**
+
+ ```bash
+ # Different port
+ uv run python run.py --port 9000
+
+ # Bind to localhost only (do not expose on the network)
+ uv run python run.py --host 127.0.0.1
+
+ # Dev mode: auto-reload Python files on save
+ uv run python run.py --reload
+ ```
+
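`run.py` itself is not part of this diff. As a rough sketch only, a launcher exposing the three flags above could parse them with `argparse` and hand them to `uvicorn.run`. The names `build_parser` and `main`, and the defaults (`0.0.0.0`, port `8000`, taken from the original `uvicorn` command), are assumptions, not the project's code.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a run.py-style launcher; flags mirror the README."""
    parser = argparse.ArgumentParser(description="Start the PRobe server and UI.")
    # Default binds on all interfaces, like the original uvicorn command.
    parser.add_argument("--host", default="0.0.0.0", help="interface to bind")
    parser.add_argument("--port", type=int, default=8000, help="TCP port")
    parser.add_argument("--reload", action="store_true",
                        help="auto-reload Python files on save (dev only)")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be used without uvicorn installed.
    import uvicorn

    print(f"Frontend  -> http://localhost:{args.port}/ui/")
    print(f"API docs  -> http://localhost:{args.port}/docs")
    # reload=True requires the app as an import string, not an app object.
    uvicorn.run("environment.app:app", host=args.host, port=args.port,
                reload=args.reload)
```

Passing the app as the string `"environment.app:app"` is what makes `--reload` possible, since the reloader has to re-import the module in a fresh process.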
+ ### How the Frontend Connects
+
+ The browser communicates with the backend over a **persistent WebSocket** at `/ws` (`ws://localhost:8000/ws` with the default settings).
+ Each browser tab gets its own isolated environment instance, so concurrent sessions do not share state.
+ The WebSocket host is taken from the page's `window.location` rather than hard-coded, so the UI needs no file edits when served from a different host.
+
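For illustration, the same idea expressed in Python: the page URL's scheme selects `ws` or `wss`, and its host and port carry over unchanged. `ws_url_from_page` is a hypothetical helper, not part of PRobe; it mirrors what browser code can do with `window.location.protocol` and `window.location.host`.

```python
from urllib.parse import urlsplit, urlunsplit


def ws_url_from_page(page_url: str, path: str = "/ws") -> str:
    """Derive the WebSocket URL for a page, keeping its host and port.

    http -> ws and https -> wss, so TLS deployments get a secure socket.
    """
    parts = urlsplit(page_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    # netloc is "host:port" exactly as the page was served.
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

Keeping the port from the page (rather than assuming `8000`) is what lets `run.py --port 9000` work without touching the frontend.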
+ ### Why a Frontend Helps the Story
+
+ | Without Frontend | With Frontend |
+ |---|---|
+ | `total=0.345` in a log file | Animated reward ring filling green in real time |
+ | `issues_found: ['sql_injection']` | Line 12 highlighted blue in the code viewer |
+ | `decision: escalate_to_security_review` | 🚨 Escalate button, modal with final score and stats |
+ | Understanding the anti-exploit rule | Watching a keyword-spam comment take the −0.05 FP penalty |
+ | Explaining the causal chain mechanic | Green hint panel appearing after finding the JWT issue |
+
+ The dashboard makes the reward signal **tangible**: a visitor can play one episode in two minutes and immediately understand what makes PRobe different from a linter.
+
+ ---
+
  ## Training

  | | |
 
  ## Repo Structure

+ ```
+ .
+ ├── agent/
+ │   ├── client.py             # HTTP client for interacting with the environment server
+ │   ├── models.py             # Pydantic models: ProbeAction, ProbeObservation, RewardType
+ │   └── __init__.py
+ ├── environment/
+ │   ├── app.py                # FastAPI server (HTTP + WebSocket + static frontend at /ui/)
+ │   ├── Dockerfile            # Container definition for HuggingFace Spaces
+ │   ├── episode_memory.py     # Cross-episode JSON memory (injects prior-finding hints)
+ │   ├── graders.py            # Deterministic reward grader (keyword+line+length verifier)
+ │   ├── mutator.py            # Code mutation engine (rename / shift / nudge)
+ │   ├── probe_environment.py  # Core environment: reset / step / state / action handlers
+ │   ├── requirements.txt      # Server-side Python dependencies
+ │   ├── scanner.py            # Simulated static-analysis tool (70% recall, FP injection)
+ │   ├── tasks.py              # 10 task definitions with ground-truth issue lists
+ │   ├── _import_compat.py     # Import shim for package / script / test contexts
+ │   └── __init__.py
+ ├── frontend/
+ │   ├── index.html            # Three-column dashboard layout
+ │   ├── style.css             # Dark IDE theme (no build step required)
+ │   └── app.js                # WebSocket client, code viewer, reward ring, history feed
+ ├── training/
+ │   ├── baseline.py           # Zero-shot GPT-4o-mini baseline agent + plotting
+ │   ├── scripted_baseline.py  # Deterministic oracle and spammer stress-tests
+ │   ├── train_grpo.py         # GRPO training script (TRL + optional Unsloth, 5-phase curriculum)
+ │   └── __init__.py
+ ├── tests/
+ │   ├── test_dynamic_world.py # Tests for mutation engine and scanner noise model
+ │   ├── test_grader.py        # Tests for reward grader correctness
+ │   └── __init__.py
+ ├── docs/
+ │   └── design.md             # Architecture notes
+ ├── outputs/
+ │   └── scripted_baseline.jsonl # Sample baseline results
+ ├── run.py                    # One-command launcher: starts server + serves frontend
+ ├── openenv.yaml              # OpenEnv manifest (10 tasks, full schema)
+ ├── pyproject.toml            # Project metadata and dependencies
+ └── pytest.ini                # Test configuration
+ ```
+
+ ---
+
+ ## OpenEnv Compliance Checklist
+
+ - [x] Built on `Environment` base class (`ProbeEnvironment(Environment)` in `environment/probe_environment.py`)
+ - [x] `reset()`, `step()`, `state()` all implemented (async-native via `async_reset` / `async_step` / `async_state`; sync wrappers delegate safely via `asyncio.run`)
+ - [x] `step()` returns `tuple[ObservationType, RewardType, bool, dict]` (see `async_step` in `probe_environment.py`)
+ - [x] Dedicated `RewardType` Pydantic v2 model with `model_config = ConfigDict(frozen=True)` (`agent/models.py`)
+ - [x] Valid `openenv.yaml` manifest (spec_version, name, type, runtime, app, port, 10 tasks, observation schema)
+ - [x] Client/server separation enforced (`agent/` = client models + HTTP client; `environment/` = server logic)
+ - [x] No reserved MCP tool names used
+ - [ ] Hosted on HuggingFace Spaces ([FILL: deploy and add URL to links table above])
+
+
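The pattern of sync wrappers delegating to async-native methods through `asyncio.run` can be sketched as below. `AsyncFirstEnv` and `run_sync` are hypothetical stand-ins, not PRobe's code. The guard matters because `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop (for example, inside a FastAPI handler).

```python
import asyncio
from typing import Any, Coroutine, Dict, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine from synchronous code, refusing if a loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: asyncio.run is safe to call.
        return asyncio.run(coro)
    # asyncio.run would fail here; close the coroutine to avoid a
    # "never awaited" warning and point the caller at the async API.
    coro.close()
    raise RuntimeError("sync wrapper called inside an event loop; await async_* instead")


class AsyncFirstEnv:
    """Hypothetical env whose real logic is async; sync methods just delegate."""

    def __init__(self) -> None:
        self.step_count = 0

    async def async_reset(self) -> Dict[str, Any]:
        self.step_count = 0
        return {"step_count": 0, "done": False}

    async def async_step(self, action: str) -> Dict[str, Any]:
        self.step_count += 1
        return {"step_count": self.step_count, "action": action, "done": False}

    def reset(self) -> Dict[str, Any]:
        return run_sync(self.async_reset())

    def step(self, action: str) -> Dict[str, Any]:
        return run_sync(self.async_step(action))
```

Async callers should await `async_reset` / `async_step` directly; the sync methods are only for scripts and tests that have no event loop of their own.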
+ ---
+
+ ## The Problem
+
+ The XZ Utils backdoor (CVE-2024-3094) was planted by a contributor who spent roughly two years earning the maintainers' trust before slipping it into a release. The SolarWinds attack pushed a tampered Orion build to up to 18,000 customer organisations. In both cases the malicious code arrived looking like a routine, legitimate change: the kind of PR that lands in a code-review queue every day.
+
+ Today's LLMs scan code like a linter. They find style issues, flag known CVE patterns, and produce plausible-sounding comments. What they don't do is *investigate*: reason about intent, distinguish an honest off-by-one from a planted authentication bypass, or know when to escalate rather than request changes. Reward signals for code generation are everywhere; reward signals for critical code *evaluation* barely exist.
+
+ PRobe closes that gap. Its fully deterministic grader (keyword + line-range matching, no LLM judge) separates investigation quality from keyword spam. An agent that dumps every security term at random lines scores *negative*. One that reads carefully, probes for context, finds the right lines, and correctly labels each flaw as an honest bug or a deliberate backdoor scores close to `+1.0`.
+
+ ---
+
+ ## What the Agent Sees, Does, and Gets Rewarded For
+
+ ### Plain English
+
+ The agent is handed a Python source file and asked to review it like a senior security engineer. It can annotate suspicious lines, probe specific regions for more context, run a simulated scanner (which, like real tools, misses things and occasionally lies), and finally submit a verdict. On adversarial tasks it must also decide whether the code contains a deliberate backdoor and escalate to a security team if so. Every episode the code surface changes (variable names, line numbers, constants), so the agent cannot memorise answers; it has to read.
+
+ ### What the Agent Observes (`ProbeObservation`)
+
+ | Field | Description |
+ |---|---|
+ | `code_snippet` | Mutated Python source for this episode |
+ | `task_description` | Review instructions and goals |
+ | `file_name` | Name of the file being reviewed |
+ | `task_id` / `task_difficulty` | Current task index (0–9) and difficulty label |
+ | `review_history` | All actions taken so far this episode |
+ | `step_count` / `max_steps` | Steps used vs. budget |
+ | `issues_found_count` / `total_issues` | Progress tracker |
+ | `context_hints` | Causal hints unlocked by finding key issues |
+ | `reward` | Most recent step reward in `[-1.0, 1.0]` |
+ | `done` | Whether the episode has ended |
+
+ ### What Actions the Agent Can Take (`ProbeAction`)
+
+ | Action | Effect |
+ |---|---|
+ | `add_comment` | Annotate a line with text, severity, category, and optional backdoor classification |
+ | `get_context` | Reveal ±5 lines of context around a chosen line number |
+ | `run_scanner` | Invoke simulated static-analysis tool (70 % recall, up to 2 false positives injected) |
+ | `request_changes` | Mark PR as requiring fixes (correct terminal action for tasks 0–6) |
+ | `approve` | Approve the PR (penalised if fewer than 50 % of issues have been found) |
+ | `submit_review` | Finalise the review and end the episode |
+ | `escalate_to_security_review` | Flag PR as containing a deliberate attack (required for tasks 7–9) |
+
+ ### Reward Formula
+
+ Reward accumulates across steps and is finalised at submission:
+
+ ```
+ Episode reward =
+
+   Σ per-comment (ADD_COMMENT):
+       issue_credit           = (weight_i / total_weight) × 0.40   ← found a real issue
+       classification_credit  = (weight_i / total_weight) × 0.20   ← correct bug/backdoor label
+       misclassify_penalty    = −0.05                              ← found it but labelled it wrong
+       false_positive_penalty = −0.05                              ← substantive comment, no issue matched
+
+   + on terminal (SUBMIT_REVIEW or ESCALATE):
+       coverage_bonus         = weighted_coverage × 0.15           ← proportional to issues found
+       decision_score         = +0.15 / −0.15                      ← correct / wrong final action
+                                (bonus gated: requires coverage ≥ 30 %)
+       efficiency_bonus       = (1 − steps_used/max_steps) × 0.10  ← unlocked only if coverage ≥ 60 %
+
+ Maximum achievable: ~1.0    Minimum: −1.0
+ ```
+
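To make the arithmetic concrete, here is a minimal Python transcription of the formula above. It is a sketch under stated assumptions, not PRobe's grader: `CommentOutcome` is a hypothetical record, each matched comment is assumed to hit a distinct issue, the −0.15 for a wrong decision is assumed to apply regardless of coverage (only the bonus is gated), and the total is assumed to be clipped to [−1, 1]. `format_bonus` is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommentOutcome:
    """Grading result for one add_comment action (hypothetical shape)."""
    matched_weight: float  # weight of the issue this comment matched; 0.0 if none
    correct_label: bool    # accidental_bug / intentional_backdoor label was right
    substantive: bool      # comment body longer than 15 characters


def episode_reward(comments: List[CommentOutcome], total_weight: float,
                   decision_correct: bool, steps_used: int, max_steps: int) -> float:
    """Sum per-comment terms, add terminal terms, clip to [-1, 1] (assumed)."""
    reward, found = 0.0, 0.0
    for c in comments:
        if c.matched_weight > 0:
            share = c.matched_weight / total_weight
            found += c.matched_weight
            reward += share * 0.40                     # issue_credit
            if c.correct_label:
                reward += share * 0.20                 # classification_credit
            else:
                reward -= 0.05                         # misclassify_penalty
        elif c.substantive:
            reward -= 0.05                             # false_positive_penalty
    coverage = found / total_weight                    # weighted_coverage
    reward += coverage * 0.15                          # coverage_bonus
    if not decision_correct:
        reward -= 0.15                                 # wrong final action
    elif coverage >= 0.30:
        reward += 0.15                                 # decision bonus, gated on coverage
    if coverage >= 0.60:
        reward += (1 - steps_used / max_steps) * 0.10  # efficiency_bonus
    return max(-1.0, min(1.0, reward))
```

For a three-issue task solved perfectly in 18 of 25 steps this gives 0.40 + 0.20 + 0.15 + 0.15 + 0.028 = 0.928, while ten unmatched substantive comments followed by a wrong decision give −0.50 − 0.15 = −0.65.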
+ ### Anti-Exploit Verifier
+
+ A comment earns `issue_credit` only when **all three** conditions hold simultaneously:
+
+ 1. **`keyword_hit`**: at least one issue keyword appears in the comment text
+ 2. **`line_hit`**: `line_number` is within ±2 lines of the declared issue range
+ 3. **`substantive`**: comment body is longer than 15 characters
+
+ This closes three common reward-hacking paths: keyword spam (fails `line_hit`), wide-net line fishing (fails `keyword_hit`), and one-word dumps (fails `substantive`). The decision bonus additionally requires weighted coverage ≥ 30 % before it can be earned, so an agent that never reads code and always guesses `request_changes` earns zero, not a bonus.
+
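A minimal sketch of the three-way check, assuming case-insensitive substring matching for keywords and a 1-based inclusive line range. `Issue` and `issue_matches` are illustrative names, not the ones used in `graders.py`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Issue:
    """Ground-truth issue (hypothetical shape): keywords plus a line range."""
    keywords: Sequence[str]
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def issue_matches(comment: str, line_number: int, issue: Issue,
                  line_tolerance: int = 2, min_length: int = 15) -> bool:
    text = comment.lower()
    # 1. keyword_hit: at least one issue keyword appears in the comment
    keyword_hit = any(k.lower() in text for k in issue.keywords)
    # 2. line_hit: within ±line_tolerance of the declared issue range
    line_hit = (issue.start_line - line_tolerance
                <= line_number
                <= issue.end_line + line_tolerance)
    # 3. substantive: comment body longer than min_length characters
    substantive = len(comment.strip()) > min_length
    return keyword_hit and line_hit and substantive
```

Because all three conditions are combined with `and`, each reward-hacking strategy only has to fail one of them to earn nothing.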
+ ### Perfect Episode vs. Failing Episode
+
+ **Perfect:** The agent reads the code, annotates every real issue at the correct line with a substantive, keyword-bearing comment, correctly labels each as `accidental_bug` or `intentional_backdoor`, escalates when required, and submits with steps to spare. Score approaches `1.0`.
+
+ **Failing:** The agent spams generic comments on random lines, never co-locates a keyword with a real issue line, triggers false-positive penalties on every step, and submits the wrong terminal action. Score approaches `−1.0`.
+
+ ---
+
+ ## Environment Design
+
+ ### Difficulty Tiers
+
+ | Tier | Tasks | Max Steps | Issues | What Changes |
+ |---|---|---|---|---|
+ | **Ultra-Easy** (bootstrap) | 0 | 6 | 2 | Category names spelled out in code comments above each bug; ensures GRPO sees positive-reward trajectories from step 1 |
+ | **Easy** | 1 | 15 | 3 | Clean logic bugs, no hints, no distractors |
+ | **Medium** | 2, 4 | 20 | 5 | Security and async-concurrency issues; wider surface area |
+ | **Hard** | 3, 5, 6 | 30–35 | 6–7 | Mixed bug classes; task 6 adds a causal unlock chain |
+ | **Adversarial** | 7, 8, 9 | 25–35 | 2–3 | Deliberate backdoors requiring `intentional_backdoor` classification and `escalate_to_security_review` |
+
+ ### Tasks (10 total)
+
+ | ID | Difficulty | File | Issues | Domain |
+ |---|---|---|---|---|
+ | 0 | Ultra-easy | `bootstrap.py` | 2 | Off-by-one + hardcoded credential (hinted in comments) |
+ | 1 | Easy | `utils.py` | 3 | Logic bugs, off-by-one, dead code |
+ | 2 | Medium | `auth.py` | 5 | SQL injection, MD5, eval(), hardcoded creds |
+ | 3 | Hard | `data_pipeline.py` | 7 | N+1, SSL bypass, thread leak, OOM cache |
+ | 4 | Medium | `async_worker.py` | 5 | Race condition, missing await, resource leak |
+ | 5 | Hard | `api_server.py` | 6 | Command injection, path traversal, pickle RCE |
+ | 6 | Hard | `auth_service.py` | 6 | **Causal chain**: JWT forgery → privilege escalation |
+ | 7 | Adversarial | `session_manager.py` | 2 | Planted auth bypass + hardcoded bypass credential |
+ | 8 | Adversarial | `jwt_validator.py` | 2 | **Compound backdoor**: fixed-seed JWT secret + timing oracle |
+ | 9 | Adversarial | `compat_shim.py` | 3 | **Supply chain**: import-time env-var exfiltration to attacker domain |
+
+ ### GRPO Curriculum (5 Phases in `train_grpo.py`)
+
+ | Phase | Steps | Tasks in Pool |
+ |---|---|---|
+ | 0 | 0–40 | 0–1 (ultra-easy / easy) |
+ | 1 | 40–80 | 0–3 (adds medium / hard) |
+ | 2 | 80–120 | 0–6 (adds causal chain) |
+ | 3 | 120–160 | 0–8 (adds adversarial) |
+ | 4 | 160–200 | 0–9 (full curriculum) |
+
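The curriculum table reduces to a step-to-pool lookup. A minimal sketch, assuming half-open phase boundaries (step 40 already belongs to phase 1) and that steps past 200 keep the full pool; `PHASES` and `task_pool` are hypothetical names, not taken from `train_grpo.py`.

```python
from typing import List, Tuple

# (first training step of the phase, highest task id in the pool), per the table.
PHASES: List[Tuple[int, int]] = [(0, 1), (40, 3), (80, 6), (120, 8), (160, 9)]


def task_pool(training_step: int) -> List[int]:
    """Task ids eligible for sampling at a given GRPO step."""
    max_task = PHASES[0][1]
    for start, highest in PHASES:
        if training_step >= start:
            max_task = highest  # latest phase whose start has been reached
    return list(range(max_task + 1))
```

Because each phase only widens the pool, tasks mastered early stay in rotation and keep contributing positive trajectories.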
+ ### Reward Components with Weights
+
+ | Component | Weight | Trigger |
+ |---|---|---|
+ | `issue_credit` | up to **0.40** cumulative | `add_comment` matches a real issue (keyword + line + length) |
+ | `classification_credit` | up to **0.20** cumulative | correct `accidental_bug` / `intentional_backdoor` label |
+ | `misclassify_penalty` | **−0.05** per issue | issue found but wrong classification label |
+ | `false_positive_penalty` | **−0.05** per comment | substantive comment, zero issues matched |
+ | `coverage_bonus` | up to **0.15** terminal | `weighted_coverage × 0.15` |
+ | `decision_score` | **±0.15** terminal | correct / wrong `request_changes` vs `escalate` decision |
+ | `efficiency_bonus` | up to **0.10** terminal | `(1 − steps/max_steps) × 0.10` when coverage ≥ 60 % |
+ | `format_bonus` | **+0.02** once | response contains a valid non-empty JSON array |
+
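The `format_bonus` row can be read as the check below. This is a sketch assuming the whole model response must parse as JSON (prose around the array would fail the check); the action object in the usage example is invented for illustration.

```python
import json


def format_bonus(response: str, bonus: float = 0.02) -> float:
    """Return `bonus` if the response parses as a non-empty JSON array, else 0.0."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    return bonus if isinstance(parsed, list) and len(parsed) > 0 else 0.0
```

For example, `format_bonus('[{"action": "run_scanner"}]')` returns `0.02`, while an empty array or a bare JSON object earns nothing.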
+ ### Dynamic World (Anti-Memorisation)
+
+ Each episode `mutate_task()` applies three seed-controlled transforms:
+
+ | Mutation | Example |
+ |---|---|
+ | Variable rename | `total` → `acc`, `data` → `payload`, `password` → `passwd` |
+ | Line shift | Blank line inserted above first issue; all `line_range` values shift +1 |
+ | Constant variance | `range(len(data) + 1)` → `range(len(data) + 2)` |
+
+ Mutations are deterministic given the episode seed, so runs are reproducible while every episode still presents a fresh surface.
+
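A toy version of the three transforms, assuming a task is just source text plus 1-based issue line ranges. The 50 % trigger probabilities and the `) + 1)` pattern are invented for illustration, and this `Task`/`mutate_task` pair is not the one in `mutator.py`; the point is the seeding: one `random.Random(seed)` per episode, so the same seed reproduces the same mutation.

```python
import random
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Rename pairs taken from the table above.
RENAMES: Dict[str, str] = {"total": "acc", "data": "payload", "password": "passwd"}


@dataclass
class Task:
    code: str
    line_ranges: List[Tuple[int, int]]  # 1-based, inclusive, one per issue


def mutate_task(task: Task, seed: int) -> Task:
    rng = random.Random(seed)  # per-episode RNG: same seed, same mutation
    code = task.code
    # Variable rename: whole-word substitution, each pair applied with p=0.5 (assumed).
    for old, new in RENAMES.items():
        if rng.random() < 0.5:
            code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    # Constant variance: nudge "+ 1)" inside range(len(...) + 1) to "+ 2)" (assumed trigger).
    if rng.random() < 0.5:
        code = code.replace(") + 1)", ") + 2)")
    # Line shift: blank line above the first issue, so every range moves down by one.
    first = min(start for start, _ in task.line_ranges)
    lines = code.split("\n")
    lines.insert(first - 1, "")
    shifted = [(start + 1, end + 1) for start, end in task.line_ranges]
    return Task("\n".join(lines), shifted)
```

Renames and constant nudges leave the line count unchanged, which is why only the line shift needs to update `line_ranges`.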
637
+ ### Scanner Noise Model (`scanner.py`)
638
+
639
+ `run_scanner()` simulates a real lint/security tool:
640
+ - **Recall: 70 %** β€” each real issue is reported with probability 0.70; ~30 % silently missed
641
+ - **False-positive rate: 40 %** β€” up to 2 injected plausible-but-wrong findings per run
642
+ - Scanner output is **not auto-graded** β€” the agent must still call `add_comment` with a correct line + keyword to earn reward
643
+
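A seeded sketch of this noise model. `real_issues` and `decoys` are hypothetical `(line, message)` inputs, and the 40 % figure is interpreted, as an assumption, as the chance that each of up to two decoy slots fires; the real `scanner.py` may distribute false positives differently.

```python
import random
from typing import List, Sequence, Tuple

Finding = Tuple[int, str]  # (line number, message)


def run_scanner(real_issues: Sequence[Finding], decoys: Sequence[Finding],
                seed: int, recall: float = 0.70, fp_rate: float = 0.40,
                max_fps: int = 2) -> List[Finding]:
    """Noisy scanner: drops some real issues and mixes in plausible decoys."""
    rng = random.Random(seed)
    # Each real issue is reported independently with probability `recall`.
    report = [f for f in real_issues if rng.random() < recall]
    # Up to `max_fps` decoy slots, each firing with probability `fp_rate`.
    pool = list(decoys)
    rng.shuffle(pool)
    for decoy in pool[:max_fps]:
        if rng.random() < fp_rate:
            report.append(decoy)
    report.sort()  # order by line so real and fake findings are interleaved
    return report
```

Sorting by line number means the output's position gives no hint about which findings are real, so the agent still has to check each one before commenting.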
644
+ ### Causal Unlock Chain (Task 6)
645
+
646
+ Finding certain issues appends new context hints to the observation, modelling real investigations where one discovery leads to a deeper one:
647
+
648
+ ```
649
+ Find hardcoded JWT secret β†’ DB schema revealed β†’ agent can reason: forge token β†’ privilege escalation
650
+ Find missing rate-limit β†’ nginx config shown β†’ confirms /auth fully exposed with no IP filtering
651
+ ```
652
+
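The unlock mechanic amounts to a lookup from issue to hint that fires at most once per episode. A minimal sketch, assuming issues are identified by string ids (the ids below are hypothetical) and hints accumulate in a list like the observation's `context_hints`.

```python
from typing import Dict, List, Set

# Hypothetical unlock table for task 6; hint texts paraphrase the chain above.
UNLOCKS: Dict[str, str] = {
    "hardcoded_jwt_secret": "DB schema revealed",
    "missing_rate_limit": "nginx config shown",
}


def unlock_hints(found_issue_ids: Set[str], hints: List[str]) -> List[str]:
    """Append each newly unlocked hint exactly once, preserving discovery order."""
    for issue_id, hint in UNLOCKS.items():
        if issue_id in found_issue_ids and hint not in hints:
            hints.append(hint)
    return hints
```

Calling it after every graded comment is idempotent: re-finding an issue never duplicates its hint.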
+ ### OpenEnv Interface
+
+ | Method | Returns | Notes |
+ |---|---|---|
+ | `reset()` | `ProbeObservation` | Starts new episode; advances task cursor; applies mutation |
+ | `step(action)` | `(ProbeObservation, RewardType, bool, dict)` | Executes action; returns obs, structured reward, done flag, info dict |
+ | `state` (sync property) | `State(episode_id, step_count)` | Lightweight snapshot for `create_app` |
+ | `async_state()` | `dict` | Full async snapshot with all episode fields |
+
+ ---
+
+ ## Quickstart
+
+ ```bash
+ # Install
+ uv sync
+
+ # Run the environment server
+ uv run uvicorn environment.app:app --host 0.0.0.0 --port 8000 --reload
+
+ # Run zero-shot GPT-4o-mini baseline (requires OPENAI_API_KEY)
+ export OPENAI_API_KEY=sk-...
+ uv run python training/baseline.py
+
+ # Smoke-test reward function (no GPU, no API key)
+ uv run python training/train_grpo.py --test
+ ```
+
+ ---
+
+ ## Repo Structure
+
  ```
  .
  ├── agent/
docs/design.md CHANGED
@@ -17,7 +17,7 @@ repo-root/

  ## Environment entry point

- `environment/app.py`: FastAPI app mounted at `/web`.
  `openenv.yaml` → `app: environment.app:app`.

  ## Reward function
 
  ## Environment entry point

+ `environment/app.py`: FastAPI app; serves the static frontend at `/ui/` and the interactive API docs at `/docs`.
  `openenv.yaml` → `app: environment.app:app`.

  ## Reward function
environment/app.py CHANGED
@@ -21,12 +21,15 @@ from __future__ import annotations

  import json
  import logging
  from contextlib import asynccontextmanager
  from typing import Any

  import uvicorn
  from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
  from fastapi.responses import HTMLResponse

  try:
      from openenv.core.env_server.http_server import create_app as _create_openenv_app
@@ -37,7 +40,7 @@ except Exception: # pragma: no cover
  try:
      from ..agent.models import ProbeAction, ProbeObservation, RewardType
      from .probe_environment import ProbeEnvironment
- except ModuleNotFoundError:
      from agent.models import ProbeAction, ProbeObservation, RewardType  # type: ignore
      from environment.probe_environment import ProbeEnvironment  # type: ignore

@@ -85,6 +88,11 @@ class StepResponse:

  # ── App factory ───────────────────────────────────────────────────────────────

  def _build_app() -> FastAPI:
      application = FastAPI(
          title="PRobe",
@@ -93,6 +101,15 @@ def _build_app() -> FastAPI:
          lifespan=lifespan,
      )

      # ── HTTP endpoints ────────────────────────────────────────────────────

      @application.post("/reset", summary="Start a new episode")
@@ -175,18 +192,25 @@ def _build_app() -> FastAPI:
              pass

      # ── Web UI ────────────────────────────────────────────────────────────
-
      @application.get("/web", response_class=HTMLResponse, include_in_schema=False)
-     async def web_ui() -> str:
-         return """
-     <!doctype html><html><head><title>PRobe</title></head>
-     <body style="font-family:sans-serif;padding:2rem">
-       <h2>PRobe Environment</h2>
-       <p>API docs: <a href="/docs">/docs</a></p>
-       <p>Health: <a href="/health">/health</a></p>
-       <p>Schema: <a href="/schema">/schema</a></p>
-     </body></html>
-     """

      return application

 
  import json
  import logging
+ import pathlib
  from contextlib import asynccontextmanager
  from typing import Any

  import uvicorn
  from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
+ from fastapi.middleware.cors import CORSMiddleware
  from fastapi.responses import HTMLResponse
+ from fastapi.staticfiles import StaticFiles

  try:
      from openenv.core.env_server.http_server import create_app as _create_openenv_app

  try:
      from ..agent.models import ProbeAction, ProbeObservation, RewardType
      from .probe_environment import ProbeEnvironment
+ except (ImportError, ModuleNotFoundError):
      from agent.models import ProbeAction, ProbeObservation, RewardType  # type: ignore
      from environment.probe_environment import ProbeEnvironment  # type: ignore

 
  # ── App factory ───────────────────────────────────────────────────────────────

+ # Resolve the frontend directory relative to this file so the app works
+ # regardless of the working directory it is launched from.
+ _FRONTEND_DIR = pathlib.Path(__file__).parent.parent / "frontend"
+
+
  def _build_app() -> FastAPI:
      application = FastAPI(
          title="PRobe",

          lifespan=lifespan,
      )

+     # The bundled UI under /ui is served by this same app (same origin), so it
+     # needs no CORS headers, and WebSocket handshakes are not subject to CORS.
+     # This only matters for a frontend hosted on another origin, e.g. a dev
+     # server on a different port. In production, restrict allow_origins to it.
+     application.add_middleware(
+         CORSMiddleware,
+         allow_origins=["*"],
+         allow_methods=["*"],
+         allow_headers=["*"],
+     )
+
      # ── HTTP endpoints ────────────────────────────────────────────────────

      @application.post("/reset", summary="Start a new episode")

              pass

      # ── Web UI ────────────────────────────────────────────────────────────
+     # /web → redirect so old links still work
      @application.get("/web", response_class=HTMLResponse, include_in_schema=False)
+     async def web_redirect() -> HTMLResponse:
+         return HTMLResponse(
+             '<meta http-equiv="refresh" content="0;url=/ui/">',
+             status_code=200,
+         )
+
+     # Mount the static frontend (plain HTML/CSS/JS, no build step) at /ui.
+     # Degrade gracefully if the frontend/ directory is missing.
+     if _FRONTEND_DIR.is_dir():
+         application.mount("/ui", StaticFiles(directory=str(_FRONTEND_DIR), html=True), name="ui")
+         log.info("Frontend mounted at /ui from %s", _FRONTEND_DIR)
+     else:
+         log.warning(
+             "Frontend directory not found at %s; /ui will not be available. "
+             "Create the 'frontend/' directory next to 'environment/'.",
+             _FRONTEND_DIR,
+         )

      return application

frontend/app.js ADDED
@@ -0,0 +1,597 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ /**
+  * PRobe Frontend: WebSocket client & UI controller
+  *
+  * Connects to the backend WebSocket at /ws, drives a full episode
+  * lifecycle (reset → step* → terminal), and renders every state change
+  * (code viewer, reward bars, history feed, episode-end modal) in real time.
+  *
+  * Architecture
+  * ------------
+  * WsClient         thin wrapper around the native WebSocket with auto-reconnect
+  * RewardDashboard  renders the reward ring, component bars, and issues progress
+  * CodeViewer       renders numbered, HTML-escaped code with per-line highlights
+  * HistoryFeed      newest-first log of the actions taken this episode
+  * ProbeController  orchestrates all of the above and owns episode state
+  */
+
+ "use strict";
+
+ // ═══════════════════════════════════════════════════════════════════
+ // CONFIG
+ // ═══════════════════════════════════════════════════════════════════
+
+ const CONFIG = {
+   // WebSocket URL: derived from the page origin, so the UI works locally
+   // (http → ws) and behind TLS on a hosted Space (https → wss). Assumes the
+   // same server that mounts /ui also exposes /ws.
+   wsUrl: `${window.location.protocol === "https:" ? "wss:" : "ws:"}//${window.location.host}/ws`,
+   reconnectDelayMs: 2000,
+   ringCircumference: 314, // 2π × r=50 ≈ 314.16
+ };
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // WsClient: WebSocket with auto-reconnect
+ // ═══════════════════════════════════════════════════════════════════
+
+ class WsClient {
+   /**
+    * @param {string}   url               WebSocket endpoint
+    * @param {function} onMessage         Called with parsed JSON message objects
+    * @param {function} onStatusChange    Called with ('connected'|'disconnected')
+    * @param {number}   reconnectDelayMs  Delay before retrying after the socket
+    *                                     closes; 0 disables auto-reconnect
+    */
+   constructor(url, onMessage, onStatusChange, reconnectDelayMs = CONFIG.reconnectDelayMs) {
+     this._url = url;
+     this._onMessage = onMessage;
+     this._onStatusChange = onStatusChange;
+     this._reconnectDelayMs = reconnectDelayMs;
+     this._reconnectTimer = null;
+     this._socket = null;
+     this._connected = false;
+   }
+
+   connect() {
+     // An explicit connect supersedes any pending automatic retry.
+     clearTimeout(this._reconnectTimer);
+     this._reconnectTimer = null;
+
+     if (this._socket) {
+       // Detach the old handlers before closing: the old socket's "close"
+       // event fires asynchronously and would otherwise mark the new
+       // connection as disconnected.
+       this._socket.onopen = null;
+       this._socket.onclose = null;
+       this._socket.onerror = null;
+       this._socket.onmessage = null;
+       this._socket.close();
+     }
+
+     this._socket = new WebSocket(this._url);
+
+     this._socket.onopen = () => {
+       this._connected = true;
+       this._onStatusChange("connected");
+     };
+
+     this._socket.onclose = () => {
+       this._connected = false;
+       this._onStatusChange("disconnected");
+       if (this._reconnectDelayMs > 0) {
+         this._reconnectTimer = setTimeout(() => this.connect(), this._reconnectDelayMs);
+       }
+     };
+
+     this._socket.onerror = (err) => {
+       // Browsers always follow "error" with "close", so the status update
+       // and the retry are handled in onclose.
+       console.error("[WsClient] error:", err);
+     };
+
+     this._socket.onmessage = (event) => {
+       let msg;
+       try {
+         msg = JSON.parse(event.data);
+       } catch (e) {
+         console.warn("[WsClient] unparseable message:", event.data);
+         return;
+       }
+       // Called outside the try block so UI errors are not misreported as parse errors.
+       this._onMessage(msg);
+     };
+   }
+
+   send(payload) {
+     if (!this._connected) {
+       console.warn("[WsClient] send called while disconnected");
+       return;
+     }
+     this._socket.send(JSON.stringify(payload));
+   }
+
+   get isConnected() { return this._connected; }
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // CodeViewer: renders code with per-line decorations
+ // ═══════════════════════════════════════════════════════════════════
+
+ class CodeViewer {
+   constructor(preEl) {
+     this._pre = preEl;
+     this._lines = [];
+     // Track which lines have active highlights so we can clear them
+     this._decoratedLines = new Set();
+   }
+
+   /**
+    * Render source code as numbered, individually addressable lines.
+    * Clears any previous decorations.
+    */
+   render(sourceCode) {
+     this._lines = sourceCode.split("\n");
+     this._decoratedLines.clear();
+     this._pre.innerHTML = this._lines.map((text, idx) => {
+       const lineNum = idx + 1;
+       return `<span class="code-line" id="cl-${lineNum}">`
+         + `<span class="code-line-num">${lineNum}</span>`
+         + escapeHtml(text)
+         + `</span>`;
+     }).join("\n");
+   }
+
+   /**
+    * Apply a CSS class to a specific line.
+    * @param {number} lineNumber  1-based
+    * @param {string} cssClass    e.g. 'hl-comment'
+    */
+   decorateLine(lineNumber, cssClass) {
+     const el = document.getElementById(`cl-${lineNumber}`);
+     if (!el) return;
+     // Remove any previous highlight class on this line before adding the new one
+     el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
+     el.classList.add(cssClass);
+     this._decoratedLines.add(lineNumber);
+   }
+
+   /** Scroll the given 1-based line number into view. */
+   scrollToLine(lineNumber) {
+     const el = document.getElementById(`cl-${lineNumber}`);
+     if (el) el.scrollIntoView({ block: "center", behavior: "smooth" });
+   }
+
+   clearDecorations() {
+     for (const lineNum of this._decoratedLines) {
+       const el = document.getElementById(`cl-${lineNum}`);
+       if (el) el.classList.remove("hl-comment", "hl-issue", "hl-scanner", "hl-context");
+     }
+     this._decoratedLines.clear();
+   }
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // RewardDashboard: ring + bars + issues progress
+ // ═══════════════════════════════════════════════════════════════════
+
+ class RewardDashboard {
+   constructor() {
+     this._ringTrack = document.getElementById("ring-track");
+     this._ringValue = document.getElementById("ring-value");
+     this._issuesFill = document.getElementById("issues-bar-fill");
+     this._issuesLabel = document.getElementById("issues-found-label");
+
+     // Component bar element pairs { fill, val }
+     this._bars = {
+       issue_credit:           this._barPair("issue_credit"),
+       classification_credit:  this._barPair("classification_credit"),
+       false_positive_penalty: this._barPair("false_positive_penalty"),
+       coverage_bonus:         this._barPair("coverage_bonus"),
+       decision_score:         this._barPair("decision_score"),
+       efficiency_bonus:       this._barPair("efficiency_bonus"),
+     };
+   }
+
+   _barPair(key) {
+     return {
+       fill: document.getElementById(`bar-${key}`),
+       val:  document.getElementById(`val-${key}`),
+     };
+   }
+
+   /**
+    * Update the cumulative reward ring.
+    * Clamps input to [-1, 1] and maps it onto the ring arc.
+    */
+   updateRing(cumulativeReward) {
+     const clamped = Math.max(-1, Math.min(1, cumulativeReward));
+     // Map [-1, 1] → [0, circumference], so a negative reward still shows a partial arc
+     const fraction = (clamped + 1) / 2;
+     const offset = CONFIG.ringCircumference * (1 - fraction);
+
+     this._ringTrack.style.strokeDashoffset = offset;
+     // Colour: green at or above 0, red below
+     this._ringTrack.style.stroke = clamped >= 0 ? "var(--green)" : "var(--red)";
+     this._ringValue.textContent = clamped.toFixed(2);
+     this._ringValue.style.color = clamped >= 0 ? "var(--green)" : "var(--red)";
+   }
+
+   /**
+    * Render per-component score bars from a components dict.
+    * Bar width is |value| / 0.40 as a percentage, capped at 100%.
+    */
+   updateBars(components) {
+     const MAX_BAR_VALUE = 0.40;
+
+     for (const [key, pair] of Object.entries(this._bars)) {
+       const rawValue = components[key] ?? 0;
+       const absWidth = Math.min(Math.abs(rawValue) / MAX_BAR_VALUE * 100, 100);
+
+       pair.fill.style.width = `${absWidth}%`;
+       pair.val.textContent = rawValue.toFixed(2);
+
+       // Positive/negative/neutral colouring
+       pair.fill.classList.remove("positive", "negative", "neutral");
+       if (rawValue > 0) pair.fill.classList.add("positive");
+       else if (rawValue < 0) pair.fill.classList.add("negative");
+       else pair.fill.classList.add("neutral");
+     }
+   }
+
+   /** Update the issues-found progress bar. */
+   updateIssues(found, total) {
+     const pct = total > 0 ? (found / total) * 100 : 0;
+     this._issuesFill.style.width = `${pct}%`;
+     this._issuesLabel.textContent = `${found} / ${total}`;
+   }
+
+   reset() {
+     this.updateRing(0);
+     this.updateBars({});
+     this.updateIssues(0, 0);
+   }
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // HistoryFeed: append-only episode action log
+ // ═══════════════════════════════════════════════════════════════════
+
+ class HistoryFeed {
+   constructor(containerEl) {
+     this._container = containerEl;
+     this._count = 0;
+   }
+
+   clear() {
+     this._container.innerHTML = '<div class="history-empty">No actions yet.</div>';
+     this._count = 0;
+   }
+
+   /**
+    * Add one step to the feed (newest entries appear at the top).
+    * @param {string} actionType  Human-readable action label
+    * @param {object} reward      RewardType object from server
+    */
+   append(actionType, reward) {
+     if (this._count === 0) {
+       this._container.innerHTML = "";
+     }
+     this._count++;
+
+     const total = reward.total ?? 0;
+     const polarity = total > 0.001 ? "positive" : total < -0.001 ? "negative" : "neutral";
+     const rewardClass = total >= 0 ? "pos" : "neg";
+     const sign = total >= 0 ? "+" : "";
+
+     const item = document.createElement("div");
+     item.className = `history-item ${polarity}`;
+     item.innerHTML = `
+       <div>
+         <span class="h-action">${escapeHtml(actionType)}</span>
+         &nbsp;→&nbsp;
+         <span class="h-reward ${rewardClass}">${sign}${total.toFixed(3)}</span>
+       </div>
+       <div class="h-explain">${escapeHtml(reward.explanation ?? "")}</div>
+     `;
+     this._container.prepend(item); // newest at top
+   }
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // ProbeController: owns all state, wires UI ↔ WsClient
+ // ═══════════════════════════════════════════════════════════════════
+
+ class ProbeController {
+   constructor() {
+     // Sub-components
+     this._ws = null;
+     this._viewer = new CodeViewer(document.getElementById("code-block"));
+     this._dashboard = new RewardDashboard();
+     this._feed = new HistoryFeed(document.getElementById("history-feed"));
+
+     // Episode state
+     this._episodeActive = false;
+     this._cumulativeReward = 0;
+     this._stepCount = 0;
+     this._maxSteps = 0;
+     this._totalIssues = 0;
+     this._foundCount = 0;
+     this._lastObs = null;
+     this._lastActionLabel = ""; // history-feed label for the step in flight
+
+     this._bindStaticButtons();
+   }
+
+   // ── Initialisation ──────────────────────────────────────────────
+
+   _bindStaticButtons() {
+     document.getElementById("btn-connect").addEventListener("click", () => this._connect());
+     document.getElementById("btn-reset").addEventListener("click", () => this._sendReset());
+     document.getElementById("btn-comment").addEventListener("click", () => this._sendComment());
+     document.getElementById("btn-get-context").addEventListener("click", () => this._sendGetContext());
+     document.getElementById("btn-run-scanner").addEventListener("click", () => this._sendAction("run_scanner"));
+     document.getElementById("btn-request-changes").addEventListener("click", () => this._sendAction("request_changes"));
+     document.getElementById("btn-approve").addEventListener("click", () => this._sendAction("approve"));
+     document.getElementById("btn-submit").addEventListener("click", () => this._sendAction("submit_review"));
+     document.getElementById("btn-escalate").addEventListener("click", () => this._sendAction("escalate_to_security_review"));
+     document.getElementById("modal-close").addEventListener("click", () => {
+       document.getElementById("modal-overlay").style.display = "none";
+       this._sendReset();
+     });
+   }
+
+   // ── WebSocket lifecycle ──────────────────────────────────────────
+
+   _connect() {
+     // Reuse one client: building a new WsClient on every click would leave
+     // the previous socket open and still delivering messages to this page.
+     if (!this._ws) {
+       this._ws = new WsClient(
+         CONFIG.wsUrl,
+         (msg) => this._handleMessage(msg),
+         (status) => this._handleConnectionStatus(status),
+       );
+     }
+     this._ws.connect();
+   }
+
+   _handleConnectionStatus(status) {
+     const badge = document.getElementById("conn-badge");
+     const btnReset = document.getElementById("btn-reset");
+     const btnConnect = document.getElementById("btn-connect");
+
+     if (status === "connected") {
+       badge.textContent = "🟢 Connected";
+       badge.className = "badge connected";
+       btnConnect.textContent = "Reconnect";
+       btnReset.disabled = false;
+       // Auto-start the first episode on a successful connect
+       this._sendReset();
+     } else {
+       badge.textContent = "⚫ Disconnected";
+       badge.className = "badge disconnected";
+       btnReset.disabled = true;
+       this._episodeActive = false;
+       this._setActionButtonsEnabled(false);
+     }
+   }
+
+   // ── Message dispatch ─────────────────────────────────────────────
+
+   _handleMessage(msg) {
+     switch (msg.type) {
+       case "reset": this._applyObservation(msg.observation, null, false); break;
+       case "step":  this._applyStep(msg); break;
+       case "error": this._showError(msg.detail); break;
+       default:      console.warn("[ProbeController] unknown message type:", msg.type);
+     }
+   }
+
+   // ── Episode state application ────────────────────────────────────
+
+   /**
+    * Apply a fresh observation (after reset or step).
+    * Updates every UI component from the single observation object.
+    */
+   _applyObservation(obs, reward, done) {
+     this._lastObs = obs;
+     this._stepCount = obs.step_count;
+     this._maxSteps = obs.max_steps;
+     this._totalIssues = obs.total_issues;
+     this._foundCount = obs.issues_found_count;
+
+     // ── Task metadata ──
+     document.getElementById("task-label").textContent =
+       `Task ${obs.task_id}: ${obs.file_name}`;
+     document.getElementById("task-desc").textContent = obs.task_description;
+     document.getElementById("steps-counter").textContent =
+       `Step ${obs.step_count} / ${obs.max_steps}`;
+
+     const diffBadge = document.getElementById("difficulty-badge");
+     diffBadge.textContent = obs.task_difficulty;
+     diffBadge.className = `difficulty-badge ${obs.task_difficulty.replace(/\s+/g, "-")}`;
+
+     // ── Adversarial hint ──
+     const advEl = document.getElementById("adv-hint");
+     if (obs.adversarial_hint) {
+       advEl.textContent = `⚠️ ${obs.adversarial_hint}`;
+       advEl.style.display = "block";
+     } else {
+       advEl.style.display = "none";
+     }
+
+     // ── Code viewer ── re-rendered only on reset (no reward yet);
+     //    render() also discards any previous line decorations
+     if (!reward) {
+       this._viewer.render(obs.code_snippet);
+     }
+
+     // ── Highlight lines mentioned in review history ──
+     this._decorateHistoryLines(obs.review_history);
+
+     // ── Context hints ──
+     this._renderHints(obs.context_hints);
+
+     // ── Dashboard ──
+     this._cumulativeReward = obs.metadata?.cumulative_reward ?? 0;
+     this._dashboard.updateRing(this._cumulativeReward);
+     this._dashboard.updateIssues(this._foundCount, this._totalIssues);
+
+     if (reward) {
+       this._dashboard.updateBars(reward.components ?? {});
+       this._feed.append(this._lastActionLabel, reward);
+     }
+
+     // ── Terminal handling ──
+     if (done) {
+       this._episodeActive = false;
+       this._setActionButtonsEnabled(false);
+       this._showEpisodeEndModal(obs, reward);
+     } else {
+       this._episodeActive = true;
+       this._setActionButtonsEnabled(true);
+     }
+   }
+
+   _applyStep(msg) {
+     this._applyObservation(msg.observation, msg.reward, msg.done);
+   }
+
+   // ── Line decorations ─────────────────────────────────────────────
+
+   /**
+    * Walk review_history and apply colour-coded line highlights.
+    * Later entries overwrite earlier ones on the same line, so the most
+    * recent action's highlight takes priority.
+    */
+   _decorateHistoryLines(history) {
+     this._viewer.clearDecorations();
+     for (const entry of history ?? []) {
+       // Scanner results are not tied to a single line, so they get no highlight.
+       if (!entry.line || entry.type === "scanner_result") continue;
+       const cssClass = entry.type === "context_probe" ? "hl-context" : "hl-comment";
+       this._viewer.decorateLine(entry.line, cssClass);
+     }
+   }
+
+   // ── Hints ────────────────────────────────────────────────────────
+
+   _renderHints(hints) {
+     const container = document.getElementById("hints-container");
+     const list = document.getElementById("hints-list");
+
+     if (!hints || hints.length === 0) {
+       container.style.display = "none";
+       return;
+     }
+     container.style.display = "block";
+     list.innerHTML = hints.map(h =>
+       `<div class="hint-item">${escapeHtml(h)}</div>`
+     ).join("");
+   }
+
+   // ── Action senders ───────────────────────────────────────────────
+
+   _sendReset() {
+     if (!this._ws?.isConnected) return;
+     this._episodeActive = false;
+     this._setActionButtonsEnabled(false);
+     this._dashboard.reset();
+     this._feed.clear();
+     this._viewer._pre.innerHTML = '<span class="placeholder-text">Loading…</span>';
+     document.getElementById("hints-container").style.display = "none";
+     document.getElementById("adv-hint").style.display = "none";
+     this._ws.send({ command: "reset" });
+   }
+
+   _sendComment() {
+     const line = parseInt(document.getElementById("inp-line").value, 10) || null;
+     const comment = document.getElementById("inp-comment").value.trim();
+     const severity = document.getElementById("inp-severity").value || null;
+     const category = document.getElementById("inp-category").value || null;
+     const classification = document.getElementById("inp-classification").value || null;
+
+     if (!comment) {
+       alert("Please enter a comment before submitting.");
+       return;
+     }
+     // Pass the label explicitly; _sendAction would otherwise overwrite it
+     // with the bare action type and drop the line number.
+     const sent = this._sendAction(
+       "add_comment",
+       { line_number: line, comment, severity, category, classification },
+       `ADD COMMENT (L${line ?? "?"})`,
+     );
+     // Clear the comment field only once the action has actually been sent
+     if (sent) document.getElementById("inp-comment").value = "";
+   }
+
+   _sendGetContext() {
+     const line = parseInt(document.getElementById("inp-probe-line").value, 10) || null;
+     if (!line) { alert("Enter a line number to probe."); return; }
+     this._sendAction("get_context", { line_number: line }, `GET CONTEXT (L${line})`);
+   }
+
+   /**
+    * Send a step action to the server.
+    * @param {string}  actionType  snake_case action type string
+    * @param {object}  extra       Additional fields (line_number, comment, …)
+    * @param {?string} label       History-feed label; defaults to the action type
+    * @returns {boolean} true if the action was sent
+    */
+   _sendAction(actionType, extra = {}, label = null) {
+     if (!this._ws?.isConnected || !this._episodeActive) return false;
+     this._lastActionLabel = label ?? actionType.toUpperCase().replace(/_/g, " ");
+     this._ws.send({
+       command: "step",
+       action: { action_type: actionType, ...extra },
+     });
+     return true;
+   }
+
+   // ── UI helpers ───────────────────────────────────────────────────
+
+   _setActionButtonsEnabled(enabled) {
+     const ids = [
+       "btn-comment", "btn-get-context", "btn-run-scanner",
+       "btn-request-changes", "btn-approve", "btn-submit", "btn-escalate",
+     ];
+     for (const id of ids) {
+       document.getElementById(id).disabled = !enabled;
+     }
+   }
+
+   _showEpisodeEndModal(obs, reward) {
+     const totalReward = this._cumulativeReward;
+     const passed = reward?.passed ?? false;
+
+     document.getElementById("modal-overlay").style.display = "flex";
+     document.getElementById("modal-icon").textContent =
+       totalReward >= 0.5 ? "🏆" : totalReward >= 0 ? "🏁" : "💔";
+     document.getElementById("modal-title").textContent =
+       passed ? "Episode Passed!" : "Episode Complete";
+     document.getElementById("modal-body").textContent =
+       reward?.explanation ?? "Episode ended.";
+
+     // Render a small stats grid inside the modal. The server-provided
+     // decision string is escaped because this block is set via innerHTML.
+     const decision = escapeHtml(obs.metadata?.review_decision ?? "-");
+     const esc = obs.metadata?.escalation_required ? "Yes" : "No";
+     document.getElementById("modal-stats").innerHTML = `
+       <span class="stat-label">Cumulative reward</span>
+       <span class="stat-value">${totalReward.toFixed(3)}</span>
+       <span class="stat-label">Issues found</span>
+       <span class="stat-value">${obs.issues_found_count} / ${obs.total_issues}</span>
+       <span class="stat-label">Steps used</span>
+       <span class="stat-value">${obs.step_count} / ${obs.max_steps}</span>
+       <span class="stat-label">Decision</span>
+       <span class="stat-value">${decision}</span>
+       <span class="stat-label">Escalation required</span>
+       <span class="stat-value">${esc}</span>
+     `;
+   }
+
+   _showError(detail) {
+     console.error("[ProbeController] server error:", detail);
+     // Non-intrusive: log it and add it to the feed as a zero-reward entry
+     this._feed.append("ERROR", {
+       total: 0,
+       explanation: detail ?? "Unknown server error",
+     });
+   }
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // Utilities
+ // ═══════════════════════════════════════════════════════════════════
+
+ /** Escape HTML special chars to prevent XSS when inserting code/text. */
+ function escapeHtml(str) {
+   return String(str)
+     .replace(/&/g, "&amp;")
+     .replace(/</g, "&lt;")
+     .replace(/>/g, "&gt;")
+     .replace(/"/g, "&quot;")
+     .replace(/'/g, "&#39;");
+ }
+
+
+ // ═══════════════════════════════════════════════════════════════════
+ // Bootstrap
+ // ═══════════════════════════════════════════════════════════════════
+
+ document.addEventListener("DOMContentLoaded", () => {
+   window._probe = new ProbeController();
+ });
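The client and server exchange a small JSON protocol, visible in `_sendReset`, `_sendAction`, and `_handleMessage` in `frontend/app.js`. The client sends `{command: "reset"}` or `{command: "step", action: {...}}`, and the server replies with messages whose `type` is `reset`, `step`, or `error`. The sketch below restates that protocol as pure functions so it can be exercised without a browser or a live socket. The function names (`resetMessage`, `stepMessage`, `dispatch`) and the sample observation and reward values are illustrative assumptions, not part of the repository or real server output.

```javascript
// Client -> server messages, mirroring what ProbeController sends.
function resetMessage() {
  return { command: "reset" };
}

function stepMessage(actionType, extra = {}) {
  return { command: "step", action: { action_type: actionType, ...extra } };
}

// Server -> client dispatch, mirroring ProbeController._handleMessage.
// `handlers` is a hypothetical object with onReset/onStep/onError callbacks.
function dispatch(msg, handlers) {
  switch (msg.type) {
    case "reset": return handlers.onReset(msg.observation);
    case "step":  return handlers.onStep(msg.observation, msg.reward, msg.done);
    case "error": return handlers.onError(msg.detail);
    default:      throw new Error(`unknown message type: ${msg.type}`);
  }
}

// Usage: a scripted episode that comments on one line and then submits.
const outgoing = [
  resetMessage(),
  stepMessage("add_comment", { line_number: 12, comment: "Query built from user input", category: "security" }),
  stepMessage("submit_review"),
];
outgoing.forEach((m) => console.log(JSON.stringify(m)));

// Replay two illustrative server replies through the dispatcher.
const log = [];
const handlers = {
  onReset: (obs) => log.push(`reset ${obs.task_id}`),
  onStep: (obs, reward, done) => log.push(`step ${reward.total} done=${done}`),
  onError: (detail) => log.push(`error ${detail}`),
};
dispatch({ type: "reset", observation: { task_id: 1 } }, handlers);
dispatch({ type: "step", observation: {}, reward: { total: 0.25 }, done: true }, handlers);
console.log(log);
```

One deliberate difference: `_handleMessage` only logs a warning on an unknown `type`, whereas this sketch throws, so that a test harness notices protocol drift between client and server.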
frontend/index.html ADDED
@@ -0,0 +1,212 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8" />
+   <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+   <title>PRobe: AI Code Review Training Environment</title>
+   <link rel="stylesheet" href="style.css" />
+ </head>
+ <body>
+
+   <!-- ══════════════════════════════════════════════════════════
+        TOP BAR
+        ══════════════════════════════════════════════════════════ -->
+   <header class="topbar">
+     <div class="topbar-left">
+       <span class="logo">&#x1F50D; PRobe</span>
+       <span class="tagline">Adversarial Code Review · RL Training Environment</span>
+     </div>
+     <div class="topbar-right">
+       <span class="badge" id="conn-badge">⚫ Disconnected</span>
+       <button id="btn-connect" class="btn btn-primary">Connect</button>
+       <button id="btn-reset" class="btn btn-secondary" disabled>New Episode</button>
+     </div>
+   </header>
+
+   <!-- ══════════════════════════════════════════════════════════
+        MAIN LAYOUT: three columns
+        ══════════════════════════════════════════════════════════ -->
+   <main class="layout">
+
+     <!-- ── LEFT: Task meta + code viewer ─────────────────────── -->
+     <section class="panel panel-code">
+
+       <div class="panel-header">
+         <span id="task-label">Task -</span>
+         <span class="difficulty-badge" id="difficulty-badge">-</span>
+         <span class="steps-counter" id="steps-counter">Step 0 / -</span>
+       </div>
+
+       <p class="task-desc" id="task-desc">Connect and start an episode to begin.</p>
+
+       <div class="adversarial-hint" id="adv-hint" style="display:none"></div>
+
+       <!-- Code block with line-number highlights -->
+       <div class="code-wrapper">
+         <pre id="code-block" class="code-block"><span class="placeholder-text">No code loaded.</span></pre>
+       </div>
+
+       <!-- Context hints revealed by finding key issues -->
+       <div id="hints-container" style="display:none">
+         <div class="section-title">🔓 Unlocked Context Hints</div>
+         <div id="hints-list" class="hints-list"></div>
+       </div>
+
+     </section>
+
+     <!-- ── CENTRE: Action panel ───────────────────────────────── -->
+     <section class="panel panel-action">
+
+       <div class="panel-header">Actions</div>
+
+       <!-- ADD_COMMENT form -->
+       <div class="action-card" id="card-comment">
+         <div class="action-title">💬 Add Comment</div>
+         <div class="form-row">
+           <label>Line</label>
+           <input type="number" id="inp-line" min="1" placeholder="e.g. 12" />
+         </div>
+         <div class="form-row">
+           <label>Comment</label>
+           <textarea id="inp-comment" rows="3" placeholder="Describe the issue in detail…"></textarea>
+         </div>
+         <div class="form-row">
+           <label>Severity</label>
+           <select id="inp-severity">
+             <option value="">(none)</option>
+             <option value="info">info</option>
+             <option value="warning">warning</option>
+             <option value="error">error</option>
+             <option value="critical">critical</option>
+           </select>
+         </div>
+         <div class="form-row">
+           <label>Category</label>
+           <select id="inp-category">
+             <option value="">(none)</option>
+             <option value="bug">bug</option>
+             <option value="security">security</option>
+             <option value="performance">performance</option>
+             <option value="style">style</option>
+             <option value="design">design</option>
+           </select>
+         </div>
+         <div class="form-row">
+           <label>Classification</label>
+           <select id="inp-classification">
+             <option value="">(none)</option>
+             <option value="accidental_bug">accidental_bug</option>
+             <option value="intentional_backdoor">intentional_backdoor</option>
+           </select>
+         </div>
+         <button class="btn btn-action" id="btn-comment" disabled>Submit Comment</button>
+       </div>
+
+       <!-- Quick actions -->
+       <div class="quick-actions">
+         <div class="action-title">⚡ Quick Actions</div>
+
+         <div class="form-row">
+           <label>Probe Line</label>
+           <input type="number" id="inp-probe-line" min="1" placeholder="e.g. 8" />
+         </div>
+         <button class="btn btn-action btn-info" id="btn-get-context" disabled>🔍 Get Context</button>
+         <button class="btn btn-action btn-info" id="btn-run-scanner" disabled>🤖 Run Scanner</button>
+
+         <div class="separator"></div>
+
+         <button class="btn btn-action btn-warn" id="btn-request-changes" disabled>🔄 Request Changes</button>
+         <button class="btn btn-action btn-success" id="btn-approve" disabled>✅ Approve PR</button>
+         <button class="btn btn-action btn-danger" id="btn-submit" disabled>📤 Submit Review</button>
+         <button class="btn btn-action btn-escalate" id="btn-escalate" disabled>🚨 Escalate to Security</button>
+       </div>
+
+     </section>
+
+     <!-- ── RIGHT: Reward dashboard + history ─────────────────── -->
+     <section class="panel panel-reward">
+
+       <div class="panel-header">Reward Dashboard</div>
+
+       <!-- Cumulative reward ring -->
+       <div class="reward-ring-wrap">
+         <svg class="reward-ring" viewBox="0 0 120 120">
+           <circle class="ring-bg" cx="60" cy="60" r="50" />
+           <circle class="ring-track" cx="60" cy="60" r="50" id="ring-track" />
+         </svg>
+         <div class="ring-label">
+           <span id="ring-value">0.00</span>
+           <small>cumulative</small>
+         </div>
+       </div>
+
+       <!-- Per-step component bars -->
+       <div class="component-bars" id="component-bars">
+         <div class="section-title">Last Step Breakdown</div>
+         <div class="bar-row" id="bar-row-issue_credit">
+           <span class="bar-label">Issue credit</span>
+           <div class="bar-track"><div class="bar-fill positive" id="bar-issue_credit"></div></div>
+           <span class="bar-val" id="val-issue_credit">0.00</span>
+         </div>
+         <div class="bar-row" id="bar-row-classification_credit">
+           <span class="bar-label">Classification</span>
+           <div class="bar-track"><div class="bar-fill positive" id="bar-classification_credit"></div></div>
+           <span class="bar-val" id="val-classification_credit">0.00</span>
+         </div>
+         <div class="bar-row" id="bar-row-false_positive_penalty">
+           <span class="bar-label">FP penalty</span>
+           <div class="bar-track"><div class="bar-fill negative" id="bar-false_positive_penalty"></div></div>
+           <span class="bar-val" id="val-false_positive_penalty">0.00</span>
+         </div>
+         <div class="bar-row" id="bar-row-coverage_bonus">
+           <span class="bar-label">Coverage</span>
+           <div class="bar-track"><div class="bar-fill positive" id="bar-coverage_bonus"></div></div>
+           <span class="bar-val" id="val-coverage_bonus">0.00</span>
+         </div>
+         <div class="bar-row" id="bar-row-decision_score">
+           <span class="bar-label">Decision</span>
+           <div class="bar-track"><div class="bar-fill neutral" id="bar-decision_score"></div></div>
+           <span class="bar-val" id="val-decision_score">0.00</span>
+         </div>
+         <div class="bar-row" id="bar-row-efficiency_bonus">
+           <span class="bar-label">Efficiency</span>
+           <div class="bar-track"><div class="bar-fill positive" id="bar-efficiency_bonus"></div></div>
+           <span class="bar-val" id="val-efficiency_bonus">0.00</span>
+         </div>
+       </div>
+
+       <!-- Issues progress -->
+       <div class="section-title" style="margin-top:1rem">Issues Found</div>
+       <div class="issues-progress">
+         <div class="issues-bar-wrap">
+           <div class="issues-bar-fill" id="issues-bar-fill"></div>
+         </div>
+         <span id="issues-found-label">0 / 0</span>
+       </div>
+
+       <!-- Step-by-step history feed -->
+       <div class="section-title" style="margin-top:1rem">Episode History</div>
+       <div class="history-feed" id="history-feed">
+         <div class="history-empty">No actions yet.</div>
+       </div>
+
+     </section>
+
+   </main>
+
+   <!-- ══════════════════════════════════════════════════════════
+        EPISODE-END MODAL
+        ══════════════════════════════════════════════════════════ -->
+   <div id="modal-overlay" class="modal-overlay" style="display:none">
+     <div class="modal">
+       <div class="modal-icon" id="modal-icon">🏁</div>
+       <h2 id="modal-title">Episode Complete</h2>
+       <p id="modal-body">-</p>
+       <div class="modal-stats" id="modal-stats"></div>
+       <button class="btn btn-primary" id="modal-close">Start New Episode</button>
+     </div>
+   </div>
+
+   <script src="app.js"></script>
+ </body>
+ </html>
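The reward ring in this markup is an SVG circle with `r="50"`. `RewardDashboard.updateRing` in `app.js` turns a cumulative reward into a `stroke-dashoffset` against `CONFIG.ringCircumference` (314, i.e. 2π·50 rounded). The sketch below restates that mapping, together with the bar-width scaling from `updateBars`, as pure functions (`ringDashOffset` and `barWidthPercent` are hypothetical names). It assumes the stylesheet sets `stroke-dasharray` on `.ring-track` to the same circumference. That rule falls outside this excerpt, so the reading "no arc at -1, full ring at +1" is an assumption.

```javascript
// Geometry of #ring-track in index.html.
const RADIUS = 50;
const CIRCUMFERENCE = 2 * Math.PI * RADIUS; // ≈ 314.16; app.js rounds to 314

/**
 * Mirror of RewardDashboard.updateRing: clamp the reward to [-1, 1], map it
 * linearly to a fraction in [0, 1], and return the dash offset that leaves
 * that fraction of the ring visible (assuming dasharray == circumference).
 */
function ringDashOffset(reward, circumference = CIRCUMFERENCE) {
  const clamped = Math.max(-1, Math.min(1, reward));
  const fraction = (clamped + 1) / 2;
  return circumference * (1 - fraction);
}

/**
 * Mirror of RewardDashboard.updateBars: bar width as a percentage of
 * MAX_BAR_VALUE (0.40), using the absolute value and capping at 100%.
 */
function barWidthPercent(value, maxBarValue = 0.4) {
  return Math.min((Math.abs(value) / maxBarValue) * 100, 100);
}

console.log(ringDashOffset(0.5, 314).toFixed(1)); // 78.5: three quarters of the ring drawn
console.log(barWidthPercent(0.2)); // 50
```

Taking the absolute value in `barWidthPercent` means a penalty of -0.2 and a credit of +0.2 draw bars of the same length. The sign is conveyed only by the `positive`/`negative` CSS class that `updateBars` applies.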
frontend/style.css ADDED
@@ -0,0 +1,391 @@
+ /* ═══════════════════════════════════════════════════════════════
2
+ PRobe Dashboard β€” stylesheet
3
+ Design tokens: dark IDE theme, accent #4f9eff
4
+ ═══════════════════════════════════════════════════════════════ */
5
+
6
+ /* ── Reset & base ─────────────────────────────────────────── */
7
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
8
+
9
+ :root {
10
+ --bg-0: #0d1117; /* deepest background */
11
+ --bg-1: #161b22; /* panel background */
12
+ --bg-2: #21262d; /* card / input background */
13
+ --bg-3: #30363d; /* hover / border */
14
+ --text-main: #e6edf3;
15
+ --text-dim: #8b949e;
16
+ --accent: #4f9eff;
17
+ --green: #3fb950;
18
+ --red: #f85149;
19
+ --yellow: #d29922;
20
+ --orange: #db6d28;
21
+ --purple: #a371f7;
22
+ --radius: 8px;
23
+ --font-mono: 'JetBrains Mono', 'Fira Code', 'Consolas', monospace;
24
+ --font-ui: 'Inter', system-ui, sans-serif;
25
+ --topbar-h: 52px;
26
+ }
27
+
28
+ html, body {
29
+ height: 100%;
30
+ background: var(--bg-0);
31
+ color: var(--text-main);
32
+ font-family: var(--font-ui);
33
+ font-size: 14px;
34
+ line-height: 1.5;
35
+ }
36
+
+ /* ── Top bar ──────────────────────────────────────────────── */
+ .topbar {
+   position: fixed;
+   top: 0; left: 0; right: 0;
+   height: var(--topbar-h);
+   background: var(--bg-1);
+   border-bottom: 1px solid var(--bg-3);
+   display: flex;
+   align-items: center;
+   justify-content: space-between;
+   padding: 0 1.25rem;
+   z-index: 100;
+ }
+ .topbar-left { display: flex; align-items: center; gap: 1rem; }
+ .logo { font-size: 1.15rem; font-weight: 700; color: var(--accent); }
+ .tagline { color: var(--text-dim); font-size: 0.8rem; }
+ .topbar-right { display: flex; align-items: center; gap: 0.75rem; }
+
+ .badge {
+   font-size: 0.78rem;
+   padding: 3px 10px;
+   border-radius: 12px;
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   white-space: nowrap;
+ }
+ .badge.connected { color: var(--green); border-color: var(--green); }
+ .badge.disconnected { color: var(--text-dim); }
+
+ /* ── Buttons ──────────────────────────────────────────────── */
+ .btn {
+   padding: 6px 16px;
+   border-radius: var(--radius);
+   border: 1px solid transparent;
+   font-size: 0.82rem;
+   font-weight: 600;
+   cursor: pointer;
+   transition: opacity 0.15s, background 0.15s;
+ }
+ .btn:disabled { opacity: 0.35; cursor: not-allowed; }
+ .btn-primary { background: var(--accent); color: #fff; border-color: var(--accent); }
+ .btn-secondary { background: var(--bg-2); color: var(--text-main); border-color: var(--bg-3); }
+ .btn-action { width: 100%; margin-bottom: 0.4rem; background: var(--bg-2); color: var(--text-main); border-color: var(--bg-3); }
+ .btn-info { border-color: var(--accent); color: var(--accent); }
+ .btn-warn { border-color: var(--yellow); color: var(--yellow); }
+ .btn-success { border-color: var(--green); color: var(--green); }
+ .btn-danger { border-color: var(--red); color: var(--red); background: rgba(248,81,73,0.1); }
+ .btn-escalate { border-color: var(--purple); color: var(--purple); background: rgba(163,113,247,0.1); }
+ .btn:not(:disabled):hover { opacity: 0.82; }
+
+ /* ── Main three-column layout ─────────────────────────────── */
+ .layout {
+   display: grid;
+   grid-template-columns: 1fr 310px 310px;
+   grid-template-rows: calc(100vh - var(--topbar-h));
+   gap: 0;
+   margin-top: var(--topbar-h);
+   overflow: hidden;
+ }
+
+ /* ── Generic panel ────────────────────────────────────────── */
+ .panel {
+   background: var(--bg-1);
+   border-right: 1px solid var(--bg-3);
+   overflow-y: auto;
+   padding: 1rem;
+   display: flex;
+   flex-direction: column;
+   gap: 0.75rem;
+ }
+ .panel:last-child { border-right: none; }
+ .panel-header {
+   font-weight: 700;
+   font-size: 0.85rem;
+   color: var(--text-dim);
+   text-transform: uppercase;
+   letter-spacing: 0.06em;
+   display: flex;
+   align-items: center;
+   gap: 0.75rem;
+   flex-wrap: wrap;
+ }
+ .section-title {
+   font-size: 0.78rem;
+   font-weight: 600;
+   color: var(--text-dim);
+   text-transform: uppercase;
+   letter-spacing: 0.05em;
+ }
+
+ /* ── Task metadata ────────────────────────────────────────── */
+ #task-label { color: var(--accent); font-size: 0.9rem; }
+
+ .difficulty-badge {
+   font-size: 0.72rem;
+   padding: 2px 8px;
+   border-radius: 10px;
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   text-transform: capitalize;
+ }
+ .difficulty-badge.ultra-easy { color: var(--green); border-color: var(--green); }
+ .difficulty-badge.easy { color: var(--accent); border-color: var(--accent); }
+ .difficulty-badge.medium { color: var(--yellow); border-color: var(--yellow); }
+ .difficulty-badge.hard { color: var(--orange); border-color: var(--orange); }
+ .difficulty-badge.adversarial { color: var(--red); border-color: var(--red); }
+
+ .steps-counter { margin-left: auto; font-size: 0.8rem; color: var(--text-dim); }
+
+ .task-desc {
+   font-size: 0.82rem;
+   color: var(--text-dim);
+   line-height: 1.6;
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   border-radius: var(--radius);
+   padding: 0.6rem 0.8rem;
+ }
+
+ .adversarial-hint {
+   font-size: 0.8rem;
+   background: rgba(163,113,247,0.1);
+   border: 1px solid var(--purple);
+   border-radius: var(--radius);
+   padding: 0.5rem 0.75rem;
+   color: var(--purple);
+ }
+
+ /* ── Code viewer ──────────────────────────────────────────── */
+ .code-wrapper {
+   flex: 1;
+   overflow: auto;
+   border: 1px solid var(--bg-3);
+   border-radius: var(--radius);
+   background: var(--bg-0);
+ }
+ .code-block {
+   font-family: var(--font-mono);
+   font-size: 0.78rem;
+   line-height: 1.65;
+   padding: 0.75rem 1rem;
+   white-space: pre;
+   counter-reset: line-counter;
+ }
+ .code-line { display: block; }
+ .code-line-num {
+   user-select: none;
+   display: inline-block;
+   width: 2.8em;
+   color: var(--text-dim);
+   text-align: right;
+   margin-right: 1em;
+   font-size: 0.72rem;
+ }
+ /* Highlighted lines (comment target or scanner finding) */
+ .code-line.hl-comment { background: rgba(79,158,255,0.12); border-left: 3px solid var(--accent); }
+ .code-line.hl-issue { background: rgba(248,81,73,0.10); border-left: 3px solid var(--red); }
+ .code-line.hl-scanner { background: rgba(210,153,34,0.10); border-left: 3px solid var(--yellow); }
+ .code-line.hl-context { background: rgba(63,185,80,0.08); border-left: 3px solid var(--green); }
+
+ .placeholder-text { color: var(--text-dim); font-style: italic; }
+
+ /* ── Hints ────────────────────────────────────────────────── */
+ .hints-list {
+   display: flex;
+   flex-direction: column;
+   gap: 0.4rem;
+ }
+ .hint-item {
+   font-size: 0.8rem;
+   background: rgba(63,185,80,0.08);
+   border: 1px solid var(--green);
+   border-radius: var(--radius);
+   padding: 0.5rem 0.75rem;
+   color: var(--text-main);
+   white-space: pre-wrap;
+ }
+
+ /* ── Action cards ─────────────────────────────────────────── */
+ .action-card {
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   border-radius: var(--radius);
+   padding: 0.8rem;
+   display: flex;
+   flex-direction: column;
+   gap: 0.5rem;
+ }
+ .action-title {
+   font-size: 0.8rem;
+   font-weight: 700;
+   color: var(--text-dim);
+   text-transform: uppercase;
+   letter-spacing: 0.05em;
+   margin-bottom: 0.25rem;
+ }
+ .form-row {
+   display: flex;
+   flex-direction: column;
+   gap: 3px;
+ }
+ .form-row label { font-size: 0.75rem; color: var(--text-dim); }
+ .form-row input,
+ .form-row select,
+ .form-row textarea {
+   background: var(--bg-0);
+   border: 1px solid var(--bg-3);
+   border-radius: 5px;
+   color: var(--text-main);
+   font-family: var(--font-ui);
+   font-size: 0.82rem;
+   padding: 5px 8px;
+   resize: vertical;
+ }
+ .form-row input:focus,
+ .form-row select:focus,
+ .form-row textarea:focus {
+   outline: none;
+   border-color: var(--accent);
+ }
+
+ .quick-actions {
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   border-radius: var(--radius);
+   padding: 0.8rem;
+   display: flex;
+   flex-direction: column;
+   gap: 0.4rem;
+ }
+ .separator { height: 1px; background: var(--bg-3); margin: 0.3rem 0; }
+
+ /* ── Reward ring ──────────────────────────────────────────── */
+ .reward-ring-wrap {
+   position: relative;
+   width: 120px;
+   margin: 0 auto;
+ }
+ .reward-ring { width: 120px; height: 120px; transform: rotate(-90deg); }
+ .ring-bg { fill: none; stroke: var(--bg-2); stroke-width: 10; }
+ .ring-track {
+   fill: none;
+   stroke: var(--accent);
+   stroke-width: 10;
+   stroke-linecap: round;
+   stroke-dasharray: 314; /* 2π × r=50 */
+   stroke-dashoffset: 314;
+   transition: stroke-dashoffset 0.5s ease, stroke 0.5s ease;
+ }
+ .ring-label {
+   position: absolute;
+   inset: 0;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   justify-content: center;
+   font-weight: 700;
+   font-size: 1.1rem;
+ }
+ .ring-label small { font-size: 0.65rem; color: var(--text-dim); font-weight: 400; }
+
+ /* ── Component bar chart ──────────────────────────────────── */
+ .component-bars { display: flex; flex-direction: column; gap: 6px; }
+ .bar-row { display: flex; align-items: center; gap: 6px; }
+ .bar-label { font-size: 0.72rem; color: var(--text-dim); width: 90px; flex-shrink: 0; }
+ .bar-track { flex: 1; height: 7px; background: var(--bg-2); border-radius: 4px; overflow: hidden; }
+ .bar-fill { height: 100%; border-radius: 4px; width: 0; transition: width 0.4s ease; }
+ .bar-fill.positive { background: var(--green); }
+ .bar-fill.negative { background: var(--red); }
+ .bar-fill.neutral { background: var(--yellow); }
+ .bar-val { font-size: 0.72rem; width: 36px; text-align: right; color: var(--text-dim); }
+
+ /* ── Issues progress ──────────────────────────────────────── */
+ .issues-progress { display: flex; align-items: center; gap: 8px; }
+ .issues-bar-wrap {
+   flex: 1; height: 8px;
+   background: var(--bg-2);
+   border-radius: 4px;
+   overflow: hidden;
+ }
+ .issues-bar-fill {
+   height: 100%;
+   background: var(--accent);
+   border-radius: 4px;
+   width: 0;
+   transition: width 0.4s ease;
+ }
+
+ /* ── History feed ─────────────────────────────────────────── */
+ .history-feed {
+   display: flex;
+   flex-direction: column;
+   gap: 0.4rem;
+   max-height: 320px;
+   overflow-y: auto;
+ }
+ .history-empty { color: var(--text-dim); font-size: 0.8rem; font-style: italic; }
+ .history-item {
+   background: var(--bg-2);
+   border: 1px solid var(--bg-3);
+   border-radius: 6px;
+   padding: 0.45rem 0.65rem;
+   font-size: 0.78rem;
+   border-left: 3px solid var(--bg-3);
+ }
+ .history-item.positive { border-left-color: var(--green); }
+ .history-item.negative { border-left-color: var(--red); }
+ .history-item.neutral { border-left-color: var(--yellow); }
+ .history-item .h-action { font-weight: 700; color: var(--accent); }
+ .history-item .h-reward { font-weight: 700; }
+ .history-item .h-reward.pos { color: var(--green); }
+ .history-item .h-reward.neg { color: var(--red); }
+ .history-item .h-explain { color: var(--text-dim); margin-top: 2px; line-height: 1.4; }
+
+ /* ── Episode-end modal ────────────────────────────────────── */
+ .modal-overlay {
+   position: fixed; inset: 0;
+   background: rgba(0,0,0,0.7);
+   display: flex; align-items: center; justify-content: center;
+   z-index: 200;
+ }
+ .modal {
+   background: var(--bg-1);
+   border: 1px solid var(--bg-3);
+   border-radius: 12px;
+   padding: 2rem;
+   max-width: 440px;
+   width: 90%;
+   text-align: center;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   gap: 0.75rem;
+ }
+ .modal-icon { font-size: 3rem; }
+ .modal h2 { font-size: 1.3rem; }
+ .modal p { color: var(--text-dim); font-size: 0.88rem; line-height: 1.6; }
+ .modal-stats {
+   width: 100%;
+   background: var(--bg-2);
+   border-radius: var(--radius);
+   padding: 0.75rem 1rem;
+   display: grid;
+   grid-template-columns: 1fr 1fr;
+   gap: 0.4rem 1rem;
+   text-align: left;
+   font-size: 0.82rem;
+ }
+ .modal-stats .stat-label { color: var(--text-dim); }
+ .modal-stats .stat-value { font-weight: 700; }
+
+ /* ── Scrollbar styling ────────────────────────────────────── */
+ ::-webkit-scrollbar { width: 6px; height: 6px; }
+ ::-webkit-scrollbar-track { background: var(--bg-1); }
+ ::-webkit-scrollbar-thumb { background: var(--bg-3); border-radius: 3px; }
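The `.ring-track` rule uses the standard SVG dash trick for a progress ring: `stroke-dasharray` is set to the circle's circumference (2π × 50 ≈ 314.16, rounded to 314 in the stylesheet) and `stroke-dashoffset` starts at the same value, so the arc is fully hidden until the offset shrinks. The sketch below shows the arithmetic. It assumes the frontend (app.js, not part of this hunk) clamps the reward to [0, 1] and fills the ring linearly; the helper name `ring_dashoffset` and that clamping range are hypothetical.

```python
import math

RING_RADIUS = 50  # matches r=50 noted in the .ring-track comment
CIRCUMFERENCE = 2 * math.pi * RING_RADIUS  # ≈ 314.16; the stylesheet rounds to 314


def ring_dashoffset(reward: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Hypothetical mapping from a reward to a stroke-dashoffset value.

    Assumes the reward is clamped to [lo, hi] and filled linearly:
    an offset equal to the circumference hides the arc (empty ring),
    an offset of 0 shows the whole arc (full ring).
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(reward, lo), hi)
    fraction = (clamped - lo) / (hi - lo)
    return CIRCUMFERENCE * (1.0 - fraction)


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 1.0):
        print(f"reward={r:.2f} -> stroke-dashoffset={ring_dashoffset(r):.1f}")
```

Because the CSS already declares a `transition` on `stroke-dashoffset`, a script would only need to set the new offset and the browser animates the fill.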
outputs/baseline_comparison.svg ADDED
outputs/reward_breakdown.svg ADDED
run.py ADDED
@@ -0,0 +1,65 @@
+ """
+ PRobe - unified launcher.
+
+ Starts the FastAPI server, which serves:
+   - the interactive frontend at http://localhost:8000/ui/
+   - the REST API, with interactive docs at http://localhost:8000/docs
+   - the WebSocket endpoint at ws://localhost:8000/ws
+
+ Usage
+ -----
+     uv run python run.py                              # default: host=0.0.0.0, port=8000
+     uv run python run.py --port 9000
+     uv run python run.py --host 127.0.0.1 --port 8000
+     uv run python run.py --reload                     # dev mode: restart on code changes
+ """
+ from __future__ import annotations
+
+ import argparse
+ import pathlib
+ import sys
+
+ # ── Path bootstrap ────────────────────────────────────────────────────────────
+ # Add the project root to sys.path so both `agent` and `environment` packages
+ # are importable regardless of how or from where this script is invoked.
+ PROJECT_ROOT = pathlib.Path(__file__).parent.resolve()
+ sys.path.insert(0, str(PROJECT_ROOT))
+
+ # ── Now safe to import the app ────────────────────────────────────────────────
+ # Imported eagerly so a broken import fails here, before the banner prints.
+ # uvicorn itself loads the app from the import string in main(), which --reload requires.
+ from environment.app import app  # noqa: E402,F401 (import after path setup)
+ import uvicorn  # noqa: E402
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(
+         description="Start the PRobe environment server + frontend",
+         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+     )
+     parser.add_argument("--host", default="0.0.0.0", help="Bind host")
+     parser.add_argument("--port", type=int, default=8000, help="Bind port")
+     parser.add_argument("--reload", action="store_true",
+                         help="Enable auto-reload on code changes (dev mode)")
+     args = parser.parse_args()
+
+     # 0.0.0.0 binds every interface but is not a browsable address, so show localhost instead.
+     display_host = "localhost" if args.host == "0.0.0.0" else args.host
+     frontend_url = f"http://{display_host}:{args.port}/ui/"
+     api_url = f"http://{display_host}:{args.port}/docs"
+     ws_url = f"ws://{display_host}:{args.port}/ws"
+
+     print("\n" + "=" * 58)
+     print("  PRobe - AI Code Review Training Environment")
+     print("=" * 58)
+     print(f"  Frontend  → {frontend_url}")
+     print(f"  API docs  → {api_url}")
+     print(f"  WebSocket → {ws_url}")
+     print("=" * 58 + "\n")
+
+     uvicorn.run(
+         "environment.app:app",
+         host=args.host,
+         port=args.port,
+         reload=args.reload,
+         # Keep uvicorn's own logging minimal so our banner stays visible
+         log_level="warning",
+     )
+
+
+ if __name__ == "__main__":
+     main()
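The launcher prints `localhost` whenever it binds to `0.0.0.0`, because the wildcard address is valid for listening but not useful to open in a browser. A minimal sketch of that logic as a standalone, testable helper follows. The names `display_urls` and `ServerUrls` are hypothetical, and treating `::` (the IPv6 wildcard) the same way, plus bracketing IPv6 literals, are extensions beyond what run.py currently does.

```python
from typing import NamedTuple

# Wildcard bind addresses: valid for listening, not for opening in a browser.
# run.py only special-cases "0.0.0.0"; treating "::" the same is an assumption.
WILDCARD_HOSTS = {"0.0.0.0", "::"}


class ServerUrls(NamedTuple):
    frontend: str
    api_docs: str
    websocket: str


def display_urls(host: str, port: int) -> ServerUrls:
    """Hypothetical helper mirroring the banner URLs printed by run.py."""
    shown = "localhost" if host in WILDCARD_HOSTS else host
    # Bracket bare IPv6 literals so the host/port separator is unambiguous in a URL.
    if ":" in shown:
        shown = f"[{shown}]"
    base = f"{shown}:{port}"
    return ServerUrls(
        frontend=f"http://{base}/ui/",
        api_docs=f"http://{base}/docs",
        websocket=f"ws://{base}/ws",
    )


if __name__ == "__main__":
    for host in ("0.0.0.0", "127.0.0.1", "::1"):
        print(display_urls(host, 8000))
```

Deriving all three URLs from one display host keeps the banner consistent, so the WebSocket line never points at `localhost` while the server is bound to some other address.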