Spaces: sreeramajay/visual_reasoning-env

sreeramajay committed on Apr 26

Commit 2cfb17f (verified) · 1 parent: d4e3646

Upload folder using huggingface_hub

Files changed (19)

.gitignore +41 -0
CLAUDE.md +0 -111
README.md +41 -41
inference_audio.py +0 -451
openenv_visual_reasoning.egg-info/PKG-INFO +0 -19
openenv_visual_reasoning.egg-info/SOURCES.txt +0 -38
openenv_visual_reasoning.egg-info/dependency_links.txt +0 -1
openenv_visual_reasoning.egg-info/entry_points.txt +0 -2
openenv_visual_reasoning.egg-info/requires.txt +0 -15
openenv_visual_reasoning.egg-info/top_level.txt +0 -1
push_to_space.ipynb +129 -0
scripts/generate_rubric_data.py +0 -1132
server/app.py +157 -91
server/app_backup.py +0 -46
train.ipynb +0 -913
train.py +0 -632
train_hf.py +0 -771
uv.lock +0 -0
viewer/audio_viewer.html +0 -865

.gitignore ADDED

@@ -0,0 +1,41 @@

+# Virtual environment
+.venv/
+venv/
+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+dist/
+build/
+*.egg
+# Environment / secrets
+.env
+.env.local
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# pytest
+.pytest_cache/
+htmlcov/
+.coverage
+# uv
+uv.lock
+# Jupyter
+.ipynb_checkpoints/
+CLAUDE.md
+openenv_visual_reasoning.egg-info/

CLAUDE.md DELETED

@@ -1,111 +0,0 @@
-# Visual Reasoning Environment
-## What This Is
-An OpenEnv RL environment for training LLMs (via GRPO/RLVR) to be expert visual explainers of CS algorithms. The LLM acts as a teacher drawing on a whiteboard — it creates data structures, walks through algorithms step-by-step, and narrates the reasoning. A scoring system provides dense, verifiable rewards.
-## Architecture
-```
-inference.py / inference_tldraw.py   — LLM inference loops (client-side)
-client.py                            — WebSocket EnvClient wrapper
-server/app.py                        — FastAPI OpenEnv server entry point
-server/visual_reasoning_environment.py — Core env: reset(), step(), state management
-server/scoring.py                    — 13 weighted sub-scores + 5 penalties → reward
-server/invariant_checkers.py         — Per-algorithm correctness checks (9 algorithms)
-server/narration_scorer.py           — Qwen3-Embedding-0.6B cosine similarity scorer
-server/pedagogical_scoring.py        — Teaching quality: attention coherence, pacing, scaffolding
-server/scenario_loader.py            — Loads scenarios.json + procedural generation
-server/scenario_generator.py         — Procedural scenario generation per difficulty
-server/regions.py                    — Layout engine (queue/stack/tree/graph positioning)
-server/constants.py                  — ALLOWED_OPS, ROLE_VALUES, limits
-models.py                            — Pydantic models: VisualReasoningAction, VisualReasoningObservation
-viewer/tldraw_viewer.html            — tldraw-based browser visualizer
-```
-## Key Concepts
-- **Empty canvas paradigm**: All scenarios start empty. The LLM draws the problem first (Phase 1), then solves it step-by-step (Phase 2), then completes (Phase 3).
-- **16 canvas operations**: add_region, add_node, add_pointer, add_container, add_edge, remove_edge, push_to, pop_from, move_pointer, set_value, set_role, annotate, highlight, unhighlight, add_note, remove_entity.
-- **10 entity roles**: default, current, visited, frontier, done, pivot, root, error, inactive, comparing.
-- **Region vs Container**: Regions (`add_region`) are layout areas for visual positioning. Containers (`add_container`) track membership for push/pop. `push_to`/`pop_from` ONLY work on containers, not regions. This distinction is a common source of LLM confusion.
-- **Delta-based rewards**: `reward = (new_overall_score - previous_overall_score) + flat_penalties`. Scoring is deterministic for RL training reproducibility.
-- **Concept coverage**: The LLM claims concepts from a checklist; coverage is verified via narration evidencing with prefix matching + alias expansion (`_CONCEPT_ALIASES` and `_CONCEPT_PART_ALIASES` in scoring.py).
-## Running
-### Environment setup
-```bash
-conda activate unsloth_env
-```
-### Run tests
-```bash
-conda run -n unsloth_env python -m pytest tests/ -v
-```
-### Run inference (headless)
-```bash
-# Start the server first (in another terminal or via Docker)
-LOCAL_IMAGE_NAME=http://127.0.0.1:8000 python inference.py
-```
-### Run inference with tldraw viewer
-```bash
-python inference_tldraw.py
-# Opens browser at http://0.0.0.0:8765/
-```
-### Environment variables
-- `LOCAL_IMAGE_NAME` — Docker image or `http://127.0.0.1:PORT` for local server
-- `API_BASE_URL` — LLM API endpoint (default: HuggingFace router)
-- `API_KEY` / `HF_TOKEN` — API authentication
-- `MODEL_NAME` — LLM model (default: Qwen/Qwen2.5-72B-Instruct)
-- `VISUAL_REASONING_TASKS` — Comma-separated task list (default: easy,medium,hard,expert)
-- `DEBUG=1` — Enable verbose debug logging
-- `VIS_PORT`, `VIS_HOST`, `VIS_WAIT` — tldraw viewer settings
-## Scoring System
-13 weighted sub-scores (weights vary by difficulty level):
-- **validity** (~10-12%): Correct op formats, no invalid references
-- **invariant** (~18-22%): Algorithm correctness checked against ground truth
-- **coverage** (~17-18%): Concept checklist completion via narration evidencing
-- **narration_quality** (~6-10%): Cosine similarity against reference narrations
-- **structure** (~5-7%): Constraint satisfaction + entity monotonicity
-- **progress** (~4-5%): State must change each step; granularity penalty for >4 creations per step
-- **algorithm_completion** (~5%): Cumulative algorithm progress (% nodes placed, edges drawn, roles assigned)
-- **spatial** (~6%): Region placement on the canvas grid (semantic fit, collision avoidance, reading-order flow)
-- **consistency** (~5-7%): Unexplained entity changes are penalized
-- **attention_coherence** (~6-7%): Narration entities match op targets (fuzzy matching)
-- **visual** (~2%): Layout overlap/occlusion/crossing penalties
-- **cognitive_pacing** (~7-8%): Information density vs. novelty; penalizes creation-heavy dumps
-- **scaffolding** (~7-9%): Emphasis decreases over repeated patterns
-5 penalties subtracted from weighted sum:
-- `penalty_redundant` (0.2): All ops duplicate existing state
-- `penalty_no_op` (applied as flat -0.05 in env): Zero state delta
-- `penalty_unsupported_claims` (up to 0.3): Claiming concepts not evidenced in narration
-- `penalty_too_many_ops` (up to 0.5): Exceeding MAX_OPS_PER_STEP (14)
-- `penalty_info_dump` (up to 0.2): >5 creation ops in a single step (0.05 per excess)
-## Algorithms / Scenarios
-9 algorithm templates across 4 difficulty levels:
-- **easy**: linked_list_traversal, stack_ops, binary_search
-- **medium**: bfs_graph, hash_table_chaining
-- **hard**: dijkstra_step, bst_insert
-- **expert**: fib_memo, quicksort_lomuto
-Static scenarios in `server/scenarios/scenarios.json`, plus procedurally generated ones via `scenario_generator.py`.
-## Common Pitfalls When Modifying
-1. **Scoring must be deterministic** — no randomness, no floating-point order sensitivity. The regression test (`test_easy_1_reproducible`) enforces bit-identical scores across runs.
-2. **`first_conflict_message` takes an optional `action` arg** — pass it for context-specific error messages that help the LLM self-correct.
-3. **`compute_progress_score` takes an optional `action` arg** — needed so `complete` steps get progress=1.0.
-4. **Concept evidencing uses three layers**: exact token match → prefix morphological match (`_prefix_match`) → alias expansion (`_CONCEPT_ALIASES` / `_CONCEPT_PART_ALIASES`). When adding new scenarios, ensure all checklist concepts have corresponding aliases.
-5. **The narration scorer uses Qwen3-Embedding-0.6B on CUDA** — falls back to a heuristic `_fallback_score` if the model fails to load. Check `warmup_scorer` logs to confirm which path runs.
-6. **The `.venv` Python is broken (3.7 binary, 3.10 site-packages)** — always use `conda run -n unsloth_env` for running code.
-7. **`openenv` import resolution**: Server modules use try/except for relative vs absolute imports. Tests run from the project root with `PYTHONPATH=.`.
-8. **The inference loop uses `run_in_executor`** for LLM calls to avoid blocking the async event loop (which would cause WebSocket keepalive timeouts).
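
Reviewer note: the delta-based reward described in the deleted CLAUDE.md can be sketched as below. This is a minimal illustration, not the code in `server/scoring.py`. The function name `step_reward` and its parameters are hypothetical, and the clip range `[-0.2, 1.0]` is an assumption taken from the README context line shown in this commit.

```python
def step_reward(prev_score: float, new_score: float,
                flat_penalties: float = 0.0,
                lo: float = -0.2, hi: float = 1.0) -> float:
    """Hypothetical sketch: score delta plus flat penalties, clipped to [lo, hi]."""
    raw = (new_score - prev_score) + flat_penalties
    return max(lo, min(hi, raw))


# Flat penalties are passed as negative numbers, e.g. -0.05 for a no-op step.
improving = step_reward(0.30, 0.50)
no_op = step_reward(0.50, 0.50, flat_penalties=-0.05)
regressing = step_reward(0.90, 0.40)  # raw delta of -0.5 is clipped to -0.2
```

Because the reward depends only on the two scores and the penalty term, the same inputs always give the same reward, which matches the determinism requirement listed under "Common Pitfalls When Modifying".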

README.md CHANGED

@@ -11,7 +11,6 @@ tags:
   - reinforcement-learning
   - llm
   - grpo
-base_path: /web
 ---
 # Visual Reasoning Environment
@@ -140,33 +139,32 @@ Clipped to `[-0.2, 1.0]`. Designed for GRPO / RLVR training.
 Here's the full picture -- how the agent, the environment, and the reward signal fit together:
 ```
-┌─────────────────────────────────────────────────────────────────────┐
-│                     TRAINING LOOP (GRPO / RLVR)                     │
-│                                                                     │
-│  ┌───────────┐    prompt     ┌──────────────┐    JSON action        │
-│  │           │ ─────────────>│              │ ──────────────────┐    │
-│  │  Scenario │               │   LLM Agent  │                  │    │
-│  │  Generator│               │  (Teacher)   │                  │    │
-│  │           │    ┌─────────>│              │<──────────┐      │    │
-│  └───────────┘    │          └──────────────┘           │      │    │
-│                   │                                     │      │    │
-│             observation                              reward    │    │
-│           + score breakdown                         signal    │    │
-│                   │                                     │      │    │
-│          ┌────────┴────────┐      score          ┌─────┴──┐   │    │
-│          │                 │ <─────────────────── │        │   │    │
-│          │   Environment   │                     │ Scoring │   │    │
-│          │  (Empty Canvas) │ ──────────────────> │ Engine  │   │    │
-│          │                 │   canvas state       │(13 dim) │   │    │
-│          └─────────────────┘                     └────────┘   │    │
-│                   ^                                           │    │
-│                   │              step(action)                 │    │
-│                   └───────────────────────────────────────────┘    │
-│                                                                     │
-│  Per-step reward = Δ(overall_score) + penalties + concept_bonuses   │
-│  Episode: empty canvas ──> Phase 1 (draw) ──> Phase 2 (solve)      │
-│           ──> Phase 3 (summarize) ──> done                          │
-└─────────────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│                   TRAINING LOOP (GRPO / RLVR)                   │
+│                                                                 │
+│  ┌───────────┐  prompt   ┌──────────────┐  JSON action          │
+│  │           │ ────────> │              │ ──────────────┐       │
+│  │  Scenario │           │   LLM Agent  │               │       │
+│  │  Generator│           │  (Teacher)   │               │       │
+│  │           │  ┌──────> │              │ <────────┐    │       │
+│  └───────────┘  │        └──────────────┘          │    │       │
+│                 │                                  │    │       │
+│           observation                            reward │       │
+│         + score breakdown                        signal │       │
+│                 │                                  │    │       │
+│        ┌────────┴───────┐    score          ┌──────┴─┐  │       │
+│        │                │ <──────────────── │        │  │       │
+│        │  Environment   │                   │Scoring │  │       │
+│        │ (Empty Canvas) │ ───────────────>  │ Engine │  │       │
+│        │                │  canvas state     │(13 dim)│  │       │
+│        └────────────────┘                   └────────┘  │       │
+│                 ^                                       │       │
+│                 │          step(action)                 │       │
+│                 └───────────────────────────────────────┘       │
+│                                                                 │
+│  reward = Δ(overall_score) + penalties + concept_bonuses        │
+│  Episode: empty canvas ──> Phase 1 ──> Phase 2 ──> Phase 3      │
+└─────────────────────────────────────────────────────────────────┘
 ```
 Every episode starts with a blank canvas and a goal like *"Explain how Dijkstra's algorithm finds shortest paths in this graph."* The agent draws, narrates, and advances step by step. The scoring engine evaluates each step across all 13 dimensions. The reward signal flows back into the RL training loop, gradually shaping the agent into a better teacher.
@@ -265,19 +263,21 @@ The overall score jumped from **0.368 to 0.536** -- a 45.7% relative improvement
 ```
 SFT+GRPO Score by Difficulty (Qwen2.5-3B, single A100)
-  0.7 |
-      |                                          ┌───────┐
-  0.6 |                               ┌──────┐   │ 0.635 │  +120% from baseline
-      |                               │ 0.566│   └───────┘
-  0.5 |  ┌──────┐                     └──────┘
-      |  │ 0.481│  ┌──────┐
-  0.4 |  └──────┘  │ 0.461│
-      |            └──────┘
-  0.3 |   +33.6%    +28.4%    +21.5%    +120.5%
-      |
-  0.2 |  ░░0.360░  ░░0.359░  ░░0.466░  ░░0.288░  Baseline
-      +────────────────────────────────────────────────────
-        Easy      Medium      Hard      Expert
+      |
+ 0.7  |
+      |                                        ┌────────┐
+ 0.6  |                          ┌────────┐    │  0.635 │
+      |                          │  0.566 │    └────────┘
+ 0.5  |  ┌────────┐              └────────┘
+      |  │  0.481 │  ┌────────┐
+ 0.4  |  └────────┘  │  0.461 │
+      |              └────────┘
+ 0.3  |   +33.6%      +28.4%     +21.5%       +120.5%
+      |
+ 0.2  |  ░ 0.360 ░   ░ 0.359 ░  ░ 0.466 ░    ░ 0.288 ░
+      |                                                     Baseline
+      +─────────────────────────────────────────────────
+          Easy       Medium       Hard        Expert
 ```
 A few things stand out from these results:

inference_audio.py DELETED

@@ -1,451 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-from __future__ import annotations
-import asyncio
-import base64
-import contextlib
-import json
-import os
-import sys
-import time
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Set, Tuple
-import uvicorn
-from fastapi import FastAPI, WebSocket, WebSocketDisconnect
-from fastapi.responses import FileResponse
-from openai import OpenAI
-from client import VisualReasoningEnv
-from models import VisualReasoningAction
-from inference import (
-    API_BASE_URL,
-    API_KEY,
-    BENCHMARK,
-    HF_SPACE_URL,
-    LOCAL_IMAGE_NAME,
-    MAX_STEPS,
-    MODEL_NAME,
-    SUCCESS_SCORE_THRESHOLD,
-    TASK_NAMES,
-    action_to_string,
-    debug,
-    get_model_action_async,
-    log_end,
-    log_start,
-    log_step,
-)
-VIS_PORT = int(os.getenv("VIS_PORT", "8765"))
-VIS_HOST = os.getenv("VIS_HOST", "0.0.0.0")
-VIS_WAIT = float(os.getenv("VIS_WAIT", "30"))
-VIEWER_DIR = Path(__file__).parent / "viewer"
-HTML_PATH = VIEWER_DIR / "audio_viewer.html"
-AUDIO_DIR = Path(__file__).parent / "others"
-BACKGROUND_MUSIC_PATH = AUDIO_DIR / "tutorial_background.mp3"
-HISTORY_CAP = 500
-TTS_LEAD_TIME = 2.0
-TTS_MIN_WAIT = 1.5
-TTS_MODEL = "hexgrad/Kokoro-82M"
-TTS_PROVIDER = "fal-ai"
-def vlog(msg: str) -> None:
-    print(f"[AUDIO] {msg}", file=sys.stderr, flush=True)
-# ---------------------------------------------------------------------------
-# TTS via huggingface_hub InferenceClient
-# ---------------------------------------------------------------------------
-def _make_tts_client():
-    from huggingface_hub import InferenceClient
-    return InferenceClient(provider=TTS_PROVIDER, api_key=os.environ.get("HF_TOKEN", ""))
-_tts_client = None
-def _get_tts_client():
-    global _tts_client
-    if _tts_client is None:
-        _tts_client = _make_tts_client()
-    return _tts_client
-def _estimate_duration(audio_bytes: bytes) -> float:
-    if len(audio_bytes) < 44:
-        return 0.0
-    # Try WAV header
-    if audio_bytes[:4] == b"RIFF" and audio_bytes[8:12] == b"WAVE":
-        import struct
-        try:
-            byte_rate = struct.unpack_from("<I", audio_bytes, 28)[0]
-            data_offset = audio_bytes.find(b"data")
-            if data_offset >= 0 and byte_rate > 0:
-                data_size = struct.unpack_from("<I", audio_bytes, data_offset + 4)[0]
-                return data_size / byte_rate
-        except Exception:
-            pass
-    # Rough estimate for compressed audio (~16kB/s for mp3 at 128kbps)
-    return len(audio_bytes) / 16000.0
-def _synthesize_sync(text: str) -> Tuple[Optional[str], float]:
-    if not text or not text.strip():
-        return None, 0.0
-    try:
-        client = _get_tts_client()
-        audio_bytes = client.text_to_speech(text, model=TTS_MODEL)
-        if not audio_bytes:
-            vlog("TTS returned empty audio")
-            return None, 0.0
-        duration = _estimate_duration(audio_bytes)
-        vlog(f"TTS ok: {len(audio_bytes)} bytes, ~{duration:.1f}s")
-        return base64.b64encode(audio_bytes).decode("ascii"), duration
-    except Exception as exc:
-        vlog(f"TTS error: {exc}")
-        return None, 0.0
-# ---------------------------------------------------------------------------
-# Broadcaster
-# ---------------------------------------------------------------------------
-class Broadcaster:
-    def __init__(self) -> None:
-        self._clients: Set[WebSocket] = set()
-        self._history: List[Dict[str, Any]] = []
-        self._first_client = asyncio.Event()
-        self._lock = asyncio.Lock()
-    async def register(self, ws: WebSocket) -> None:
-        async with self._lock:
-            self._clients.add(ws)
-            replay = list(self._history)
-            self._first_client.set()
-        vlog(f"connected clients={len(self._clients)}")
-        for msg in replay:
-            try:
-                await ws.send_text(json.dumps(msg))
-            except Exception:
-                return
-    async def unregister(self, ws: WebSocket) -> None:
-        async with self._lock:
-            self._clients.discard(ws)
-        vlog(f"client disconnected clients={len(self._clients)}")
-    async def send(self, msg: Dict[str, Any]) -> None:
-        history_msg = {k: v for k, v in msg.items() if k != "audio"}
-        async with self._lock:
-            self._history.append(history_msg)
-            if len(self._history) > HISTORY_CAP:
-                self._history = self._history[-HISTORY_CAP:]
-            targets = list(self._clients)
-        if not targets:
-            return
-        payload = json.dumps(msg)
-        dead: List[WebSocket] = []
-        for ws in targets:
-            try:
-                await ws.send_text(payload)
-            except Exception:
-                dead.append(ws)
-        if dead:
-            async with self._lock:
-                for ws in dead:
-                    self._clients.discard(ws)
-    async def wait_for_client(self, timeout: float) -> bool:
-        if timeout <= 0:
-            return self._first_client.is_set()
-        try:
-            await asyncio.wait_for(self._first_client.wait(), timeout=timeout)
-            return True
-        except asyncio.TimeoutError:
-            return False
-# ---------------------------------------------------------------------------
-# FastAPI app
-# ---------------------------------------------------------------------------
-def build_viewer_app(broadcaster: Broadcaster) -> FastAPI:
-    app = FastAPI()
-    @app.get("/")
-    async def index():
-        return FileResponse(str(HTML_PATH), media_type="text/html")
-    @app.get("/audio/background.mp3")
-    async def background_music():
-        if BACKGROUND_MUSIC_PATH.exists():
-            return FileResponse(str(BACKGROUND_MUSIC_PATH), media_type="audio/mpeg")
-        return {"error": "background music not found"}
-    @app.get("/health")
-    async def health():
-        return {"ok": True}
-    @app.websocket("/ws")
-    async def ws_endpoint(ws: WebSocket):
-        await ws.accept()
-        await broadcaster.register(ws)
-        try:
-            while True:
-                await ws.receive_text()
-        except WebSocketDisconnect:
-            pass
-        except Exception as exc:
-            debug(f"ws error: {exc}")
-        finally:
-            await broadcaster.unregister(ws)
-    return app
-async def start_viewer(broadcaster: Broadcaster):
-    if not HTML_PATH.exists():
-        vlog(f"ERROR: viewer HTML missing at {HTML_PATH}")
-        sys.exit(1)
-    app = build_viewer_app(broadcaster)
-    config = uvicorn.Config(
-        app,
-        host=VIS_HOST,
-        port=VIS_PORT,
-        log_config=None,
-        access_log=False,
-        log_level="warning",
-    )
-    server = uvicorn.Server(config)
-    task = asyncio.create_task(server.serve())
-    for _ in range(50):
-        if server.started:
-            break
-        if task.done():
-            exc = task.exception()
-            vlog(f"viewer server failed to start: {exc}")
-            sys.exit(1)
-        await asyncio.sleep(0.05)
-    return server, task
-# ---------------------------------------------------------------------------
-# Observation snapshot
-# ---------------------------------------------------------------------------
-def _obs_snapshot(obs: Any) -> Dict[str, Any]:
-    return {
-        "entities": {k: dict(v) for k, v in (obs.entities or {}).items()},
-        "relations": [dict(r) for r in (obs.relations or [])],
-        "layout": {k: dict(v) for k, v in (obs.layout or {}).items()},
-        "annotations": [dict(a) for a in (obs.annotations or [])],
-        "notes": [dict(n) for n in (obs.notes or [])],
-        "score_breakdown": {
-            k: (float(v) if isinstance(v, (int, float)) else v)
-            for k, v in (obs.score_breakdown or {}).items()
-        },
-        "coverage": list(obs.concept_coverage),
-        "narration_history": list(obs.narration_history),
-        "remaining_step_budget": obs.remaining_step_budget,
-    }
-# ---------------------------------------------------------------------------
-# Episode runner with audio
-# ---------------------------------------------------------------------------
-async def run_episode_streaming(
-    env: Any, client: OpenAI, task_name: str, broadcaster: Broadcaster
-) -> None:
-    history: List[str] = []
-    rewards: List[float] = []
-    steps_taken = 0
-    score = 0.0
-    success = False
-    last_reward = 0.0
-    last_action: Optional[Dict[str, Any]] = None
-    obs = None
-    loop = asyncio.get_running_loop()
-    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
-    try:
-        result = await env.reset(task_name=task_name)
-        obs = result.observation
-        debug(
-            f"RESET: scenario={obs.scenario_id} goal={obs.goal} "
-            f"budget={obs.remaining_step_budget}"
-        )
-        goal_audio, goal_dur = await loop.run_in_executor(
-            None, _synthesize_sync, obs.goal
-        )
-        snap = _obs_snapshot(obs)
-        await broadcaster.send(
-            {
-                "type": "reset",
-                "task_name": obs.task_name,
-                "scenario_id": obs.scenario_id,
-                "goal": obs.goal,
-                "checklist": list(obs.concept_checklist),
-                "input_data": dict(obs.input_data),
-                "constraints": list(obs.constraints),
-                "max_steps": obs.max_steps,
-                "audio": goal_audio,
-                "audio_duration": goal_dur,
-                **snap,
-            }
-        )
-        if goal_dur > 0:
-            await asyncio.sleep(max(TTS_MIN_WAIT, goal_dur - TTS_LEAD_TIME))
-        else:
-            await asyncio.sleep(TTS_MIN_WAIT)
-        for step in range(1, MAX_STEPS + 1):
-            if result.done:
-                break
-            step_start = time.monotonic()
-            obs = result.observation
-            action_dict = await get_model_action_async(
-                client, obs, last_action, last_reward, history
-            )
-            action = VisualReasoningAction(**action_dict)
-            narration = action_dict.get("narration", "")
-            env_future = asyncio.ensure_future(env.step(action))
-            tts_future = loop.run_in_executor(None, _synthesize_sync, narration)
-            result = await env_future
-            audio_b64, audio_dur = await tts_future
-            obs = result.observation
-            reward = result.reward or 0.0
-            done = result.done
-            error = obs.action_error
-            rewards.append(reward)
-            steps_taken = step
-            last_reward = reward
-            last_action = action_dict
-            log_step(
-                step=step,
-                action=action_to_string(action_dict),
-                reward=reward,
-                done=done,
-                error=error,
-            )
-            history.append(f"Step {step}: action={action_to_string(action_dict)}")
-            snap = _obs_snapshot(obs)
-            await broadcaster.send(
-                {
-                    "type": "step",
-                    "task_name": obs.task_name,
-                    "scenario_id": obs.scenario_id,
-                    "step": step,
-                    "step_type": action_dict.get("step_type"),
-                    "intent": action_dict.get("intent", ""),
-                    "narration": narration,
-                    "ops": action_dict.get("ops", []),
-                    "covered_concepts": action_dict.get("covered_concepts", []),
-                    "reward": float(reward),
-                    "score": float(obs.score_breakdown.get("overall_score", 0.0)),
-                    "done": bool(done),
-                    "error": error,
-                    "audio": audio_b64,
-                    "audio_duration": audio_dur,
-                    **snap,
-                }
-            )
-            if done:
-                if audio_dur > 0:
-                    await asyncio.sleep(max(0, audio_dur + 0.5))
-                break
-            elapsed = time.monotonic() - step_start
-            target = max(TTS_MIN_WAIT, audio_dur - TTS_LEAD_TIME) if audio_dur > 0 else TTS_MIN_WAIT
-            remaining = max(0, target - elapsed)
-            if remaining > 0:
-                await asyncio.sleep(remaining)
-        if obs is not None and steps_taken > 0:
-            score = float(obs.score_breakdown.get("overall_score", 0.0))
-            score = min(max(score, 0.0), 1.0)
-        success = score >= SUCCESS_SCORE_THRESHOLD
-    finally:
-        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
-        await broadcaster.send(
-            {
-                "type": "end",
-                "task_name": task_name,
-                "success": bool(success),
-                "steps": steps_taken,
-                "score": float(score),
-                "rewards": [float(r) for r in rewards],
-            }
-        )
-# ---------------------------------------------------------------------------
-# Main
-# ---------------------------------------------------------------------------
-async def main() -> None:
-    broadcaster = Broadcaster()
-    server, server_task = await start_viewer(broadcaster)
-    vlog(f"open http://{VIS_HOST}:{VIS_PORT}/ in your browser")
-    if not BACKGROUND_MUSIC_PATH.exists():
-        vlog(f"WARNING: background music not found at {BACKGROUND_MUSIC_PATH}")
-    if VIS_WAIT > 0:
-        vlog(f"waiting up to {VIS_WAIT:.0f}s for a browser connection...")
-        connected = await broadcaster.wait_for_client(VIS_WAIT)
-        if not connected:
-            vlog("proceeding without viewer (no browser connected in time)")
-    else:
-        vlog("VIS_WAIT=0, starting immediately")
-    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
-    if LOCAL_IMAGE_NAME:
-        if LOCAL_IMAGE_NAME.startswith("http://127.0.0.1:"):
-            env = VisualReasoningEnv(base_url=LOCAL_IMAGE_NAME, message_timeout_s=120)
-        else:
-            env = await VisualReasoningEnv.from_docker_image(LOCAL_IMAGE_NAME)
-    else:
-        env = await VisualReasoningEnv.from_env(HF_SPACE_URL, use_docker=False)
-    try:
-        for task_name in TASK_NAMES:
-            await run_episode_streaming(env, client, task_name, broadcaster)
-        await broadcaster.send({"type": "shutdown"})
-        await asyncio.sleep(0.5)
-    finally:
-        try:
-            await env.close()
-        except Exception as exc:
-            print(f"[DEBUG] env.close() error: {exc}", file=sys.stderr, flush=True)
-        server.should_exit = True
-        with contextlib.suppress(Exception):
-            await server_task
-if __name__ == "__main__":
-    asyncio.run(main())
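
Reviewer note: `_estimate_duration` in the deleted file parses the WAV header by hand and looks for the first `data` byte sequence. For uncompressed PCM WAV payloads, the standard-library `wave` module can do the same job. The sketch below assumes that case only. The helper names `wav_duration_seconds` and `make_silence` are illustrative and do not appear in the repository, and compressed formats such as MP3 would still need the byte-rate fallback.

```python
import io
import wave


def wav_duration_seconds(audio_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload, or 0.0 if it cannot be parsed."""
    try:
        with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
            rate = wav.getframerate()
            return wav.getnframes() / rate if rate > 0 else 0.0
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE payload, or the data is truncated.
        return 0.0


def make_silence(seconds: float, rate: int = 8000) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence, used here only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


one_second = wav_duration_seconds(make_silence(1.0))
not_wav = wav_duration_seconds(b"not a wav")
```

Reading the frame count through `wave` avoids matching a stray `data` sequence inside another chunk, at the cost of rejecting any payload the module does not understand.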

openenv_visual_reasoning.egg-info/PKG-INFO DELETED

@@ -1,19 +0,0 @@
-Metadata-Version: 2.4
-Name: openenv-visual_reasoning
-Version: 0.1.0
-Summary: Visual Reasoning environment for OpenEnv — step-based RL for grounded visual + textual CS explanations
-Requires-Python: >=3.10
-Requires-Dist: openenv-core[core]>=0.2.2
-Requires-Dist: numpy<2.0
-Requires-Dist: python-dotenv>=1.0.0
-Requires-Dist: networkx>=3.1
-Requires-Dist: shapely>=2.0
-Requires-Dist: sentence-transformers>=2.2
-Requires-Dist: rapidfuzz>=3.0
-Requires-Dist: textstat>=0.7
-Requires-Dist: sortedcontainers>=2.4
-Requires-Dist: aiohttp>=3.9
-Requires-Dist: openai>=1.0
-Provides-Extra: dev
-Requires-Dist: pytest>=8.0.0; extra == "dev"
-Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_visual_reasoning.egg-info/SOURCES.txt DELETED

@@ -1,38 +0,0 @@
-README.md
-__init__.py
-client.py
-inference.py
-inference_audio.py
-inference_excalidraw.py
-inference_tldraw.py
-models.py
-pyproject.toml
-./__init__.py
-./client.py
-./inference.py
-./inference_audio.py
-./inference_excalidraw.py
-./inference_tldraw.py
-./models.py
-openenv_visual_reasoning.egg-info/PKG-INFO
-openenv_visual_reasoning.egg-info/SOURCES.txt
-openenv_visual_reasoning.egg-info/dependency_links.txt
-openenv_visual_reasoning.egg-info/entry_points.txt
-openenv_visual_reasoning.egg-info/requires.txt
-openenv_visual_reasoning.egg-info/top_level.txt
-server/__init__.py
-server/app.py
-server/app_backup.py
-server/constants.py
-server/invariant_checkers.py
-server/narration_scorer.py
-server/pedagogical_scoring.py
-server/regions.py
-server/scenario_generator.py
-server/scenario_loader.py
-server/scoring.py
-server/visual_reasoning_environment.py
-tests/test_environment.py
-tests/test_regions.py
-tests/test_scenario_loader.py
-tests/test_scoring.py

openenv_visual_reasoning.egg-info/dependency_links.txt DELETED

@@ -1 +0,0 @@
-

openenv_visual_reasoning.egg-info/entry_points.txt DELETED

@@ -1,2 +0,0 @@
-[console_scripts]
-server = visual_reasoning.server.app:main

openenv_visual_reasoning.egg-info/requires.txt DELETED

@@ -1,15 +0,0 @@
-openenv-core[core]>=0.2.2
-numpy<2.0
-python-dotenv>=1.0.0
-networkx>=3.1
-shapely>=2.0
-sentence-transformers>=2.2
-rapidfuzz>=3.0
-textstat>=0.7
-sortedcontainers>=2.4
-aiohttp>=3.9
-openai>=1.0
-[dev]
-pytest>=8.0.0
-pytest-cov>=4.0.0

openenv_visual_reasoning.egg-info/top_level.txt DELETED

@@ -1 +0,0 @@
-visual_reasoning

push_to_space.ipynb ADDED

@@ -0,0 +1,129 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "34c098b1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    }
+   ],
+   "source": [
+    "from dotenv import load_dotenv\n",
+    "from huggingface_hub import HfApi\n",
+    "import os"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "eff398ee",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "load_dotenv(\"../.env\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "fb24296b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# api = HfApi(token=os.getenv(\"HF_TOKEN\"))\n",
+    "# api.upload_folder(\n",
+    "#     repo_id=\"sreeramajay/visual_reasoning-env\",\n",
+    "#     folder_path=\".\",\n",
+    "#     repo_type=\"space\",\n",
+    "#     delete_patterns=[\"*\"],  # deletes all remote files not present locally\n",
+    "# )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02ea1bf0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "api = HfApi(token=os.getenv(\"HF_TOKEN\"))\n",
+    "api.upload_folder(\n",
+    "    repo_id=\"sreeramajay/visual_reasoning-env\",\n",
+    "    folder_path=\".\",\n",
+    "    repo_type=\"space\",\n",
+    "    delete_patterns=[\"*\"],\n",
+    "    ignore_patterns=[\n",
+    "        \".venv/**\",\n",
+    "        \"venv/**\",\n",
+    "        \"**/__pycache__/**\",\n",
+    "        \"**/*.py[cod]\",\n",
+    "        \"**/*.egg-info/**\",\n",
+    "        \"dist/**\",\n",
+    "        \"build/**\",\n",
+    "        \".env\",\n",
+    "        \".env.local\",\n",
+    "        \".idea/**\",\n",
+    "        \".vscode/**\",\n",
+    "        \".pytest_cache/**\",\n",
+    "        \"htmlcov/**\",\n",
+    "        \".coverage\",\n",
+    "        \"uv.lock\",\n",
+    "        \"**/.ipynb_checkpoints/**\",\n",
+    "        \"CLAUDE.md\",\n",
+    "        \"openenv_visual_reasoning.egg-info/**\",\n",
+    "        \".DS_Store\",\n",
+    "        \"Thumbs.db\",\n",
+    "    ],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3dc787ac",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "unsloth_env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
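
Because the notebook combines `delete_patterns=["*"]` with a long `ignore_patterns` list, it can help to preview locally which paths the patterns would exclude before uploading. The sketch below assumes the patterns are matched fnmatch-style against forward-slash relative paths. That is an assumption about the client's behaviour, not something this diff confirms. `is_ignored` and the sample paths are hypothetical, and the pattern list is a subset of the one in the notebook.

```python
from fnmatch import fnmatchcase

# Subset of the ignore_patterns passed to upload_folder in the notebook.
IGNORE_PATTERNS = [
    ".venv/**",
    "**/__pycache__/**",
    "**/*.py[cod]",
    "**/*.egg-info/**",
    ".env",
    "uv.lock",
    "CLAUDE.md",
    "openenv_visual_reasoning.egg-info/**",
]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Hypothetical helper: True if any pattern matches the relative path."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Illustrative paths only; not a listing of the real repository.
candidates = [
    "server/app.py",
    ".env",
    "server/__pycache__/app.cpython-311.pyc",
    "openenv_visual_reasoning.egg-info/PKG-INFO",
    "train.pyc",
]
kept = [path for path in candidates if not is_ignored(path)]
```

Under this assumption, a pattern beginning with `**/` only matches paths that contain a slash, so a top-level `train.pyc` would not be excluded by `**/*.py[cod]`, and the top-level egg-info directory is caught by its explicit pattern rather than by `**/*.egg-info/**`.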

scripts/generate_rubric_data.py DELETED

@@ -1,1132 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-import argparse
-import json
-import os
-import sys
-from pathlib import Path
-from typing import Any, Dict, List, Optional
-sys.path.insert(0, str(Path(__file__).parent.parent))
-from server.visual_reasoning_environment import VisualReasoningEnvironment
-from models import VisualReasoningAction
-LABELING_PROMPT = """Rate this narration on four dimensions (0.0 to 1.0, steps of 0.25):
-CONTEXT:
-- Algorithm: {template}
-- Ops this step: {ops_summary}
-- Previous narration: {prev_narration}
-- Current narration: {narration}
-- Concepts claimed: {covered_concepts}
-- Progress: {step_progress}
-DIMENSIONS:
-1. explanatory_depth: 0.0=mechanical, 0.5=basic context, 1.0=explains why+broader concept
-2. grounding_accuracy: 0.0=contradicts ops, 0.5=partial, 1.0=accurate+describes effects
-3. clarity: 0.0=incomprehensible, 0.5=adequate, 1.0=clear natural voiceover
-4. flow: 0.0=disconnected, 0.5=adequate connection, 1.0=natural continuation
-Output ONLY JSON: {{"explanatory_depth": X, "grounding_accuracy": X, "clarity": X, "flow": X}}"""
-# ---------------------------------------------------------------------------
-# easy_1: linked_list_traversal, incremental, values=[10,20,30]
-# ---------------------------------------------------------------------------
-def _easy_1_excellent() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "We create a linked list region and place the head pointer at position 0 — this is the only entry point for traversal.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["list"], "params": {"style": "queue", "title": "Linked List"}},
-                {"op": "add_pointer", "target_ids": ["head"], "params": {"region": "list", "index": 0}},
-            ],
-            "covered_concepts": ["head_pointer"],
-            "intent": "setup",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding node_1 with value 10 — the head pointer starts here, making it the first node we visit in the traversal.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["node_1"], "params": {"value": 10, "region": "list"}},
-                {"op": "set_role", "target_ids": ["node_1"], "params": {"role": "current"}},
-                {"op": "annotate", "target_ids": ["node_1"], "params": {"text": "val=10"}},
-            ],
-            "covered_concepts": ["node_value"],
-            "intent": "first-node",
-        },
-        {
-            "step_type": "advance",
-            "narration": "We add node_2 with value 20 and link it from node_1 via a next pointer — following these next links is how we traverse the list.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["node_2"], "params": {"value": 20, "region": "list"}},
-                {"op": "add_edge", "target_ids": ["node_1", "node_2"], "params": {"label": "next"}},
-                {"op": "set_role", "target_ids": ["node_1"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["node_2"], "params": {"role": "current"}},
-                {"op": "move_pointer", "target_ids": ["head"], "params": {"index": 1}},
-            ],
-            "covered_concepts": ["next_link"],
-            "intent": "second-node",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Finally node_3 with value 30 has no next link — this null terminator marks it as the tail, ending our traversal.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["node_3"], "params": {"value": 30, "region": "list"}},
-                {"op": "add_edge", "target_ids": ["node_2", "node_3"], "params": {"label": "next"}},
-                {"op": "set_role", "target_ids": ["node_2"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["node_3"], "params": {"role": "done"}},
-                {"op": "annotate", "target_ids": ["node_3"], "params": {"text": "tail (null next)"}},
-            ],
-            "covered_concepts": ["tail_marker"],
-            "intent": "tail",
-        },
-        {
-            "step_type": "complete",
-            "narration": "The traversal is complete — we visited every node from head to tail by following next pointers.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "done",
-        },
-    ]
-def _easy_1_good() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Setting up the linked list with a head pointer.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["list"], "params": {"style": "queue", "title": "List"}},
-                {"op": "add_pointer", "target_ids": ["head"], "params": {"region": "list"}},
-                {"op": "add_node", "target_ids": ["n1"], "params": {"value": 10, "region": "list"}},
-            ],
-            "covered_concepts": ["head_pointer", "node_value"],
-            "intent": "setup",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding the next two nodes and connecting them.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 20, "region": "list"}},
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 30, "region": "list"}},
-                {"op": "add_edge", "target_ids": ["n1", "n2"], "params": {}},
-                {"op": "add_edge", "target_ids": ["n2", "n3"], "params": {}},
-            ],
-            "covered_concepts": ["next_link"],
-            "intent": "build",
-        },
-        {
-            "step_type": "complete",
-            "narration": "The tail node has no next pointer, traversal ends here.",
-            "ops": [
-                {"op": "annotate", "target_ids": ["n3"], "params": {"text": "tail"}},
-            ],
-            "covered_concepts": ["tail_marker"],
-            "intent": "done",
-        },
-    ]
-def _easy_1_mediocre() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Adding nodes to the list.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["a"], "params": {"value": 10}},
-                {"op": "add_node", "target_ids": ["b"], "params": {"value": 20}},
-                {"op": "add_node", "target_ids": ["c"], "params": {"value": 30}},
-                {"op": "add_edge", "target_ids": ["a", "b"], "params": {}},
-                {"op": "add_edge", "target_ids": ["b", "c"], "params": {}},
-            ],
-            "covered_concepts": ["head_pointer", "node_value", "next_link", "tail_marker"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Done.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-def _easy_1_bad() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Starting.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["x"], "params": {}},
-            ],
-            "covered_concepts": ["head_pointer", "node_value", "next_link", "tail_marker"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Finished.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-# ---------------------------------------------------------------------------
-# easy_2: stack_ops, incremental, operations=["push A","push B","pop","push C"]
-# ---------------------------------------------------------------------------
-def _easy_2_excellent() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "We set up a stack region with a container — stacks follow Last In First Out order, so we'll use ordered=false.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["stk"], "params": {"style": "stack", "title": "Stack"}},
-                {"op": "add_container", "target_ids": ["stack"], "params": {"region": "stk", "ordered": False}},
-                {"op": "add_pointer", "target_ids": ["top"], "params": {"region": "stk", "index": -1}},
-            ],
-            "covered_concepts": ["top_pointer"],
-            "intent": "setup",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Push A onto the stack — A becomes the new top. Then push B on top of A, so B is now the top element.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "A"], "params": {}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "B"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": 1}},
-            ],
-            "covered_concepts": ["push"],
-            "intent": "push-A-B",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pop removes B — the most recently pushed element — revealing A underneath. This is the LIFO property in action.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["stack"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "inactive"}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": 0}},
-            ],
-            "covered_concepts": ["pop", "lifo_order"],
-            "intent": "pop-B",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Push C on top — the stack now holds [A, C] with C as the new top, confirming LIFO order throughout.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "C"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": 1}},
-            ],
-            "covered_concepts": [],
-            "intent": "push-C-complete",
-        },
-    ]
-def _easy_2_good() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Creating a stack region with a container and top pointer.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["stk"], "params": {"style": "stack", "title": "Stack"}},
-                {"op": "add_container", "target_ids": ["stack"], "params": {"region": "stk", "ordered": False}},
-                {"op": "add_pointer", "target_ids": ["top"], "params": {"region": "stk", "index": -1}},
-            ],
-            "covered_concepts": ["top_pointer"],
-            "intent": "setup",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Push A and B onto the stack, B is now on top.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "A"], "params": {}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "B"], "params": {}},
-            ],
-            "covered_concepts": ["push"],
-            "intent": "push-AB",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Pop B, then push C. Stack ends as [A, C] demonstrating LIFO.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["stack"], "params": {}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "C"], "params": {}},
-            ],
-            "covered_concepts": ["pop", "lifo_order"],
-            "intent": "pop-push-done",
-        },
-    ]
-def _easy_2_mediocre() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Performing stack operations.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["stk"], "params": {"style": "stack", "title": "Stack"}},
-                {"op": "add_container", "target_ids": ["stack"], "params": {"region": "stk"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "stk"}},
-                {"op": "push_to", "target_ids": ["stack", "A"], "params": {}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "stk"}},
-            ],
-            "covered_concepts": ["top_pointer", "push", "pop", "lifo_order"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Stack is ready.",
-            "ops": [
-                {"op": "push_to", "target_ids": ["stack", "B"], "params": {}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-def _easy_2_bad() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "complete",
-            "narration": "Stack operations.",
-            "ops": [],
-            "covered_concepts": ["top_pointer", "push", "pop", "lifo_order"],
-            "intent": "",
-        },
-    ]
-# ---------------------------------------------------------------------------
-# medium_1: bfs_graph, graph A->B,C; B->D; C->D,E, source=A
-# ---------------------------------------------------------------------------
-def _medium_1_excellent() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "We draw the directed graph from the input — five nodes A through E with edges showing adjacency.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["E"], "params": {"value": "E", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "E"], "params": {}},
-            ],
-            "covered_concepts": [],
-            "intent": "draw-graph",
-        },
-        {
-            "step_type": "advance",
-            "narration": "We initialize BFS by creating a queue and seeding it with source A — BFS always begins from a single starting node.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["bfs_region"], "params": {"style": "queue", "title": "BFS Queue"}},
-                {"op": "add_container", "target_ids": ["q"], "params": {"region": "bfs_region", "ordered": True}},
-                {"op": "push_to", "target_ids": ["q", "A"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "frontier"}},
-            ],
-            "covered_concepts": ["queue", "frontier"],
-            "intent": "init-bfs",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Dequeue A from the queue and mark it visited — we then push A's unvisited neighbors B and C as frontier nodes.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "B"], "params": {}},
-                {"op": "push_to", "target_ids": ["q", "C"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "frontier"}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "frontier"}},
-            ],
-            "covered_concepts": ["visited_set", "dequeue"],
-            "intent": "visit-A",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Dequeue B and visit it — B's neighbor D joins the frontier, demonstrating BFS level-by-level order.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "D"], "params": {}},
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "frontier"}},
-            ],
-            "covered_concepts": ["level_order"],
-            "intent": "visit-B",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Dequeue C and visit it — C connects to D and E, but D is already in the queue so only E is added.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "E"], "params": {}},
-                {"op": "set_role", "target_ids": ["E"], "params": {"role": "frontier"}},
-            ],
-            "covered_concepts": [],
-            "intent": "visit-C",
-        },
-        {
-            "step_type": "complete",
-            "narration": "All nodes are now visited in BFS order A,B,C,D,E — each level was fully processed before moving deeper.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "visited"}},
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["E"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": [],
-            "intent": "complete",
-        },
-    ]
-def _medium_1_good() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Drawing the graph: nodes A through E with directed edges from the input data.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["E"], "params": {"value": "E", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "E"], "params": {}},
-            ],
-            "covered_concepts": [],
-            "intent": "draw-graph",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Create a BFS queue and enqueue source node A, marking it as frontier.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["bfs_region"], "params": {"style": "queue", "title": "BFS Queue"}},
-                {"op": "add_container", "target_ids": ["q"], "params": {"region": "bfs_region", "ordered": True}},
-                {"op": "push_to", "target_ids": ["q", "A"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "frontier"}},
-            ],
-            "covered_concepts": ["queue", "frontier"],
-            "intent": "init",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Dequeue A, visit it, and enqueue neighbors B and C.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "B"], "params": {}},
-                {"op": "push_to", "target_ids": ["q", "C"], "params": {}},
-            ],
-            "covered_concepts": ["visited_set", "dequeue"],
-            "intent": "visit-A",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Process B then C, adding D and E to the queue as we go.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "D"], "params": {}},
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-                {"op": "push_to", "target_ids": ["q", "E"], "params": {}},
-            ],
-            "covered_concepts": ["level_order"],
-            "intent": "visit-BC",
-        },
-        {
-            "step_type": "complete",
-            "narration": "D and E are dequeued and visited, finishing BFS traversal.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "visited"}},
-                {"op": "pop_from", "target_ids": ["q"], "params": {}},
-                {"op": "set_role", "target_ids": ["E"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": [],
-            "intent": "done",
-        },
-    ]
-def _medium_1_mediocre() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Drawing the graph nodes.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["E"], "params": {"value": "E", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {}},
-                {"op": "add_edge", "target_ids": ["C", "E"], "params": {}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Setting up BFS.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["bfs_region"], "params": {"style": "queue", "title": "BFS"}},
-                {"op": "add_container", "target_ids": ["q"], "params": {"region": "bfs_region", "ordered": True}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": ["queue", "visited_set", "frontier", "level_order", "dequeue"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "BFS finished.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-def _medium_1_bad() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Processing the graph.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B"}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": ["queue", "visited_set", "frontier", "level_order", "dequeue"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Done.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-# ---------------------------------------------------------------------------
-# hard_1: dijkstra_step,
-#   graph A->B(1),A->C(4),B->C(2),B->D(5),C->D(1), source=A
-# ---------------------------------------------------------------------------
-def _hard_1_excellent() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "We draw the weighted directed graph — four nodes A through D with edge weights from the input data.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Weighted Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {"label": "1"}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {"label": "4"}},
-                {"op": "add_edge", "target_ids": ["B", "C"], "params": {"label": "2"}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {"label": "5"}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {"label": "1"}},
-            ],
-            "covered_concepts": [],
-            "intent": "draw-graph",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Initialize Dijkstra by creating a priority queue and distance table — source A gets distance 0, all others start at infinity.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["pq_region"], "params": {"style": "queue", "title": "Priority Queue"}},
-                {"op": "add_container", "target_ids": ["pq"], "params": {"region": "pq_region", "ordered": True}},
-                {"op": "push_to", "target_ids": ["pq", "A"], "params": {}},
-                {"op": "annotate", "target_ids": ["A"], "params": {"text": "d=0"}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "current"}},
-            ],
-            "covered_concepts": ["priority_queue", "distance_table"],
-            "intent": "init",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Extract A (d=0) from the priority queue — Dijkstra guarantees this distance is final. Relax edges to B and C: d[B]=1, d[C]=4.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["B"], "params": {"value": 1}},
-                {"op": "annotate", "target_ids": ["B"], "params": {"text": "d=1"}},
-                {"op": "push_to", "target_ids": ["pq", "B"], "params": {}},
-                {"op": "set_value", "target_ids": ["C"], "params": {"value": 4}},
-            ],
-            "covered_concepts": ["relaxation", "shortest_path_invariant"],
-            "intent": "visit-A",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Extract B (d=1) — its shortest distance is now permanent. Relaxing B's edges: d[C] improves from 4 to 3 via B, and d[D]=6.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "annotate", "target_ids": ["B"], "params": {"text": "d=1 final"}},
-                {"op": "set_value", "target_ids": ["C"], "params": {"value": 3}},
-                {"op": "annotate", "target_ids": ["C"], "params": {"text": "d=3"}},
-                {"op": "push_to", "target_ids": ["pq", "C"], "params": {}},
-            ],
-            "covered_concepts": ["visited_set"],
-            "intent": "visit-B",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Extract C (d=3), relax to D: d[D] improves to 4. Then D is extracted with final distance 4. All shortest paths found.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["D"], "params": {"value": 4}},
-                {"op": "annotate", "target_ids": ["D"], "params": {"text": "d=4"}},
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "done"}},
-            ],
-            "covered_concepts": [],
-            "intent": "complete",
-        },
-    ]
-def _hard_1_good() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Drawing the weighted graph with nodes A, B, C, D and their edge weights.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {"label": "1"}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {"label": "4"}},
-                {"op": "add_edge", "target_ids": ["B", "C"], "params": {"label": "2"}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {"label": "5"}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {"label": "1"}},
-            ],
-            "covered_concepts": [],
-            "intent": "draw-graph",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Set up a priority queue for Dijkstra and initialize source A with distance 0.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["pq_region"], "params": {"style": "queue", "title": "PQ"}},
-                {"op": "add_container", "target_ids": ["pq"], "params": {"region": "pq_region", "ordered": True}},
-                {"op": "push_to", "target_ids": ["pq", "A"], "params": {}},
-                {"op": "annotate", "target_ids": ["A"], "params": {"text": "d=0"}},
-            ],
-            "covered_concepts": ["priority_queue", "distance_table"],
-            "intent": "init",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Extract A, relax edges to B (d=1) and C (d=4).",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["B"], "params": {"value": 1}},
-                {"op": "push_to", "target_ids": ["pq", "B"], "params": {}},
-                {"op": "set_value", "target_ids": ["C"], "params": {"value": 4}},
-            ],
-            "covered_concepts": ["relaxation"],
-            "intent": "visit-A",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Extract B, update C's distance to 3 via B. Push C into the queue.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["C"], "params": {"value": 3}},
-                {"op": "push_to", "target_ids": ["pq", "C"], "params": {}},
-            ],
-            "covered_concepts": ["visited_set", "shortest_path_invariant"],
-            "intent": "visit-B",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Extract C, relax D to distance 4. All shortest paths computed.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["pq"], "params": {}},
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["D"], "params": {"value": 4}},
-                {"op": "set_role", "target_ids": ["D"], "params": {"role": "done"}},
-            ],
-            "covered_concepts": [],
-            "intent": "complete",
-        },
-    ]
-def _hard_1_mediocre() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Drawing the graph for Dijkstra.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["graph"], "params": {"style": "graph", "title": "Graph"}},
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["B"], "params": {"value": "B", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["C"], "params": {"value": "C", "region": "graph"}},
-                {"op": "add_node", "target_ids": ["D"], "params": {"value": "D", "region": "graph"}},
-                {"op": "add_edge", "target_ids": ["A", "B"], "params": {"label": "1"}},
-                {"op": "add_edge", "target_ids": ["A", "C"], "params": {"label": "4"}},
-                {"op": "add_edge", "target_ids": ["B", "C"], "params": {"label": "2"}},
-                {"op": "add_edge", "target_ids": ["B", "D"], "params": {"label": "5"}},
-                {"op": "add_edge", "target_ids": ["C", "D"], "params": {"label": "1"}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Running Dijkstra on the graph.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["pq_region"], "params": {"style": "queue", "title": "PQ"}},
-                {"op": "add_container", "target_ids": ["pq"], "params": {"region": "pq_region", "ordered": True}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["B"], "params": {"role": "visited"}},
-                {"op": "set_value", "target_ids": ["D"], "params": {"value": 4}},
-            ],
-            "covered_concepts": ["priority_queue", "distance_table", "relaxation", "visited_set", "shortest_path_invariant"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Shortest paths found.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["C"], "params": {"role": "visited"}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-def _hard_1_bad() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Starting Dijkstra.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["A"], "params": {"value": "A"}},
-                {"op": "set_role", "target_ids": ["A"], "params": {"role": "done"}},
-            ],
-            "covered_concepts": ["priority_queue", "distance_table", "relaxation", "visited_set", "shortest_path_invariant"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Dijkstra done.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-# ---------------------------------------------------------------------------
-# hard_2: bst_insert, incremental, keys=[5,3,7,2,4,8]
-# ---------------------------------------------------------------------------
-def _hard_2_excellent() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Create a tree region and insert 5 as the root node — the first key always becomes the root of the BST.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["tree"], "params": {"style": "tree", "title": "BST", "root": "n5"}},
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 5, "region": "tree"}},
-                {"op": "set_role", "target_ids": ["n5"], "params": {"role": "root"}},
-            ],
-            "covered_concepts": ["root_node"],
-            "intent": "insert-root",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Insert 3: comparing with root 5, 3 < 5 so it goes left — this maintains the BST invariant where left children are smaller.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 3, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n3"], "params": {"label": "L"}},
-                {"op": "set_role", "target_ids": ["n5"], "params": {"role": "comparing"}},
-            ],
-            "covered_concepts": ["bst_invariant", "left_subtree"],
-            "intent": "insert-3",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Insert 7: comparing with root 5, 7 > 5 so it goes right — the right subtree holds all keys greater than the parent.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n7"], "params": {"value": 7, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n7"], "params": {"label": "R"}},
-                {"op": "set_role", "target_ids": ["n5"], "params": {"role": "root"}},
-            ],
-            "covered_concepts": ["right_subtree"],
-            "intent": "insert-7",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Insert 2 and 4: recursively comparing, 2 < 3 goes left of n3, and 4 > 3 goes right of n3.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 2, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n3", "n2"], "params": {"label": "L"}},
-                {"op": "add_node", "target_ids": ["n4"], "params": {"value": 4, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n3", "n4"], "params": {"label": "R"}},
-            ],
-            "covered_concepts": ["recursive_insert"],
-            "intent": "insert-2-4",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Insert 8: 8 > 5 go right, 8 > 7 go right — each insertion recursively finds the correct leaf position.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n8"], "params": {"value": 8, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n7", "n8"], "params": {"label": "R"}},
-            ],
-            "covered_concepts": [],
-            "intent": "insert-8-complete",
-        },
-    ]
-def _hard_2_good() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Create a BST tree region and insert 5 as root.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["tree"], "params": {"style": "tree", "title": "BST", "root": "n5"}},
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 5, "region": "tree"}},
-                {"op": "set_role", "target_ids": ["n5"], "params": {"role": "root"}},
-            ],
-            "covered_concepts": ["root_node"],
-            "intent": "root",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Insert 3 left of 5 and 7 right of 5, maintaining BST invariant.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 3, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n3"], "params": {"label": "L"}},
-                {"op": "add_node", "target_ids": ["n7"], "params": {"value": 7, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n7"], "params": {"label": "R"}},
-            ],
-            "covered_concepts": ["bst_invariant", "left_subtree", "right_subtree"],
-            "intent": "insert-3-7",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Recursively insert 2 left of 3 and 4 right of 3.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 2, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n3", "n2"], "params": {"label": "L"}},
-                {"op": "add_node", "target_ids": ["n4"], "params": {"value": 4, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n3", "n4"], "params": {"label": "R"}},
-            ],
-            "covered_concepts": ["recursive_insert"],
-            "intent": "insert-2-4",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Insert 8 right of 7 to complete the BST.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n8"], "params": {"value": 8, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n7", "n8"], "params": {"label": "R"}},
-            ],
-            "covered_concepts": [],
-            "intent": "done",
-        },
-    ]
-def _hard_2_mediocre() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Building the BST.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["tree"], "params": {"style": "tree", "title": "BST"}},
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 5, "region": "tree"}},
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 3, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n3"], "params": {}},
-                {"op": "add_node", "target_ids": ["n7"], "params": {"value": 7, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n5", "n7"], "params": {}},
-            ],
-            "covered_concepts": ["root_node", "bst_invariant", "left_subtree", "right_subtree", "recursive_insert"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Tree built.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 2, "region": "tree"}},
-                {"op": "add_edge", "target_ids": ["n3", "n2"], "params": {}},
-            ],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-def _hard_2_bad() -> List[Dict[str, Any]]:
-    return [
-        {
-            "step_type": "advance",
-            "narration": "Inserting keys.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 5}},
-            ],
-            "covered_concepts": ["root_node", "bst_invariant", "left_subtree", "right_subtree", "recursive_insert"],
-            "intent": "",
-        },
-        {
-            "step_type": "complete",
-            "narration": "BST done.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "",
-        },
-    ]
-# ---------------------------------------------------------------------------
-# scenario registry
-# ---------------------------------------------------------------------------
-def get_all_scripted_scenarios() -> Dict[str, Dict[str, List[Dict[str, Any]]]]:
-    return {
-        "easy_1": {
-            "excellent": _easy_1_excellent(),
-            "good": _easy_1_good(),
-            "mediocre": _easy_1_mediocre(),
-            "bad": _easy_1_bad(),
-        },
-        "easy_2": {
-            "excellent": _easy_2_excellent(),
-            "good": _easy_2_good(),
-            "mediocre": _easy_2_mediocre(),
-            "bad": _easy_2_bad(),
-        },
-        "medium_1": {
-            "excellent": _medium_1_excellent(),
-            "good": _medium_1_good(),
-            "mediocre": _medium_1_mediocre(),
-            "bad": _medium_1_bad(),
-        },
-        "hard_1": {
-            "excellent": _hard_1_excellent(),
-            "good": _hard_1_good(),
-            "mediocre": _hard_1_mediocre(),
-            "bad": _hard_1_bad(),
-        },
-        "hard_2": {
-            "excellent": _hard_2_excellent(),
-            "good": _hard_2_good(),
-            "mediocre": _hard_2_mediocre(),
-            "bad": _hard_2_bad(),
-        },
-    }
-# ---------------------------------------------------------------------------
-# rollout runner
-# ---------------------------------------------------------------------------
-def run_scripted_rollout(
-    env: VisualReasoningEnvironment,
-    scenario_id: str,
-    actions: List[Dict[str, Any]],
-) -> List[Dict[str, Any]]:
-    obs = env.reset(scenario_id=scenario_id)
-    template = obs.task_name
-    steps: List[Dict[str, Any]] = []
-    prev_narration = ""
-    for i, action_dict in enumerate(actions):
-        action = VisualReasoningAction(**action_dict)
-        obs = env.step(action)
-        ops_parts: List[str] = []
-        for op in action_dict.get("ops") or []:
-            tids = ",".join(op.get("target_ids") or [])
-            ops_parts.append(f"{op['op']}[{tids}]")
-        ops_summary = ", ".join(ops_parts)
-        checklist = list(obs.concept_checklist)
-        covered = list(obs.concept_coverage)
-        steps.append({
-            "scenario_id": scenario_id,
-            "template": template,
-            "step_id": i + 1,
-            "narration": action_dict.get("narration", ""),
-            "ops_summary": ops_summary,
-            "prev_narration": prev_narration,
-            "covered_concepts": action_dict.get("covered_concepts", []),
-            "step_progress": f"step {i + 1}, {len(covered)}/{len(checklist)} concepts",
-            "score_breakdown": dict(obs.score_breakdown),
-            "reward": obs.reward,
-            "action": action_dict,
-        })
-        prev_narration = action_dict.get("narration", "")
-        if obs.done:
-            break
-    return steps
-# ---------------------------------------------------------------------------
-# labeling prompt builder
-# ---------------------------------------------------------------------------
-def build_labeling_prompt(step: Dict[str, Any]) -> str:
-    return LABELING_PROMPT.format(
-        template=step["template"],
-        ops_summary=step["ops_summary"] or "(none)",
-        prev_narration=step["prev_narration"] or "(none)",
-        narration=step["narration"],
-        covered_concepts=", ".join(step["covered_concepts"]) or "(none)",
-        step_progress=step["step_progress"],
-    )
-# ---------------------------------------------------------------------------
-# main
-# ---------------------------------------------------------------------------
-def main() -> None:
-    parser = argparse.ArgumentParser(description="Generate rubric training data")
-    parser.add_argument("--output-dir", default=str(Path(__file__).parent / "output"))
-    args = parser.parse_args()
-    os.makedirs(args.output_dir, exist_ok=True)
-    env = VisualReasoningEnvironment()
-    all_steps: List[Dict[str, Any]] = []
-    scenarios = get_all_scripted_scenarios()
-    for scenario_id, quality_map in scenarios.items():
-        for quality, actions in quality_map.items():
-            try:
-                steps = run_scripted_rollout(env, scenario_id, actions)
-            except Exception as exc:
-                print(f"WARN: {scenario_id}/{quality} failed: {exc}")
-                continue
-            for s in steps:
-                s["quality_level"] = quality
-            all_steps.extend(steps)
-    rollout_path = os.path.join(args.output_dir, "rollout_data.jsonl")
-    with open(rollout_path, "w", encoding="utf-8") as f:
-        for step in all_steps:
-            f.write(json.dumps(step, default=str) + "\n")
-    prompts_path = os.path.join(args.output_dir, "labeling_prompts.jsonl")
-    with open(prompts_path, "w", encoding="utf-8") as f:
-        for step in all_steps:
-            record = {
-                "scenario_id": step["scenario_id"],
-                "quality_level": step["quality_level"],
-                "step_id": step["step_id"],
-                "prompt": build_labeling_prompt(step),
-            }
-            f.write(json.dumps(record) + "\n")
-    quality_stats: Dict[str, Dict[str, Any]] = {}
-    for step in all_steps:
-        ql = step["quality_level"]
-        if ql not in quality_stats:
-            quality_stats[ql] = {"count": 0, "total_reward": 0.0, "scores": []}
-        quality_stats[ql]["count"] += 1
-        quality_stats[ql]["total_reward"] += step["reward"]
-        quality_stats[ql]["scores"].append(
-            step["score_breakdown"].get("overall_score", 0.0)
-        )
-    summary: Dict[str, Any] = {
-        "total_steps": len(all_steps),
-        "scenarios": len(scenarios),
-        "quality_levels": {},
-    }
-    for ql, stats in quality_stats.items():
-        scores = stats["scores"]
-        summary["quality_levels"][ql] = {
-            "step_count": stats["count"],
-            "avg_reward": round(stats["total_reward"] / max(1, stats["count"]), 4),
-            "avg_score": round(sum(scores) / max(1, len(scores)), 4),
-            "min_score": round(min(scores) if scores else 0.0, 4),
-            "max_score": round(max(scores) if scores else 0.0, 4),
-        }
-    summary_path = os.path.join(args.output_dir, "rubric_summary.json")
-    with open(summary_path, "w", encoding="utf-8") as f:
-        json.dump(summary, f, indent=2)
-    print(f"Generated {len(all_steps)} steps across {len(scenarios)} scenarios")
-    print(f"Output: {args.output_dir}/")
-    for ql in ("excellent", "good", "mediocre", "bad"):
-        if ql in summary["quality_levels"]:
-            info = summary["quality_levels"][ql]
-            print(
-                f"  {ql:>10}: {info['step_count']} steps, "
-                f"avg_score={info['avg_score']:.4f}, "
-                f"avg_reward={info['avg_reward']:.4f}"
-            )
-if __name__ == "__main__":
-    main()
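
Reviewer note: the scripted `hard_1` and `hard_2` walkthroughs in the deleted file hard-code their distances and tree edges in the narration and ops. The sketch below is not part of the commit. It recomputes those values independently, assuming only the inputs stated in the header comments (the `hard_1` edge list with source `A`, and the `hard_2` key order `[5,3,7,2,4,8]`). The helper names `dijkstra` and `bst_edges` are hypothetical and do not exist in the repository.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Graph from the hard_1 header comment: A->B(1), A->C(4), B->C(2), B->D(5), C->D(1)
GRAPH: Dict[str, List[Tuple[str, int]]] = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}


def dijkstra(graph: Dict[str, List[Tuple[str, int]]], source: str) -> Dict[str, float]:
    """Textbook Dijkstra with a binary heap and lazy deletion of stale entries."""
    dist: Dict[str, float] = {node: float("inf") for node in graph}
    dist[source] = 0
    heap: List[Tuple[float, str]] = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry: u was already finalized with a smaller distance
        visited.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:  # relaxation step
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def bst_edges(keys: List[int]) -> List[Tuple[int, int, str]]:
    """Insert keys one at a time; return (parent, child, 'L' or 'R') per insertion."""
    children: Dict[int, List[Optional[int]]] = {}
    edges: List[Tuple[int, int, str]] = []
    root: Optional[int] = None
    for key in keys:
        children[key] = [None, None]
        if root is None:
            root = key  # the first key becomes the root
            continue
        cur = root
        while True:
            side = 0 if key < cur else 1  # smaller keys go left, larger go right
            nxt = children[cur][side]
            if nxt is None:
                children[cur][side] = key
                edges.append((cur, key, "L" if side == 0 else "R"))
                break
            cur = nxt
    return edges


distances = dijkstra(GRAPH, "A")
tree_edges = bst_edges([5, 3, 7, 2, 4, 8])
```

Under these assumptions, `distances` and `tree_edges` can be compared against the `set_value` and `add_edge` ops in the "excellent" and "good" fixtures, which is a cheap way to confirm that the scripted rollouts describe a correct run before they are used as rubric data.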

server/app.py CHANGED

@@ -43,7 +43,10 @@ try:
     from .scenario_loader import load_scenarios
     from .scoring import weights_for_difficulty
     from .constants import (
-        ALLOWED_OPS, ROLE_VALUES, REGION_STYLES, NAMED_POSITIONS,
     )
 except ImportError:
     from models import VisualReasoningAction, VisualReasoningObservation
@@ -51,7 +54,10 @@ except ImportError:
     from server.scenario_loader import load_scenarios
     from server.scoring import weights_for_difficulty
     from server.constants import (
-        ALLOWED_OPS, ROLE_VALUES, REGION_STYLES, NAMED_POSITIONS,
     )
@@ -97,7 +103,12 @@ def _scenario_display_name(s: Dict[str, Any]) -> str:
 def _tier_emoji(t: str) -> str:
-    return {"easy": "\U0001f7e2", "medium": "\U0001f7e1", "hard": "\U0001f7e0", "expert": "\U0001f534"}.get(t, "⚪")
 def _tier_label(t: str) -> str:
@@ -108,6 +119,7 @@ def _tier_label(t: str) -> str:
 # Broadcaster (WebSocket fan-out for live viewer)
 # ---------------------------------------------------------------------------
 class Broadcaster:
     """Append-only message log with long-poll support."""
@@ -162,6 +174,7 @@ _broadcaster = Broadcaster()
 # TTS via fal-ai Kokoro
 # ---------------------------------------------------------------------------
 async def _tts_to_base64(text: str) -> Optional[str]:
     if not text or not text.strip():
         return None
@@ -169,6 +182,7 @@ async def _tts_to_base64(text: str) -> Optional[str]:
         return None
     try:
         import aiohttp
         async with aiohttp.ClientSession() as session:
             async with session.post(
                 "https://fal.run/fal-ai/kokoro",
@@ -181,14 +195,22 @@ async def _tts_to_base64(text: str) -> Optional[str]:
             ) as resp:
                 if resp.status != 200:
                     body = await resp.text()
-                    print(f"[TTS] fal-ai error {resp.status}: {body}", file=sys.stderr, flush=True)
                     return None
                 data = await resp.json()
             audio_url = data.get("audio", {}).get("url")
             if not audio_url:
-                print("[TTS] no audio URL in fal-ai response", file=sys.stderr, flush=True)
                 return None
-            async with session.get(audio_url, timeout=aiohttp.ClientTimeout(total=15)) as audio_resp:
                 audio_bytes = await audio_resp.read()
         result = base64.b64encode(audio_bytes).decode("ascii")
         print(f"[TTS] generated {len(audio_bytes)} bytes of audio", flush=True)
@@ -211,6 +233,7 @@ def _get_llm_client(api_key: str = ""):
     key = api_key or LLM_API_KEY
     if _openai_client is None or key != _openai_client_key:
         from openai import OpenAI
         _openai_client = OpenAI(base_url=LLM_API_BASE, api_key=key)
         _openai_client_key = key
     return _openai_client
@@ -219,6 +242,7 @@ def _get_llm_client(api_key: str = ""):
 def _build_system_prompt() -> str:
     try:
         from inference import SYSTEM_PROMPT
         return SYSTEM_PROMPT
     except Exception:
         return (
@@ -227,9 +251,12 @@ def _build_system_prompt() -> str:
         )
-def _build_user_prompt(obs: Any, last_action: Optional[Dict], last_reward: float, history: List[str]) -> str:
     try:
         from inference import build_user_prompt
         return build_user_prompt(obs, last_action, last_reward, history)
     except Exception:
         return f"Goal: {obs.goal}\nEntities: {list(obs.entities.keys())}\nRemaining steps: {obs.remaining_step_budget}"
@@ -245,10 +272,12 @@ def _strip_thinking_tokens(text: str) -> str:
 def _parse_llm_response(text: str) -> Optional[Dict[str, Any]]:
     try:
         from inference import parse_action, normalize_action
         parsed = parse_action(text)
         return normalize_action(parsed or {})
     except Exception:
         import re
         match = re.search(r"\{.*\}", text, flags=re.DOTALL)
         if match:
             try:
@@ -315,18 +344,20 @@ async def _run_demo(scenario_id: str, api_key: str = "", model_name: str = "") -
         goal_audio = await _tts_to_base64(obs.goal)
         snap = _obs_snapshot(obs)
-        await _broadcaster.send({
-            "type": "reset",
-            "task_name": obs.task_name,
-            "scenario_id": obs.scenario_id,
-            "goal": obs.goal,
-            "checklist": list(obs.concept_checklist),
-            "input_data": dict(obs.input_data),
-            "constraints": list(obs.constraints),
-            "max_steps": obs.max_steps,
-            "audio": goal_audio,
-            **snap,
-        })
         await asyncio.sleep(2.0)
@@ -349,7 +380,10 @@ async def _run_demo(scenario_id: str, api_key: str = "", model_name: str = "") -
                         temperature=LLM_TEMPERATURE,
                         max_tokens=LLM_MAX_TOKENS,
                         stream=False,
-                    ).choices[0].message.content or ""
                 )
                 text = _strip_thinking_tokens(text)
                 print(f"[DEMO] Step {step}: LLM returned {len(text)} chars", flush=True)
@@ -390,23 +424,25 @@ async def _run_demo(scenario_id: str, api_key: str = "", model_name: str = "") -
             history.append(f"Step {step}: {action_dict.get('narration', '')}")
             snap = _obs_snapshot(obs)
-            await _broadcaster.send({
-                "type": "step",
-                "task_name": obs_dict.get("task_name", ""),
-                "scenario_id": obs_dict.get("scenario_id", ""),
-                "step": step,
-                "step_type": action_dict.get("step_type"),
-                "intent": action_dict.get("intent", ""),
-                "narration": narration,
-                "ops": action_dict.get("ops", []),
-                "covered_concepts": action_dict.get("covered_concepts", []),
-                "reward": float(reward),
-                "score": float(overall),
-                "done": bool(done),
-                "error": error,
-                "audio": audio_b64,
-                **snap,
-            })
             if done:
                 await asyncio.sleep(2.0)
@@ -418,20 +454,23 @@ async def _run_demo(scenario_id: str, api_key: str = "", model_name: str = "") -
         print(f"[DEMO] error: {exc}", file=sys.stderr, flush=True)
         traceback.print_exc()
-    await _broadcaster.send({
-        "type": "end",
-        "task_name": scenario_id,
-        "success": score >= 0.65,
-        "steps": steps_taken,
-        "score": float(score),
-        "rewards": [float(r) for r in rewards],
-    })
 # ---------------------------------------------------------------------------
 # Scenario browser callbacks
 # ---------------------------------------------------------------------------
 def list_scenario_choices() -> List[str]:
     return [_scenario_display_name(s) for s in _get_scenarios()]
@@ -460,7 +499,9 @@ def show_scenario(choice: str) -> tuple:
             checklist_md = "\n".join(f"- `{c}`" for c in checklist)
-            constraints_md = "\n".join(f"- `{c}`" for c in constraints) if constraints else "_None_"
             return header, input_md, checklist_md, constraints_md
     return "Scenario not found.", "", "", ""
@@ -470,6 +511,7 @@ def show_scenario(choice: str) -> tuple:
 # Scoring explorer callback
 # ---------------------------------------------------------------------------
 def show_weights(difficulty: str) -> str:
     d = difficulty.lower()
     w = weights_for_difficulty(d)
@@ -486,7 +528,10 @@ def show_weights(difficulty: str) -> str:
 # Live demo callbacks (Gradio)
 # ---------------------------------------------------------------------------
-async def _start_live_demo(scenario_choice: str, hf_token: str = "", model_name: str = "") -> str:
     global _demo_task
     if not scenario_choice:
         return "Select a scenario first."
@@ -523,7 +568,10 @@ async def _stop_live_demo() -> str:
 # Gradio UI (custom builder for openenv's gradio_builder parameter)
 # ---------------------------------------------------------------------------
-def build_ui(web_manager, action_fields, metadata, is_chat_env, title, quick_start_md) -> gr.Blocks:
     """Custom Gradio UI builder for openenv's gradio_builder parameter."""
     with gr.Blocks(
         title="Visual Reasoning Environment",
@@ -537,7 +585,8 @@ def build_ui(web_manager, action_fields, metadata, is_chat_env, title, quick_sta
             # TAB 1: About  (blog-style, mirrors README)
             # ============================================================
             with gr.Tab("About"):
-                gr.Markdown("""
 ## The Problem Nobody Talks About
 Here's a question: *How do you teach a machine to teach?*
@@ -559,9 +608,11 @@ a visual explanation, step by step, where each step advances the algorithm AND a
 the learner's understanding.**
 That's the gap this project fills.
-                """)
-                gr.Markdown("""
 ## What This Is
 The Visual Reasoning Environment is an
@@ -575,7 +626,8 @@ correctness, concept coverage, narration quality, and teaching pedagogy.
 Think of it this way: you're not training the model to *know* BFS.
 You're training it to *teach* BFS the way the best professor you ever had would --
 with a marker in hand and an audience that needs to follow along.
-                """)
                 gr.HTML(
                     '<a href="https://youtu.be/KwWqjuyfWzw" target="_blank">'
@@ -584,7 +636,8 @@ with a marker in hand and an audience that needs to follow along.
                     'alt="Watch the Demo"/></a>'
                 )
-                gr.Markdown("""
 ## What the Agent Does
 Every scenario starts with an **empty canvas**. Nothing is drawn.
@@ -682,33 +735,32 @@ unsupported concept claims (0.30), too many ops (0.50), and info dumps (0.20).
 ## The Reinforcement Learning Loop
 ```
-+---------------------------------------------------------------------+
-|                     TRAINING LOOP (GRPO / RLVR)                     |
-|                                                                     |
-|  +-----------+    prompt     +--------------+    JSON action        |
-|  |           | ------------->|              | ------------------+    |
-|  |  Scenario |               |   LLM Agent  |                  |    |
-|  |  Generator|               |  (Teacher)   |                  |    |
-|  |           |    +--------->|              |<----------+      |    |
-|  +-----------+    |          +--------------+           |      |    |
-|                   |                                     |      |    |
-|             observation                              reward    |    |
-|           + score breakdown                         signal    |    |
-|                   |                                     |      |    |
-|          +--------+---------+      score          +----+---+  |    |
-|          |                  | <------------------- |        |  |    |
-|          |   Environment    |                     | Scoring |  |    |
-|          |  (Empty Canvas)  | ------------------> | Engine  |  |    |
-|          |                  |   canvas state      |(13 dim) |  |    |
-|          +------------------+                     +--------+  |    |
-|                   ^                                           |    |
-|                   |              step(action)                 |    |
-|                   +-------------------------------------------+    |
-|                                                                     |
-|  Per-step reward = delta(overall_score) + penalties + bonuses       |
-|  Episode: empty canvas --> Phase 1 (draw) --> Phase 2 (solve)      |
-|           --> Phase 3 (summarize) --> done                          |
-+---------------------------------------------------------------------+
 ```
 Every episode starts with a blank canvas and a goal like
@@ -762,19 +814,22 @@ the system without hand-coding rules for every field.
 "The goal is not to be impressive. The goal is to be clear."
 That's the north star of this project -- training machines not to be
 impressive explainers, but clear ones.*
-                """)
             # ============================================================
             # TAB 2: Live Demo
             # ============================================================
             with gr.Tab("Live Demo"):
-                gr.Markdown("""
 ## Watch an LLM Teach
 See the agent explain a CS algorithm in real-time -- canvas visualization
 with voice narration. Select a scenario, click **Start Demo**, then click the
 viewer area to activate audio.
-                """)
                 with gr.Row():
                     demo_hf_token = gr.Textbox(
@@ -800,7 +855,9 @@ viewer area to activate audio.
                     demo_start_btn = gr.Button("Start Demo", variant="primary", scale=1)
                     demo_stop_btn = gr.Button("Stop Demo", variant="stop", scale=1)
-                demo_status = gr.Markdown("_Enter your HF token, select a scenario, and click Start Demo._")
                 gr.HTML(
                     value=(
@@ -825,7 +882,8 @@ viewer area to activate audio.
             # TAB 3: Scoring & Architecture  (technical)
             # ============================================================
             with gr.Tab("Scoring & Architecture"):
-                gr.Markdown("""
 ## Scoring System
 The overall score is a **weighted sum of 13 sub-scores** (each 0-1) **minus 5 penalties**.
@@ -833,7 +891,8 @@ Weights are tuned per difficulty level -- harder tiers emphasize algorithm corre
 while easier tiers give more weight to narration quality and concept coverage.
 **Select a difficulty level** to see the weight distribution:
-                """)
                 difficulty_radio = gr.Radio(
                     choices=["easy", "medium", "hard", "expert"],
@@ -848,7 +907,8 @@ while easier tiers give more weight to narration quality and concept coverage.
                     outputs=[weights_display],
                 )
-                gr.Markdown("""
 ---
 ### Sub-Score Details
@@ -932,13 +992,15 @@ track membership for `push_to`/`pop_from`. Common source of LLM confusion.
 plus relative prefixes (`below:`, `right-of:`, ...) instead of numeric coordinates.
 This reduces the positioning search space from ~331K to ~6.5K, making it learnable within
 reasonable training budgets.
-                """)
             # ============================================================
             # TAB 4: API Reference  (technical)
             # ============================================================
             with gr.Tab("API Reference"):
-                gr.Markdown(f"""
 ## API Reference
 This Space exposes the standard **OpenEnv** HTTP + WebSocket API under `/api`.
@@ -1044,7 +1106,8 @@ export LOCAL_IMAGE_NAME=http://127.0.0.1:8000
 python inference.py                 # headless
 python inference_tldraw.py          # with tldraw browser viewer
 ```
-                """)
     return demo
@@ -1067,6 +1130,7 @@ app = create_app(
 # Additional routes for live viewer + audio
 # ---------------------------------------------------------------------------
 @app.get("/viewer")
 async def serve_viewer():
     viewer_path = VIEWER_DIR / "audio_viewer.html"
@@ -1114,7 +1178,9 @@ def main():
     print(f"  Live Viewer:    http://{args.host}:{args.port}/viewer")
     print(f"  OpenEnv API:    http://{args.host}:{args.port}/reset, /step, /health")
     if not LLM_API_KEY:
-        print("  NOTE: No HF_TOKEN/API_KEY env var — users can enter token in the Viewer tab")
     uvicorn.run(app, host=args.host, port=args.port)

     from .scenario_loader import load_scenarios
     from .scoring import weights_for_difficulty
     from .constants import (
+        ALLOWED_OPS,
+        ROLE_VALUES,
+        REGION_STYLES,
+        NAMED_POSITIONS,
     )
 except ImportError:
     from models import VisualReasoningAction, VisualReasoningObservation
     from server.scenario_loader import load_scenarios
     from server.scoring import weights_for_difficulty
     from server.constants import (
+        ALLOWED_OPS,
+        ROLE_VALUES,
+        REGION_STYLES,
+        NAMED_POSITIONS,
     )
 def _tier_emoji(t: str) -> str:
+    return {
+        "easy": "\U0001f7e2",
+        "medium": "\U0001f7e1",
+        "hard": "\U0001f7e0",
+        "expert": "\U0001f534",
+    }.get(t, "⚪")
 def _tier_label(t: str) -> str:
 # Broadcaster (WebSocket fan-out for live viewer)
 # ---------------------------------------------------------------------------
 class Broadcaster:
     """Append-only message log with long-poll support."""
 # TTS via fal-ai Kokoro
 # ---------------------------------------------------------------------------
 async def _tts_to_base64(text: str) -> Optional[str]:
     if not text or not text.strip():
         return None
         return None
     try:
         import aiohttp
         async with aiohttp.ClientSession() as session:
             async with session.post(
                 "https://fal.run/fal-ai/kokoro",
             ) as resp:
                 if resp.status != 200:
                     body = await resp.text()
+                    print(
+                        f"[TTS] fal-ai error {resp.status}: {body}",
+                        file=sys.stderr,
+                        flush=True,
+                    )
                     return None
                 data = await resp.json()
             audio_url = data.get("audio", {}).get("url")
             if not audio_url:
+                print(
+                    "[TTS] no audio URL in fal-ai response", file=sys.stderr, flush=True
+                )
                 return None
+            async with session.get(
+                audio_url, timeout=aiohttp.ClientTimeout(total=15)
+            ) as audio_resp:
                 audio_bytes = await audio_resp.read()
         result = base64.b64encode(audio_bytes).decode("ascii")
         print(f"[TTS] generated {len(audio_bytes)} bytes of audio", flush=True)
     key = api_key or LLM_API_KEY
     if _openai_client is None or key != _openai_client_key:
         from openai import OpenAI
         _openai_client = OpenAI(base_url=LLM_API_BASE, api_key=key)
         _openai_client_key = key
     return _openai_client
 def _build_system_prompt() -> str:
     try:
         from inference import SYSTEM_PROMPT
         return SYSTEM_PROMPT
     except Exception:
         return (
         )
+def _build_user_prompt(
+    obs: Any, last_action: Optional[Dict], last_reward: float, history: List[str]
+) -> str:
     try:
         from inference import build_user_prompt
         return build_user_prompt(obs, last_action, last_reward, history)
     except Exception:
         return f"Goal: {obs.goal}\nEntities: {list(obs.entities.keys())}\nRemaining steps: {obs.remaining_step_budget}"
 def _parse_llm_response(text: str) -> Optional[Dict[str, Any]]:
     try:
         from inference import parse_action, normalize_action
         parsed = parse_action(text)
         return normalize_action(parsed or {})
     except Exception:
         import re
         match = re.search(r"\{.*\}", text, flags=re.DOTALL)
         if match:
             try:
         goal_audio = await _tts_to_base64(obs.goal)
         snap = _obs_snapshot(obs)
+        await _broadcaster.send(
+            {
+                "type": "reset",
+                "task_name": obs.task_name,
+                "scenario_id": obs.scenario_id,
+                "goal": obs.goal,
+                "checklist": list(obs.concept_checklist),
+                "input_data": dict(obs.input_data),
+                "constraints": list(obs.constraints),
+                "max_steps": obs.max_steps,
+                "audio": goal_audio,
+                **snap,
+            }
+        )
         await asyncio.sleep(2.0)
                         temperature=LLM_TEMPERATURE,
                         max_tokens=LLM_MAX_TOKENS,
                         stream=False,
+                    )
+                    .choices[0]
+                    .message.content
+                    or "",
                 )
                 text = _strip_thinking_tokens(text)
                 print(f"[DEMO] Step {step}: LLM returned {len(text)} chars", flush=True)
             history.append(f"Step {step}: {action_dict.get('narration', '')}")
             snap = _obs_snapshot(obs)
+            await _broadcaster.send(
+                {
+                    "type": "step",
+                    "task_name": obs_dict.get("task_name", ""),
+                    "scenario_id": obs_dict.get("scenario_id", ""),
+                    "step": step,
+                    "step_type": action_dict.get("step_type"),
+                    "intent": action_dict.get("intent", ""),
+                    "narration": narration,
+                    "ops": action_dict.get("ops", []),
+                    "covered_concepts": action_dict.get("covered_concepts", []),
+                    "reward": float(reward),
+                    "score": float(overall),
+                    "done": bool(done),
+                    "error": error,
+                    "audio": audio_b64,
+                    **snap,
+                }
+            )
             if done:
                 await asyncio.sleep(2.0)
         print(f"[DEMO] error: {exc}", file=sys.stderr, flush=True)
         traceback.print_exc()
+    await _broadcaster.send(
+        {
+            "type": "end",
+            "task_name": scenario_id,
+            "success": score >= 0.65,
+            "steps": steps_taken,
+            "score": float(score),
+            "rewards": [float(r) for r in rewards],
+        }
+    )
 # ---------------------------------------------------------------------------
 # Scenario browser callbacks
 # ---------------------------------------------------------------------------
 def list_scenario_choices() -> List[str]:
     return [_scenario_display_name(s) for s in _get_scenarios()]
             checklist_md = "\n".join(f"- `{c}`" for c in checklist)
+            constraints_md = (
+                "\n".join(f"- `{c}`" for c in constraints) if constraints else "_None_"
+            )
             return header, input_md, checklist_md, constraints_md
     return "Scenario not found.", "", "", ""
 # Scoring explorer callback
 # ---------------------------------------------------------------------------
 def show_weights(difficulty: str) -> str:
     d = difficulty.lower()
     w = weights_for_difficulty(d)
 # Live demo callbacks (Gradio)
 # ---------------------------------------------------------------------------
+async def _start_live_demo(
+    scenario_choice: str, hf_token: str = "", model_name: str = ""
+) -> str:
     global _demo_task
     if not scenario_choice:
         return "Select a scenario first."
 # Gradio UI (custom builder for openenv's gradio_builder parameter)
 # ---------------------------------------------------------------------------
+def build_ui(
+    web_manager, action_fields, metadata, is_chat_env, title, quick_start_md
+) -> gr.Blocks:
     """Custom Gradio UI builder for openenv's gradio_builder parameter."""
     with gr.Blocks(
         title="Visual Reasoning Environment",
             # TAB 1: About  (blog-style, mirrors README)
             # ============================================================
             with gr.Tab("About"):
+                gr.Markdown(
+                    """
 ## The Problem Nobody Talks About
 Here's a question: *How do you teach a machine to teach?*
 the learner's understanding.**
 That's the gap this project fills.
+                """
+                )
+                gr.Markdown(
+                    """
 ## What This Is
 The Visual Reasoning Environment is an
 Think of it this way: you're not training the model to *know* BFS.
 You're training it to *teach* BFS the way the best professor you ever had would --
 with a marker in hand and an audience that needs to follow along.
+                """
+                )
                 gr.HTML(
                     '<a href="https://youtu.be/KwWqjuyfWzw" target="_blank">'
                     'alt="Watch the Demo"/></a>'
                 )
+                gr.Markdown(
+                    """
 ## What the Agent Does
 Every scenario starts with an **empty canvas**. Nothing is drawn.
 ## The Reinforcement Learning Loop
 ```
++-------------------------------------------------------------------+
+|                   TRAINING LOOP (GRPO / RLVR)                     |
+|                                                                   |
+|  +-----------+  prompt   +--------------+  JSON action            |
+|  |           | --------> |              | --------------+         |
+|  |  Scenario |           |   LLM Agent  |               |         |
+|  |  Generator|           |  (Teacher)   |               |         |
+|  |           |  +------> |              | <--------+    |         |
+|  +-----------+  |        +--------------+          |    |         |
+|                 |                                  |    |         |
+|           observation                           reward  |         |
+|         + score breakdown                       signal  |         |
+|                 |                                  |    |         |
+|        +--------+--------+    score        +------+--+  |         |
+|        |                 | <-------------- |         |  |         |
+|        |   Environment   |                 | Scoring |  |         |
+|        |  (Empty Canvas) | --------------> | Engine  |  |         |
+|        |                 |  canvas state   |(13 dim) |  |         |
+|        +-----------------+                 +---------+  |         |
+|                 ^                                       |         |
+|                 |          step(action)                 |         |
+|                 +--------------------------------------+          |
+|                                                                   |
+|  Per-step reward = delta(overall_score) + penalties + bonuses     |
+|  Episode: empty canvas --> Phase 1 --> Phase 2 --> Phase 3        |
++-------------------------------------------------------------------+
 ```
 Every episode starts with a blank canvas and a goal like
 "The goal is not to be impressive. The goal is to be clear."
 That's the north star of this project -- training machines not to be
 impressive explainers, but clear ones.*
+                """
+                )
             # ============================================================
             # TAB 2: Live Demo
             # ============================================================
             with gr.Tab("Live Demo"):
+                gr.Markdown(
+                    """
 ## Watch an LLM Teach
 See the agent explain a CS algorithm in real-time -- canvas visualization
 with voice narration. Select a scenario, click **Start Demo**, then click the
 viewer area to activate audio.
+                """
+                )
                 with gr.Row():
                     demo_hf_token = gr.Textbox(
                     demo_start_btn = gr.Button("Start Demo", variant="primary", scale=1)
                     demo_stop_btn = gr.Button("Stop Demo", variant="stop", scale=1)
+                demo_status = gr.Markdown(
+                    "_Enter your HF token, select a scenario, and click Start Demo._"
+                )
                 gr.HTML(
                     value=(
             # TAB 3: Scoring & Architecture  (technical)
             # ============================================================
             with gr.Tab("Scoring & Architecture"):
+                gr.Markdown(
+                    """
 ## Scoring System
 The overall score is a **weighted sum of 13 sub-scores** (each 0-1) **minus 5 penalties**.
 while easier tiers give more weight to narration quality and concept coverage.
 **Select a difficulty level** to see the weight distribution:
+                """
+                )
                 difficulty_radio = gr.Radio(
                     choices=["easy", "medium", "hard", "expert"],
                     outputs=[weights_display],
                 )
+                gr.Markdown(
+                    """
 ---
 ### Sub-Score Details
 plus relative prefixes (`below:`, `right-of:`, ...) instead of numeric coordinates.
 This reduces the positioning search space from ~331K to ~6.5K, making it learnable within
 reasonable training budgets.
+                """
+                )
             # ============================================================
             # TAB 4: API Reference  (technical)
             # ============================================================
             with gr.Tab("API Reference"):
+                gr.Markdown(
+                    f"""
 ## API Reference
 This Space exposes the standard **OpenEnv** HTTP + WebSocket API under `/api`.
 python inference.py                 # headless
 python inference_tldraw.py          # with tldraw browser viewer
 ```
+                """
+                )
     return demo
 # Additional routes for live viewer + audio
 # ---------------------------------------------------------------------------
 @app.get("/viewer")
 async def serve_viewer():
     viewer_path = VIEWER_DIR / "audio_viewer.html"
     print(f"  Live Viewer:    http://{args.host}:{args.port}/viewer")
     print(f"  OpenEnv API:    http://{args.host}:{args.port}/reset, /step, /health")
     if not LLM_API_KEY:
+        print(
+            "  NOTE: No HF_TOKEN/API_KEY env var — users can enter token in the Viewer tab"
+        )
     uvicorn.run(app, host=args.host, port=args.port)
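
Note on the scoring text in server/app.py: the UI copy in this diff describes the overall score as a weighted sum of sub-scores (each 0-1) minus penalties, and the loop diagram gives the per-step reward as delta(overall_score) plus penalties and bonuses. The sketch below illustrates only that arithmetic. It is not the code in `server/scoring.py`: the names `overall_score` and `per_step_rewards`, the dictionary shapes, and the assumption that an empty canvas scores 0 are all hypothetical, and the extra penalty and bonus terms of the per-step reward are left out.

```python
from typing import Dict, List


def overall_score(
    sub_scores: Dict[str, float],
    weights: Dict[str, float],
    penalties: Dict[str, float],
) -> float:
    """Weighted sum of sub-scores (each in [0, 1]) minus the summed penalties.

    Hypothetical helper: the real weights come from weights_for_difficulty(),
    whose return shape is not shown in this diff.
    """
    weighted = sum(w * sub_scores.get(name, 0.0) for name, w in weights.items())
    return weighted - sum(penalties.values())


def per_step_rewards(scores: List[float]) -> List[float]:
    """Delta of the overall score between consecutive steps.

    Assumption: the episode starts from an empty canvas with score 0.0.
    Penalty and bonus terms from the diagram are intentionally omitted.
    """
    rewards: List[float] = []
    previous = 0.0
    for score in scores:
        rewards.append(score - previous)
        previous = score
    return rewards


if __name__ == "__main__":
    score = overall_score(
        sub_scores={"algorithm": 1.0, "narration": 0.5},
        weights={"algorithm": 0.5, "narration": 0.5},
        penalties={"too_many_ops": 0.25},
    )
    print(score)
    print(per_step_rewards([0.25, 0.5, 0.5]))
```

A delta-based reward means a step that leaves the score unchanged earns nothing, which matches the diagram's framing of a dense signal where every step has to advance the explanation.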

server/app_backup.py DELETED

@@ -1,46 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-"""FastAPI application for the Visual Reasoning Environment."""
-try:
-    from openenv.core.env_server.http_server import create_app
-except Exception as e:
-    raise ImportError(
-        "openenv is required for the web interface. Install dependencies with 'uv sync'."
-    ) from e
-try:
-    from ..models import VisualReasoningAction, VisualReasoningObservation
-    from .visual_reasoning_environment import VisualReasoningEnvironment
-except ImportError:
-    from models import VisualReasoningAction, VisualReasoningObservation
-    from server.visual_reasoning_environment import VisualReasoningEnvironment
-app = create_app(
-    VisualReasoningEnvironment,
-    VisualReasoningAction,
-    VisualReasoningObservation,
-    env_name="visual_reasoning",
-    max_concurrent_envs=1,
-)
-def main():
-    """Run the FastAPI server."""
-    import argparse
-    import uvicorn
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--host", type=str, default="0.0.0.0")
-    parser.add_argument("--port", type=int, default=8000)
-    args = parser.parse_args()
-    uvicorn.run(app, host=args.host, port=args.port)
-if __name__ == "__main__":
-    main()

train.ipynb DELETED

@@ -1,913 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "title",
-   "metadata": {},
-   "source": [
-    "# Visual Reasoning — Training Demo\n",
-    "\n",
-    "**Simulation-based RL for teaching CS algorithms on a whiteboard**\n",
-    "\n",
-    "This notebook trains a small LLM to be an expert visual explainer of CS algorithms.\n",
-    "The model draws data structures, walks through algorithms step-by-step, and narrates\n",
-    "the reasoning — scored by a 12-dimension reward system.\n",
-    "\n",
-    "| Stage | Purpose |\n",
-    "|-------|:--------|\n",
-    "| Baseline | Score the untrained model across all difficulties |\n",
-    "| SFT warmup | Teach JSON action format via gold demonstrations |\n",
-    "| GRPO | RL with dense environment rewards, easy → expert curriculum |\n",
-    "| Final eval | Delta report comparing all three checkpoints |\n",
-    "\n",
-    "- 17 scenarios (9 hand-crafted + 8 procedurally generated) across 4 difficulty levels\n",
-    "- 9 algorithm templates: linked list, stack, binary search, BFS, hash table, Dijkstra, BST, fib memo, quicksort\n",
-    "- 12 weighted sub-scores + 5 penalties → dense per-step reward\n",
-    "\n",
-    "> **No model saving** — everything stays in-memory. The same LoRA adapter flows from SFT into GRPO."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "install-hdr",
-   "metadata": {},
-   "source": [
-    "## 1. Install Dependencies"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "install",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!pip install -q --upgrade \"torchvision>=0.25.0\"\n",
-    "!pip install -q huggingface_hub unsloth trl datasets transformers accelerate bitsandbytes peft torch\n",
-    "!pip install -q openenv-core fastapi uvicorn pydantic\n",
-    "!pip install -q python-dotenv networkx shapely sentence-transformers rapidfuzz textstat sortedcontainers \"numpy<2.0\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "clone-hdr",
-   "metadata": {},
-   "source": [
-    "## 2. Clone Visual Reasoning Environment"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "clone",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "\n",
-    "# Use heuristic narration scorer — no GPU contention with training model\n",
-    "os.environ[\"NARRATION_SCORER\"] = \"fallback\"\n",
-    "\n",
-    "from huggingface_hub import snapshot_download\n",
-    "\n",
-    "if os.path.basename(os.getcwd()) != \"visual_reasoning\":\n",
-    "    if not os.path.isdir(\"visual_reasoning\"):\n",
-    "        snapshot_download(\n",
-    "            repo_id=\"sreeramajay/visual_reasoning-env\",\n",
-    "            repo_type=\"space\",\n",
-    "            local_dir=\"visual_reasoning\",\n",
-    "            ignore_patterns=[\"*.gitattributes\", \".gitignore\", \"README.md\"],\n",
-    "        )\n",
-    "    os.chdir(\"visual_reasoning\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "verify-hdr",
-   "metadata": {},
-   "source": [
-    "## 3. Verify Environment — Run a Smoke Test"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "verify",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import sys, os, json, torch\n",
-    "sys.path.insert(0, '.')\n",
-    "\n",
-    "from unsloth import FastLanguageModel\n",
-    "\n",
-    "from models import VisualReasoningAction\n",
-    "from server.visual_reasoning_environment import VisualReasoningEnvironment\n",
-    "\n",
-    "test_env = VisualReasoningEnvironment()\n",
-    "obs = test_env.reset(scenario_id='easy_1')\n",
-    "print(f'Scenario: {obs.scenario_id}')\n",
-    "print(f'Goal: {obs.goal}')\n",
-    "print(f'Concepts: {obs.concept_checklist}')\n",
-    "print(f'Step budget: {obs.remaining_step_budget}')\n",
-    "\n",
-    "obs = test_env.step(VisualReasoningAction(\n",
-    "    step_type='advance',\n",
-    "    narration='Adding the first node with value 10.',\n",
-    "    ops=[{'op': 'add_node', 'target_ids': ['n0'], 'params': {'value': 10}}],\n",
-    "    covered_concepts=['node_value'],\n",
-    "    intent='test',\n",
-    "))\n",
-    "print(f'\\nAfter step: entities={list(obs.entities.keys())}, reward={obs.reward:.3f}, error={obs.action_error}')\n",
-    "print('Environment OK!')\n",
-    "del test_env"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "config-hdr",
-   "metadata": {},
-   "source": [
-    "## 4. Configuration"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "config",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from collections import Counter\n",
-    "from inference import SYSTEM_PROMPT, build_user_prompt, parse_action, normalize_action\n",
-    "\n",
-    "MODEL_NAME = 'unsloth/Qwen2.5-3B-Instruct-bnb-4bit'\n",
-    "MAX_SEQ_LENGTH = 4096\n",
-    "LORA_R = 16\n",
-    "LORA_ALPHA = 32\n",
-    "SFT_EPOCHS = 3\n",
-    "GRPO_EPOCHS = 2\n",
-    "\n",
-    "SCENARIOS = {\n",
-    "    'easy':   ['easy_1', 'easy_2', 'easy_3', 'gen_easy_1001', 'gen_easy_1002'],\n",
-    "    'medium': ['medium_1', 'medium_2', 'gen_medium_2001', 'gen_medium_2002'],\n",
-    "    'hard':   ['hard_1', 'hard_2', 'gen_hard_3001', 'gen_hard_3002'],\n",
-    "    'expert': ['expert_1', 'expert_2', 'gen_expert_4001', 'gen_expert_4002'],\n",
-    "}\n",
-    "DIFFICULTIES = ('easy', 'medium', 'hard', 'expert')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "model-hdr",
-   "metadata": {},
-   "source": [
-    "## 5. Load Model (Unsloth + LoRA)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "load-model",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "model, tokenizer = FastLanguageModel.from_pretrained(\n",
-    "    model_name=MODEL_NAME,\n",
-    "    max_seq_length=MAX_SEQ_LENGTH,\n",
-    "    dtype=None,\n",
-    "    load_in_4bit=True,\n",
-    ")\n",
-    "model = FastLanguageModel.get_peft_model(\n",
-    "    model,\n",
-    "    r=LORA_R,\n",
-    "    lora_alpha=LORA_ALPHA,\n",
-    "    lora_dropout=0,\n",
-    "    target_modules=[\n",
-    "        'q_proj', 'k_proj', 'v_proj', 'o_proj',\n",
-    "        'gate_proj', 'up_proj', 'down_proj',\n",
-    "    ],\n",
-    "    bias='none',\n",
-    "    use_gradient_checkpointing='unsloth',\n",
-    "    random_state=0,\n",
-    ")\n",
-    "if tokenizer.pad_token_id is None:\n",
-    "    tokenizer.pad_token = tokenizer.eos_token\n",
-    "\n",
-    "model.generation_config.max_length = None\n",
-    "\n",
-    "trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
-    "total = sum(p.numel() for p in model.parameters())\n",
-    "print(f'Parameters: {total / 1e6:.1f}M total, {trainable / 1e6:.2f}M trainable ({trainable / total:.2%})')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "helpers-hdr",
-   "metadata": {},
-   "source": [
-    "## 6. Environment Wrapper & Batched Evaluation\n",
-    "\n",
-    "Uses the environment **directly in-process** — no server needed.\n",
-    "- **Batched generation**: collects prompts from up to 8 parallel episodes per `model.generate()` call\n",
-    "- **Early termination**: kills episodes after 3 consecutive no-ops\n",
-    "- **Environment pool**: reuses env instances across eval calls"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "helpers",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import time\n",
-    "\n",
-    "EVAL_BATCH_SIZE = 8\n",
-    "EVAL_MAX_STEPS = 24\n",
-    "NOOP_EARLY_STOP = 3\n",
-    "\n",
-    "\n",
-    "class EnvRunner:\n",
-    "    \"\"\"Thin wrapper over VisualReasoningEnvironment for a clean reset/step API.\"\"\"\n",
-    "\n",
-    "    def __init__(self):\n",
-    "        self.env = VisualReasoningEnvironment()\n",
-    "\n",
-    "    def reset(self, scenario_id=None, task_name=None):\n",
-    "        return self.env.reset(scenario_id=scenario_id, task_name=task_name)\n",
-    "\n",
-    "    def step(self, action_dict):\n",
-    "        act = VisualReasoningAction(**action_dict)\n",
-    "        obs = self.env.step(act)\n",
-    "        return obs, float(obs.reward), bool(obs.done)\n",
-    "\n",
-    "\n",
-    "FALLBACK_ACTION = {\n",
-    "    'step_type': 'complete', 'narration': 'Explanation complete.',\n",
-    "    'ops': [], 'covered_concepts': [], 'intent': 'finalize',\n",
-    "}\n",
-    "\n",
-    "_env_pool = []\n",
-    "\n",
-    "def _get_env(idx):\n",
-    "    while len(_env_pool) <= idx:\n",
-    "        _env_pool.append(VisualReasoningEnvironment())\n",
-    "    return _env_pool[idx]\n",
-    "\n",
-    "\n",
-    "class EpisodeState:\n",
-    "    \"\"\"Tracks one episode's state for batched eval.\"\"\"\n",
-    "    def __init__(self, env, scenario_id):\n",
-    "        self.env = env\n",
-    "        self.scenario_id = scenario_id\n",
-    "        self.obs = env.reset(scenario_id=scenario_id)\n",
-    "        self.last_action = None\n",
-    "        self.last_reward = 0.0\n",
-    "        self.history = []\n",
-    "        self.steps = 0\n",
-    "        self.done = False\n",
-    "        self.score = 0.0\n",
-    "        self.consecutive_noops = 0\n",
-    "\n",
-    "    def build_prompt_text(self):\n",
-    "        user_prompt = build_user_prompt(\n",
-    "            self.obs, self.last_action, self.last_reward, self.history\n",
-    "        )\n",
-    "        messages = [\n",
-    "            {'role': 'system', 'content': SYSTEM_PROMPT},\n",
-    "            {'role': 'user', 'content': user_prompt},\n",
-    "        ]\n",
-    "        return tokenizer.apply_chat_template(\n",
-    "            messages, tokenize=False, add_generation_prompt=True\n",
-    "        )\n",
-    "\n",
-    "    def apply_action(self, action_dict):\n",
-    "        if action_dict is None:\n",
-    "            action_dict = FALLBACK_ACTION\n",
-    "        self.obs = self.env.step(VisualReasoningAction(**action_dict))\n",
-    "        reward = float(self.obs.reward)\n",
-    "        self.last_action = action_dict\n",
-    "        self.last_reward = reward\n",
-    "        self.steps += 1\n",
-    "        self.history.append(f\"Step {self.steps}: {action_dict.get('narration', '')}\")\n",
-    "        if reward <= -0.04:\n",
-    "            self.consecutive_noops += 1\n",
-    "        else:\n",
-    "            self.consecutive_noops = 0\n",
-    "        if self.obs.done or self.consecutive_noops >= NOOP_EARLY_STOP:\n",
-    "            self.done = True\n",
-    "        self.score = float(self.obs.score_breakdown.get('overall_score', 0.0))\n",
-    "\n",
-    "\n",
-    "def batched_generate(model, tokenizer, prompt_texts, max_new_tokens=384):\n",
-    "    \"\"\"Run model.generate on a batch of prompt strings.\"\"\"\n",
-    "    FastLanguageModel.for_inference(model)\n",
-    "    orig_side = tokenizer.padding_side\n",
-    "    tokenizer.padding_side = 'left'\n",
-    "    inputs = tokenizer(\n",
-    "        prompt_texts,\n",
-    "        return_tensors='pt',\n",
-    "        padding=True,\n",
-    "        truncation=True,\n",
-    "        max_length=MAX_SEQ_LENGTH,\n",
-    "    ).to(model.device)\n",
-    "    tokenizer.padding_side = orig_side\n",
-    "    with torch.no_grad():\n",
-    "        outputs = model.generate(\n",
-    "            **inputs,\n",
-    "            max_new_tokens=max_new_tokens,\n",
-    "            do_sample=False,\n",
-    "            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,\n",
-    "        )\n",
-    "    results = []\n",
-    "    for i, out in enumerate(outputs):\n",
-    "        input_len = inputs.input_ids.shape[1]  # left padding: every row shares the padded prompt width\n",
-    "        text = tokenizer.decode(out[input_len:], skip_special_tokens=True)\n",
-    "        results.append(text)\n",
-    "    return results\n",
-    "\n",
-    "\n",
-    "def evaluate(model, tokenizer, label, stages=None):\n",
-    "    \"\"\"Batched evaluation across all scenarios.\"\"\"\n",
-    "    stages = stages or list(DIFFICULTIES)\n",
-    "    print(f'\\n{\"=\" * 60}')\n",
-    "    print(f'  {label}')\n",
-    "    print(f'{\"=\" * 60}')\n",
-    "\n",
-    "    all_scenario_ids = []\n",
-    "    for diff in stages:\n",
-    "        all_scenario_ids.extend([(diff, sid) for sid in SCENARIOS.get(diff, [])])\n",
-    "\n",
-    "    results_by_diff = {d: [] for d in stages}\n",
-    "    t0 = time.time()\n",
-    "\n",
-    "    for batch_start in range(0, len(all_scenario_ids), EVAL_BATCH_SIZE):\n",
-    "        batch = all_scenario_ids[batch_start:batch_start + EVAL_BATCH_SIZE]\n",
-    "        t_batch = time.time()\n",
-    "        episodes = []\n",
-    "        for i, (diff, sid) in enumerate(batch):\n",
-    "            episodes.append(EpisodeState(_get_env(i), sid))\n",
-    "\n",
-    "        for step_num in range(1, EVAL_MAX_STEPS + 1):\n",
-    "            active = [ep for ep in episodes if not ep.done]\n",
-    "            if not active:\n",
-    "                break\n",
-    "            prompts = [ep.build_prompt_text() for ep in active]\n",
-    "            generated = batched_generate(model, tokenizer, prompts)\n",
-    "            for ep, text in zip(active, generated):\n",
-    "                parsed = parse_action(text)\n",
-    "                action = normalize_action(parsed or {}) if parsed else None\n",
-    "                ep.apply_action(action)\n",
-    "\n",
-    "        batch_time = time.time() - t_batch\n",
-    "        for ep, (diff, sid) in zip(episodes, batch):\n",
-    "            results_by_diff[diff].append(ep.score)\n",
-    "            print(f'  [{diff:6}] {sid:22} score={ep.score:.3f} steps={ep.steps}')\n",
-    "        sids_str = ', '.join(sid for _, sid in batch)\n",
-    "        print(f'  Batch {batch_start // EVAL_BATCH_SIZE + 1}: '\n",
-    "              f'{len(batch)} scenarios in {batch_time:.1f}s  [{sids_str}]')\n",
-    "\n",
-    "    results = {}\n",
-    "    for diff in stages:\n",
-    "        scores = results_by_diff[diff]\n",
-    "        if scores:\n",
-    "            mean = sum(scores) / len(scores)\n",
-    "            results[diff] = mean\n",
-    "            print(f'  [{diff:6}] mean={mean:.3f}')\n",
-    "    overall = sum(results.values()) / max(len(results), 1)\n",
-    "    results['overall'] = overall\n",
-    "    elapsed = time.time() - t0\n",
-    "    print(f'  OVERALL: {overall:.3f}  ({elapsed:.1f}s)')\n",
-    "    return results\n",
-    "\n",
-    "\n",
-    "env = EnvRunner()\n",
-    "print('EnvRunner ready.')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "baseline-hdr",
-   "metadata": {},
-   "source": [
-    "## 7. Baseline Evaluation\n",
-    "\n",
-    "Score the model before any training (freshly initialized LoRA adapters) across all scenarios to establish a baseline."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "baseline",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "baseline = evaluate(model, tokenizer, 'Baseline (untrained LoRA)')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "gold-hdr",
-   "metadata": {},
-   "source": [
-    "## 8. Gold Demonstrations\n",
-    "\n",
-    "Hand-crafted teaching episodes for three easy scenarios (`easy_1`, `easy_2`, `easy_3`). These teach the model:\n",
-    "- The JSON action format (`step_type`, `narration`, `ops`, `covered_concepts`, `intent`)\n",
-    "- Incremental setup (Phase 1), algorithm walk-through (Phase 2), wrap-up (Phase 3)\n",
-    "- Proper concept evidencing (narration must mention the concept, ops must back it)\n",
-    "- Region vs container distinction (regions for layout, containers for push/pop)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "gold",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "GOLD_EPISODES = {\n",
-    "    # ── easy_1: linked_list_traversal ──────────────────────────────────\n",
-    "    # input_data: {\"values\": [10, 20, 30]}\n",
-    "    # concepts:   head_pointer, node_value, next_link, tail_marker\n",
-    "    'easy_1': [\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Building a linked list — creating a centered layout region and the first two nodes with values 10 and 20.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_region', 'target_ids': ['list'], 'params': {'style': 'array', 'title': 'Linked List', 'position': 'center'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n1'], 'params': {'value': 10, 'region': 'list'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n2'], 'params': {'value': 20, 'region': 'list'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['node_value'],\n",
-    "            'intent': 'create_list_start',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Adding node 30 and connecting all nodes with next links to form the chain 10 -> 20 -> 30.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_node', 'target_ids': ['n3'], 'params': {'value': 30, 'region': 'list'}},\n",
-    "                {'op': 'add_edge', 'target_ids': ['n1', 'n2'], 'params': {'kind': 'directed', 'label': 'next'}},\n",
-    "                {'op': 'add_edge', 'target_ids': ['n2', 'n3'], 'params': {'kind': 'directed', 'label': 'next'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['next_link'],\n",
-    "            'intent': 'connect_nodes',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Placing a head pointer at node 10 because traversal always starts at the head, our only entry point into the list.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_pointer', 'target_ids': ['head_ptr'], 'params': {'region': 'list'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['head_ptr'], 'params': {'index': 'n1'}},\n",
-    "                {'op': 'annotate', 'target_ids': ['n1'], 'params': {'text': 'Head'}},\n",
-    "                {'op': 'set_role', 'target_ids': ['n1'], 'params': {'role': 'current'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['head_pointer'],\n",
-    "            'intent': 'mark_head',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Following the next link from 10 to 20 — the pointer advances and we mark node 10 as visited.',\n",
-    "            'ops': [\n",
-    "                {'op': 'set_role', 'target_ids': ['n1'], 'params': {'role': 'visited'}},\n",
-    "                {'op': 'set_role', 'target_ids': ['n2'], 'params': {'role': 'current'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['head_ptr'], 'params': {'index': 'n2'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'traverse_to_second',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Reaching node 30 — it has no next link, making it the tail that signals the end of traversal.',\n",
-    "            'ops': [\n",
-    "                {'op': 'set_role', 'target_ids': ['n2'], 'params': {'role': 'visited'}},\n",
-    "                {'op': 'set_role', 'target_ids': ['n3'], 'params': {'role': 'current'}},\n",
-    "                {'op': 'annotate', 'target_ids': ['n3'], 'params': {'text': 'Tail'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['tail_marker'],\n",
-    "            'intent': 'reach_tail',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'complete',\n",
-    "            'narration': 'Traversal complete — visited every node from head to tail following next links, reading values 10, 20, 30 in order.',\n",
-    "            'ops': [\n",
-    "                {'op': 'set_role', 'target_ids': ['n3'], 'params': {'role': 'done'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'summarize',\n",
-    "        },\n",
-    "    ],\n",
-    "\n",
-    "    # ── easy_2: stack_ops ──────────────────────────────────────────────\n",
-    "    # input_data: {\"operations\": [\"push A\", \"push B\", \"pop\", \"push C\"]}\n",
-    "    # concepts:   top_pointer, push, pop, lifo_order\n",
-    "    'easy_2': [\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Setting up a stack with a centered visual region and a container to track push and pop membership.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_region', 'target_ids': ['stack_area'], 'params': {'style': 'stack', 'title': 'Stack', 'position': 'center'}},\n",
-    "                {'op': 'add_container', 'target_ids': ['stk'], 'params': {'region': 'stack_area', 'ordered': False, 'title': 'Stack'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'setup_stack',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Pushing A onto the stack — A becomes the first element. Adding a top pointer to track the stack top.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_node', 'target_ids': ['a'], 'params': {'value': 'A', 'region': 'stack_area'}},\n",
-    "                {'op': 'push_to', 'target_ids': ['stk', 'a'], 'params': {}},\n",
-    "                {'op': 'add_pointer', 'target_ids': ['top'], 'params': {'region': 'stack_area'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['top'], 'params': {'index': 'a'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['push', 'top_pointer'],\n",
-    "            'intent': 'push_a',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Pushing B — B sits on top of A and the top pointer moves up to B.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_node', 'target_ids': ['b'], 'params': {'value': 'B', 'region': 'stack_area'}},\n",
-    "                {'op': 'push_to', 'target_ids': ['stk', 'b'], 'params': {}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['top'], 'params': {'index': 'b'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'push_b',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Popping from the stack — B was pushed last so B comes off first, demonstrating LIFO (last-in-first-out) order.',\n",
-    "            'ops': [\n",
-    "                {'op': 'pop_from', 'target_ids': ['stk'], 'params': {}},\n",
-    "                {'op': 'set_role', 'target_ids': ['b'], 'params': {'role': 'inactive'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['top'], 'params': {'index': 'a'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['pop', 'lifo_order'],\n",
-    "            'intent': 'pop_b',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Pushing C onto the stack — C now sits on top of A, with B already removed.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_node', 'target_ids': ['c'], 'params': {'value': 'C', 'region': 'stack_area'}},\n",
-    "                {'op': 'push_to', 'target_ids': ['stk', 'c'], 'params': {}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['top'], 'params': {'index': 'c'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'push_c',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'complete',\n",
-    "            'narration': 'All four operations executed — stack holds A at bottom and C on top after push A, push B, pop, push C.',\n",
-    "            'ops': [],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'summarize',\n",
-    "        },\n",
-    "    ],\n",
-    "\n",
-    "    # ── easy_3: binary_search ──────────────────────────────────────────\n",
-    "    # input_data: {\"array\": [1, 3, 5, 7, 9, 11, 13], \"target\": 7}\n",
-    "    # concepts:   sorted_invariant, low_pointer, high_pointer, mid_pointer, comparison\n",
-    "    'easy_3': [\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Creating the first three elements of a sorted array in a centered region — the sorted invariant is what makes binary search possible.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_region', 'target_ids': ['arr'], 'params': {'style': 'array', 'title': 'Sorted Array', 'position': 'center'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n0'], 'params': {'value': 1, 'region': 'arr'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n1'], 'params': {'value': 3, 'region': 'arr'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n2'], 'params': {'value': 5, 'region': 'arr'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['sorted_invariant'],\n",
-    "            'intent': 'create_array_part1',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Adding the remaining elements 7, 9, 11, 13 to complete all seven values of the sorted array.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_node', 'target_ids': ['n3'], 'params': {'value': 7, 'region': 'arr'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n4'], 'params': {'value': 9, 'region': 'arr'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n5'], 'params': {'value': 11, 'region': 'arr'}},\n",
-    "                {'op': 'add_node', 'target_ids': ['n6'], 'params': {'value': 13, 'region': 'arr'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'create_array_part2',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Placing low pointer at index 0 (value 1) and high pointer at index 6 (value 13) to bracket the search range.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_pointer', 'target_ids': ['low'], 'params': {'region': 'arr'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['low'], 'params': {'index': 'n0'}},\n",
-    "                {'op': 'add_pointer', 'target_ids': ['high'], 'params': {'region': 'arr'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['high'], 'params': {'index': 'n6'}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['low_pointer', 'high_pointer'],\n",
-    "            'intent': 'init_pointers',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'advance',\n",
-    "            'narration': 'Computing mid = (0+6)/2 = 3 — the mid pointer lands on value 7, which we compare against our target 7.',\n",
-    "            'ops': [\n",
-    "                {'op': 'add_pointer', 'target_ids': ['mid'], 'params': {'region': 'arr'}},\n",
-    "                {'op': 'move_pointer', 'target_ids': ['mid'], 'params': {'index': 'n3'}},\n",
-    "                {'op': 'set_role', 'target_ids': ['n3'], 'params': {'role': 'current'}},\n",
-    "                {'op': 'highlight', 'target_ids': ['n3'], 'params': {}},\n",
-    "            ],\n",
-    "            'covered_concepts': ['mid_pointer', 'comparison'],\n",
-    "            'intent': 'compute_mid_and_compare',\n",
-    "        },\n",
-    "        {\n",
-    "            'step_type': 'complete',\n",
-    "            'narration': 'Target 7 found at index 3 — binary search located it in one comparison because the sorted invariant halves the search space each step.',\n",
-    "            'ops': [\n",
-    "                {'op': 'set_role', 'target_ids': ['n3'], 'params': {'role': 'done'}},\n",
-    "                {'op': 'annotate', 'target_ids': ['n3'], 'params': {'text': 'Found: 7'}},\n",
-    "            ],\n",
-    "            'covered_concepts': [],\n",
-    "            'intent': 'found_target',\n",
-    "        },\n",
-    "    ],\n",
-    "}\n",
-    "\n",
-    "print(f'Gold episodes: {len(GOLD_EPISODES)} scenarios, {sum(len(v) for v in GOLD_EPISODES.values())} total steps')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "sft-hdr",
-   "metadata": {},
-   "source": [
-    "## 9. SFT Warmup\n",
-    "\n",
-    "Replay gold demonstrations through the live environment to collect accurate\n",
-    "(observation, action) pairs at each step, then train the model to imitate them."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "sft",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from datasets import Dataset\n",
-    "from trl import SFTConfig, SFTTrainer\n",
-    "\n",
-    "def generate_sft_data(env, gold_episodes, tokenizer):\n",
-    "    \"\"\"Replay gold episodes through the env, collecting chat-formatted training data.\"\"\"\n",
-    "    rows = []\n",
-    "    for scenario_id, actions in gold_episodes.items():\n",
-    "        obs = env.reset(scenario_id=scenario_id)\n",
-    "        last_action, last_reward, history = None, 0.0, []\n",
-    "        for i, action in enumerate(actions):\n",
-    "            user_prompt = build_user_prompt(obs, last_action, last_reward, history)\n",
-    "            messages = [\n",
-    "                {'role': 'system', 'content': SYSTEM_PROMPT},\n",
-    "                {'role': 'user', 'content': user_prompt},\n",
-    "                {'role': 'assistant', 'content': json.dumps(action, separators=(',', ':'))},\n",
-    "            ]\n",
-    "            text = tokenizer.apply_chat_template(\n",
-    "                messages, tokenize=False, add_generation_prompt=False\n",
-    "            )\n",
-    "            rows.append({'text': text})\n",
-    "            obs, reward, done = env.step(action)\n",
-    "            last_action, last_reward = action, reward\n",
-    "            history.append(f'Step {i + 1}: {action.get(\"narration\", \"\")}')\n",
-    "            if done:\n",
-    "                break\n",
-    "    return Dataset.from_list(rows)\n",
-    "\n",
-    "\n",
-    "sft_data = generate_sft_data(env, GOLD_EPISODES, tokenizer)\n",
-    "print(f'SFT training examples: {len(sft_data)}')\n",
-    "\n",
-    "FastLanguageModel.for_training(model)\n",
-    "\n",
-    "sft_config = SFTConfig(\n",
-    "    output_dir='/tmp/vr_sft_scratch',\n",
-    "    num_train_epochs=SFT_EPOCHS,\n",
-    "    per_device_train_batch_size=2,\n",
-    "    gradient_accumulation_steps=4,\n",
-    "    learning_rate=2e-4,\n",
-    "    lr_scheduler_type='cosine',\n",
-    "    warmup_ratio=0.03,\n",
-    "    logging_steps=5,\n",
-    "    save_strategy='no',\n",
-    "    fp16=True,\n",
-    "    max_seq_length=MAX_SEQ_LENGTH,\n",
-    "    dataset_text_field='text',\n",
-    "    optim='adamw_8bit',\n",
-    "    report_to='none',\n",
-    ")\n",
-    "\n",
-    "sft_trainer = SFTTrainer(\n",
-    "    model=model,\n",
-    "    processing_class=tokenizer,\n",
-    "    args=sft_config,\n",
-    "    train_dataset=sft_data,\n",
-    ")\n",
-    "\n",
-    "print('\\nTraining SFT...')\n",
-    "sft_trainer.train()\n",
-    "print('SFT complete!')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "post-sft-hdr",
-   "metadata": {},
-   "source": [
-    "## 10. Post-SFT Evaluation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "post-sft-eval",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "sft_results = evaluate(model, tokenizer, 'Post-SFT')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "grpo-hdr",
-   "metadata": {},
-   "source": [
-    "## 11. GRPO Training\n",
-    "\n",
-    "Generate prompts from all scenarios (initial observation states) and train with\n",
-    "environment reward signal. Curriculum ordering: easy prompts first, expert last."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "grpo",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from torch.utils.data import SequentialSampler\n",
-    "from trl import GRPOConfig, GRPOTrainer\n",
-    "\n",
-    "def generate_grpo_prompts(env, stages, samples_per_scenario=2):\n",
-    "    \"\"\"Collect initial-state prompts for GRPO training.\"\"\"\n",
-    "    rows = []\n",
-    "    for stage in stages:\n",
-    "        for sid in SCENARIOS[stage]:\n",
-    "            for _ in range(samples_per_scenario):\n",
-    "                obs = env.reset(scenario_id=sid)\n",
-    "                user_prompt = build_user_prompt(obs, None, 0.0, [])\n",
-    "                messages = [\n",
-    "                    {'role': 'system', 'content': SYSTEM_PROMPT},\n",
-    "                    {'role': 'user', 'content': user_prompt},\n",
-    "                ]\n",
-    "                rows.append({'prompt': messages, 'scenario_id': sid})\n",
-    "    return Dataset.from_list(rows)\n",
-    "\n",
-    "\n",
-    "def make_reward_fn(env):\n",
-    "    \"\"\"Reward function: parse model completion, step in env, return overall_score.\"\"\"\n",
-    "    state = {'calls': 0, 'hist': Counter()}\n",
-    "\n",
-    "    def reward_fn(completions, scenario_id=None, **_):\n",
-    "        texts = []\n",
-    "        for c in completions:\n",
-    "            if isinstance(c, list):\n",
-    "                texts.append(c[-1].get('content', '') if c else '')\n",
-    "            else:\n",
-    "                texts.append(str(c))\n",
-    "\n",
-    "        sids = scenario_id if isinstance(scenario_id, list) else [scenario_id] * len(texts)\n",
-    "        if len(sids) < len(texts):\n",
-    "            n_gen = len(texts) // len(sids)\n",
-    "            sids = [s for s in sids for _ in range(n_gen)]\n",
-    "\n",
-    "        rewards = []\n",
-    "        for sid, text in zip(sids, texts):\n",
-    "            obs = env.reset(scenario_id=sid)\n",
-    "            parsed = parse_action(text)\n",
-    "            action = normalize_action(parsed) if parsed else None\n",
-    "            if action is None:\n",
-    "                rewards.append(0.0)\n",
-    "                state['hist']['<unparseable>'] += 1\n",
-    "                continue\n",
-    "            obs, _, _ = env.step(action)\n",
-    "            score = float(obs.score_breakdown.get('overall_score', 0.0))\n",
-    "            rewards.append(score)\n",
-    "            state['hist'][action.get('step_type', '?')] += 1\n",
-    "\n",
-    "        state['calls'] += 1\n",
-    "        if state['calls'] % 5 == 0:\n",
-    "            print(f\"  [reward] call={state['calls']} types={dict(state['hist'])}\")\n",
-    "        return rewards\n",
-    "\n",
-    "    return reward_fn\n",
-    "\n",
-    "\n",
-    "grpo_data = generate_grpo_prompts(env, list(DIFFICULTIES))\n",
-    "print(f'GRPO training prompts: {len(grpo_data)}')\n",
-    "\n",
-    "FastLanguageModel.for_training(model)\n",
-    "\n",
-    "for name, param in model.named_parameters():\n",
-    "    if 'lora_' in name and param.dtype == torch.float32:\n",
-    "        param.data = param.data.to(torch.bfloat16)\n",
-    "\n",
-    "grpo_config = GRPOConfig(\n",
-    "    output_dir='/tmp/vr_grpo_scratch',\n",
-    "    num_train_epochs=GRPO_EPOCHS,\n",
-    "    per_device_train_batch_size=2,\n",
-    "    gradient_accumulation_steps=4,\n",
-    "    num_generations=4,\n",
-    "    max_completion_length=384,\n",
-    "    learning_rate=1e-5,\n",
-    "    lr_scheduler_type='cosine',\n",
-    "    warmup_ratio=0.1,\n",
-    "    beta=0.05,\n",
-    "    max_grad_norm=0.5,\n",
-    "    temperature=0.9,\n",
-    "    logging_steps=1,\n",
-    "    save_strategy='no',\n",
-    "    fp16=False,\n",
-    "    bf16=True,\n",
-    "    optim='adamw_8bit',\n",
-    "    report_to='none',\n",
-    "    remove_unused_columns=False,\n",
-    ")\n",
-    "\n",
-    "\n",
-    "class CurriculumGRPOTrainer(GRPOTrainer):\n",
-    "    \"\"\"Preserve easy -> expert ordering by disabling dataset shuffle.\"\"\"\n",
-    "    def _get_train_sampler(self, *_args, **_kwargs):\n",
-    "        return SequentialSampler(self.train_dataset)\n",
-    "\n",
-    "\n",
-    "grpo_trainer = CurriculumGRPOTrainer(\n",
-    "    model=model,\n",
-    "    processing_class=tokenizer,\n",
-    "    args=grpo_config,\n",
-    "    train_dataset=grpo_data,\n",
-    "    reward_funcs=make_reward_fn(env),\n",
-    ")\n",
-    "\n",
-    "print('Training GRPO...')\n",
-    "grpo_trainer.train()\n",
-    "print('GRPO complete!')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "final-hdr",
-   "metadata": {},
-   "source": [
-    "## 12. Final Evaluation + Delta Report"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "final",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "final_results = evaluate(model, tokenizer, 'Final (SFT + GRPO)')\n",
-    "\n",
-    "print(f'\\n{\"=\" * 60}')\n",
-    "print('  DELTA REPORT')\n",
-    "print(f'{\"=\" * 60}')\n",
-    "print(f'  {\"Difficulty\":<12} {\"Baseline\":>10} {\"SFT\":>10} {\"SFT+GRPO\":>10}')\n",
-    "print(f'  {\"-\" * 12} {\"-\" * 10} {\"-\" * 10} {\"-\" * 10}')\n",
-    "for diff in list(DIFFICULTIES) + ['overall']:\n",
-    "    b = baseline.get(diff, 0.0)\n",
-    "    s = sft_results.get(diff, 0.0)\n",
-    "    f = final_results.get(diff, 0.0)\n",
-    "    label = diff.upper() if diff == 'overall' else diff\n",
-    "    print(f'  {label:<12} {b:>10.3f} {s:>10.3f} {f:>10.3f}')\n",
-    "print(f'{\"=\" * 60}')\n",
-    "print('\\nDone. Model was NOT saved (in-memory only).')"
-   ]
-  }
- ],
- "metadata": {
-  "accelerator": "GPU",
-  "colab": {
-   "gpuType": "T4",
-   "provenance": []
-  },
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "name": "python",
-   "version": "3.10.0"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
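
Note on the batched generation helper in the notebook above: it tokenizes with `padding_side = 'left'`, so every row of the batch shares the same padded prompt width and newly generated tokens start at that width in each output row. The sketch below illustrates this with plain Python lists standing in for tensors. The pad id, token ids, and the helpers `left_pad` and `strip_prompt` are made up for illustration and are not part of the repository.

```python
# Toy illustration: plain lists stand in for tensors. PAD, the token ids,
# and both helper functions are hypothetical, not taken from the repository.
PAD = 0


def left_pad(prompts, pad_id=PAD):
    """Left-pad token-id lists to a common width, as padding_side='left' does."""
    width = max(len(p) for p in prompts)
    return [[pad_id] * (width - len(p)) + p for p in prompts]


def strip_prompt(output_row, padded_width):
    """Drop the padded prompt, leaving only the newly generated tokens."""
    return output_row[padded_width:]


prompts = [[11, 12, 13], [21]]
padded = left_pad(prompts)   # second row gains two leading pad tokens
width = len(padded[0])       # shared prompt width for the whole batch

# Pretend the model appended two new tokens to each padded row.
outputs = [row + new for row, new in zip(padded, [[91, 92], [93, 94]])]

# Slicing at the shared width isolates the completion for every row.
completions = [strip_prompt(row, width) for row in outputs]

# Slicing at the per-row count of non-pad tokens instead starts too early
# for the shorter prompt, so prompt tokens leak into the decoded text.
non_pad_count = sum(tok != PAD for tok in padded[1])
leaky = outputs[1][non_pad_count:]
```

With tensors, the equivalent slice point would be the second dimension of the padded `input_ids` batch rather than a per-row count of non-pad tokens.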

train.py DELETED Viewed

@@ -1,632 +0,0 @@
-"""
-Visual Reasoning — Training Script
-Simulation-based RL for teaching CS algorithms on a whiteboard.
-Stages:
-  1. Baseline  — score untrained model across all difficulties
-  2. SFT       — imitate gold demonstrations to learn action format
-  3. GRPO      — RL with dense environment rewards, easy → expert curriculum
-  4. Final eval — delta report comparing all three checkpoints
-"""
-import subprocess
-import sys
-def install_packages():
-    # Upgrade torchvision first to match whatever torch version unsloth pulls in
-    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "--upgrade", "torchvision>=0.25.0"])
-    packages = [
-        "huggingface_hub",
-        "unsloth",
-        "trl",
-        "datasets",
-        "transformers",
-        "accelerate",
-        "bitsandbytes",
-        "peft",
-        "torch",
-        "openenv-core",
-        "fastapi",
-        "uvicorn",
-        "pydantic",
-        "python-dotenv",
-        "networkx",
-        "shapely",
-        "sentence-transformers",
-        "rapidfuzz",
-        "textstat",
-        "sortedcontainers",
-        "numpy<2.0",
-    ]
-    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + packages)
-install_packages()
-# ── Section 2: Download Visual Reasoning Environment ─────────────────────────
-import os
-from huggingface_hub import snapshot_download
-if not os.path.isdir("visual_reasoning"):
-    snapshot_download(
-        repo_id="sreeramajay/visual_reasoning-env",
-        repo_type="space",
-        local_dir="visual_reasoning",
-        ignore_patterns=["*.gitattributes", ".gitignore", "README.md"],
-    )
-os.chdir("visual_reasoning")
-# ── Section 3: Imports ────────────────────────────────────────────────────────
-import json
-import torch
-from collections import Counter
-sys.path.insert(0, ".")
-from unsloth import FastLanguageModel
-from datasets import Dataset
-from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer
-from torch.utils.data import SequentialSampler
-from models import VisualReasoningAction
-from server.visual_reasoning_environment import VisualReasoningEnvironment
-from inference import SYSTEM_PROMPT, build_user_prompt, parse_action, normalize_action
-# ── Section 4: Smoke Test ─────────────────────────────────────────────────────
-test_env = VisualReasoningEnvironment()
-obs = test_env.reset(scenario_id="easy_1")
-print(f"Scenario: {obs.scenario_id}")
-print(f"Goal: {obs.goal}")
-print(f"Concepts: {obs.concept_checklist}")
-print(f"Step budget: {obs.remaining_step_budget}")
-obs = test_env.step(
-    VisualReasoningAction(
-        step_type="advance",
-        narration="Adding the first node with value 10.",
-        ops=[{"op": "add_node", "target_ids": ["n0"], "params": {"value": 10}}],
-        covered_concepts=["node_value"],
-        intent="test",
-    )
-)
-print(f"\nAfter step: entities={list(obs.entities.keys())}, reward={obs.reward:.3f}, error={obs.action_error}")
-print("Environment OK!")
-del test_env
-# ── Section 5: Configuration ──────────────────────────────────────────────────
-MODEL_NAME = "unsloth/Qwen2.5-3B-Instruct-bnb-4bit"
-MAX_SEQ_LENGTH = 4096
-LORA_R = 16
-LORA_ALPHA = 32
-SFT_EPOCHS = 3
-GRPO_EPOCHS = 2
-SCENARIOS = {
-    "easy":   ["easy_1", "easy_2", "easy_3", "gen_easy_1001", "gen_easy_1002"],
-    "medium": ["medium_1", "medium_2", "gen_medium_2001", "gen_medium_2002"],
-    "hard":   ["hard_1", "hard_2", "gen_hard_3001", "gen_hard_3002"],
-    "expert": ["expert_1", "expert_2", "gen_expert_4001", "gen_expert_4002"],
-}
-DIFFICULTIES = ("easy", "medium", "hard", "expert")
-# ── Section 6: Load Model (Unsloth + LoRA) ────────────────────────────────────
-model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name=MODEL_NAME,
-    max_seq_length=MAX_SEQ_LENGTH,
-    dtype=None,
-    load_in_4bit=True,
-)
-model = FastLanguageModel.get_peft_model(
-    model,
-    r=LORA_R,
-    lora_alpha=LORA_ALPHA,
-    lora_dropout=0,
-    target_modules=[
-        "q_proj", "k_proj", "v_proj", "o_proj",
-        "gate_proj", "up_proj", "down_proj",
-    ],
-    bias="none",
-    use_gradient_checkpointing="unsloth",
-    random_state=0,
-)
-if tokenizer.pad_token_id is None:
-    tokenizer.pad_token = tokenizer.eos_token
-model.generation_config.max_length = None
-trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
-total = sum(p.numel() for p in model.parameters())
-print(f"Parameters: {total / 1e6:.1f}M total, {trainable / 1e6:.2f}M trainable ({trainable / total:.2%})")
-# ── Section 7: Environment Wrapper & Helpers ──────────────────────────────────
-class EnvRunner:
-    """Thin wrapper over VisualReasoningEnvironment for a clean reset/step API."""
-    def __init__(self):
-        self.env = VisualReasoningEnvironment()
-    def reset(self, scenario_id=None, task_name=None):
-        return self.env.reset(scenario_id=scenario_id, task_name=task_name)
-    def step(self, action_dict):
-        act = VisualReasoningAction(**action_dict)
-        obs = self.env.step(act)
-        return obs, float(obs.reward), bool(obs.done)
-def generate_action(model, tokenizer, obs, last_action=None, last_reward=0.0, history=None):
-    """Generate one action from the model given an observation."""
-    FastLanguageModel.for_inference(model)
-    user_prompt = build_user_prompt(obs, last_action, last_reward, history or [])
-    messages = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {"role": "user", "content": user_prompt},
-    ]
-    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-    with torch.no_grad():
-        out = model.generate(
-            **inputs,
-            max_new_tokens=384,
-            do_sample=False,
-            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
-        )
-    text = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
-    return normalize_action(parse_action(text) or {}), text
-FALLBACK_ACTION = {
-    "step_type": "complete",
-    "narration": "Explanation complete.",
-    "ops": [],
-    "covered_concepts": [],
-    "intent": "finalize",
-}
-def run_episode(env, scenario_id, model, tokenizer, max_steps=24):
-    """Run a full episode. Returns (final_score, steps_taken)."""
-    obs = env.reset(scenario_id=scenario_id)
-    last_action, last_reward, history = None, 0.0, []
-    steps = 0
-    for step_num in range(1, max_steps + 1):
-        action, _ = generate_action(model, tokenizer, obs, last_action, last_reward, history)
-        if action is None:
-            action = FALLBACK_ACTION
-        obs, reward, done = env.step(action)
-        last_action, last_reward = action, reward
-        history.append(f"Step {step_num}: {action.get('narration', '')}")
-        steps = step_num
-        if done:
-            break
-    return float(obs.score_breakdown.get("overall_score", 0.0)), steps
-def evaluate(model, tokenizer, env, label, stages=None):
-    """Evaluate across all scenarios. Returns per-difficulty + overall scores."""
-    stages = stages or list(DIFFICULTIES)
-    print(f"\n{'=' * 60}")
-    print(f"  {label}")
-    print(f"{'=' * 60}")
-    results = {}
-    for diff in stages:
-        scores = []
-        for sid in SCENARIOS.get(diff, []):
-            score, steps = run_episode(env, sid, model, tokenizer)
-            scores.append(score)
-            print(f"  [{diff:6}] {sid:22} score={score:.3f} steps={steps}")
-        mean = sum(scores) / max(len(scores), 1)
-        results[diff] = mean
-        print(f"  [{diff:6}] mean={mean:.3f}")
-    overall = sum(results.values()) / max(len(results), 1)
-    results["overall"] = overall
-    print(f"  OVERALL: {overall:.3f}")
-    return results
-env = EnvRunner()
-print("EnvRunner ready.")
-# ── Section 8: Baseline Evaluation ───────────────────────────────────────────
-baseline = evaluate(model, tokenizer, env, "Baseline (untrained LoRA)")
-# ── Section 9: Gold Demonstrations ───────────────────────────────────────────
-GOLD_EPISODES = {
-    # ── easy_1: linked_list_traversal ──────────────────────────────────
-    # input_data: {"values": [10, 20, 30]}
-    # concepts:   head_pointer, node_value, next_link, tail_marker
-    "easy_1": [
-        {
-            "step_type": "advance",
-            "narration": "Building a linked list — creating a centered layout region and the first two nodes with values 10 and 20.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["list"], "params": {"style": "array", "title": "Linked List", "position": "center"}},
-                {"op": "add_node", "target_ids": ["n1"], "params": {"value": 10, "region": "list"}},
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 20, "region": "list"}},
-            ],
-            "covered_concepts": ["node_value"],
-            "intent": "create_list_start",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding node 30 and connecting all nodes with next links to form the chain 10 -> 20 -> 30.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 30, "region": "list"}},
-                {"op": "add_edge", "target_ids": ["n1", "n2"], "params": {"kind": "directed", "label": "next"}},
-                {"op": "add_edge", "target_ids": ["n2", "n3"], "params": {"kind": "directed", "label": "next"}},
-            ],
-            "covered_concepts": ["next_link"],
-            "intent": "connect_nodes",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Placing a head pointer at node 10 because traversal always starts at the head, our only entry point into the list.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["head_ptr"], "params": {"region": "list"}},
-                {"op": "move_pointer", "target_ids": ["head_ptr"], "params": {"index": "n1"}},
-                {"op": "annotate", "target_ids": ["n1"], "params": {"text": "Head"}},
-                {"op": "set_role", "target_ids": ["n1"], "params": {"role": "current"}},
-            ],
-            "covered_concepts": ["head_pointer"],
-            "intent": "mark_head",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Following the next link from 10 to 20 — the pointer advances and we mark node 10 as visited.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n1"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["n2"], "params": {"role": "current"}},
-                {"op": "move_pointer", "target_ids": ["head_ptr"], "params": {"index": "n2"}},
-            ],
-            "covered_concepts": [],
-            "intent": "traverse_to_second",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Reaching node 30 — it has no next link, making it the tail that signals the end of traversal.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n2"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "current"}},
-                {"op": "annotate", "target_ids": ["n3"], "params": {"text": "Tail"}},
-            ],
-            "covered_concepts": ["tail_marker"],
-            "intent": "reach_tail",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Traversal complete — visited every node from head to tail following next links, reading values 10, 20, 30 in order.",
-            "ops": [{"op": "set_role", "target_ids": ["n3"], "params": {"role": "done"}}],
-            "covered_concepts": [],
-            "intent": "summarize",
-        },
-    ],
-    # ── easy_2: stack_ops ──────────────────────────────────────────────
-    # input_data: {"operations": ["push A", "push B", "pop", "push C"]}
-    # concepts:   top_pointer, push, pop, lifo_order
-    "easy_2": [
-        {
-            "step_type": "advance",
-            "narration": "Setting up a stack with a centered visual region and a container to track push and pop membership.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["stack_area"], "params": {"style": "stack", "title": "Stack", "position": "center"}},
-                {"op": "add_container", "target_ids": ["stk"], "params": {"region": "stack_area", "ordered": False, "title": "Stack"}},
-            ],
-            "covered_concepts": [],
-            "intent": "setup_stack",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing A onto the stack — A becomes the first element. Adding a top pointer to track the stack top.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["a"], "params": {"value": "A", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "a"], "params": {}},
-                {"op": "add_pointer", "target_ids": ["top"], "params": {"region": "stack_area"}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "a"}},
-            ],
-            "covered_concepts": ["push", "top_pointer"],
-            "intent": "push_a",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing B — B sits on top of A and the top pointer moves up to B.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["b"], "params": {"value": "B", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "b"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "b"}},
-            ],
-            "covered_concepts": [],
-            "intent": "push_b",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Popping from the stack — B was pushed last so B comes off first, demonstrating LIFO (last-in-first-out) order.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["stk"], "params": {}},
-                {"op": "set_role", "target_ids": ["b"], "params": {"role": "inactive"}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "a"}},
-            ],
-            "covered_concepts": ["pop", "lifo_order"],
-            "intent": "pop_b",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing C onto the stack — C now sits on top of A, with B already removed.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["c"], "params": {"value": "C", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "c"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "c"}},
-            ],
-            "covered_concepts": [],
-            "intent": "push_c",
-        },
-        {
-            "step_type": "complete",
-            "narration": "All four operations executed — stack holds A at bottom and C on top after push A, push B, pop, push C.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "summarize",
-        },
-    ],
-    # ── easy_3: binary_search ──────────────────────────────────────────
-    # input_data: {"array": [1, 3, 5, 7, 9, 11, 13], "target": 7}
-    # concepts:   sorted_invariant, low_pointer, high_pointer, mid_pointer, comparison
-    "easy_3": [
-        {
-            "step_type": "advance",
-            "narration": "Creating the first four elements of a sorted array in a centered region — the sorted invariant is what makes binary search possible.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["arr"], "params": {"style": "array", "title": "Sorted Array", "position": "center"}},
-                {"op": "add_node", "target_ids": ["n0"], "params": {"value": 1, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n1"], "params": {"value": 3, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 5, "region": "arr"}},
-            ],
-            "covered_concepts": ["sorted_invariant"],
-            "intent": "create_array_part1",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding the remaining elements 7, 9, 11, 13 to complete all seven values of the sorted array.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 7, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n4"], "params": {"value": 9, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 11, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n6"], "params": {"value": 13, "region": "arr"}},
-            ],
-            "covered_concepts": [],
-            "intent": "create_array_part2",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Placing low pointer at index 0 (value 1) and high pointer at index 6 (value 13) to bracket the search range.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["low"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["low"], "params": {"index": "n0"}},
-                {"op": "add_pointer", "target_ids": ["high"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["high"], "params": {"index": "n6"}},
-            ],
-            "covered_concepts": ["low_pointer", "high_pointer"],
-            "intent": "init_pointers",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Computing mid = (0+6)/2 = 3 — the mid pointer lands on value 7, which we compare against our target 7.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["mid"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["mid"], "params": {"index": "n3"}},
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "current"}},
-                {"op": "highlight", "target_ids": ["n3"], "params": {}},
-            ],
-            "covered_concepts": ["mid_pointer", "comparison"],
-            "intent": "compute_mid_and_compare",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Target 7 found at index 3 — binary search located it in one comparison because the sorted invariant halves the search space each step.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "done"}},
-                {"op": "annotate", "target_ids": ["n3"], "params": {"text": "Found: 7"}},
-            ],
-            "covered_concepts": [],
-            "intent": "found_target",
-        },
-    ],
-}
-print(f"Gold episodes: {len(GOLD_EPISODES)} scenarios, {sum(len(v) for v in GOLD_EPISODES.values())} total steps")
-# ── Section 10: SFT Warmup ────────────────────────────────────────────────────
-def generate_sft_data(env, gold_episodes, tokenizer):
-    """Replay gold episodes through the env, collecting chat-formatted training data."""
-    rows = []
-    for scenario_id, actions in gold_episodes.items():
-        obs = env.reset(scenario_id=scenario_id)
-        last_action, last_reward, history = None, 0.0, []
-        for i, action in enumerate(actions):
-            user_prompt = build_user_prompt(obs, last_action, last_reward, history)
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": user_prompt},
-                {"role": "assistant", "content": json.dumps(action, separators=(",", ":"))},
-            ]
-            text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
-            rows.append({"text": text})
-            obs, reward, done = env.step(action)
-            last_action, last_reward = action, reward
-            history.append(f"Step {i + 1}: {action.get('narration', '')}")
-            if done:
-                break
-    return Dataset.from_list(rows)
-sft_data = generate_sft_data(env, GOLD_EPISODES, tokenizer)
-print(f"SFT training examples: {len(sft_data)}")
-FastLanguageModel.for_training(model)
-sft_config = SFTConfig(
-    output_dir="/tmp/vr_sft_scratch",
-    num_train_epochs=SFT_EPOCHS,
-    per_device_train_batch_size=2,
-    gradient_accumulation_steps=4,
-    learning_rate=2e-4,
-    lr_scheduler_type="cosine",
-    warmup_ratio=0.03,
-    logging_steps=5,
-    save_strategy="no",
-    fp16=True,
-    max_seq_length=MAX_SEQ_LENGTH,
-    dataset_text_field="text",
-    optim="adamw_8bit",
-    report_to="none",
-)
-sft_trainer = SFTTrainer(
-    model=model,
-    processing_class=tokenizer,
-    args=sft_config,
-    train_dataset=sft_data,
-)
-print("\nTraining SFT...")
-sft_trainer.train()
-print("SFT complete!")
-# ── Section 11: Post-SFT Evaluation ──────────────────────────────────────────
-sft_results = evaluate(model, tokenizer, env, "Post-SFT")
-# ── Section 12: GRPO Training ─────────────────────────────────────────────────
-def generate_grpo_prompts(env, stages, samples_per_scenario=2):
-    """Collect initial-state prompts for GRPO training."""
-    rows = []
-    for stage in stages:
-        for sid in SCENARIOS[stage]:
-            for _ in range(samples_per_scenario):
-                obs = env.reset(scenario_id=sid)
-                user_prompt = build_user_prompt(obs, None, 0.0, [])
-                messages = [
-                    {"role": "system", "content": SYSTEM_PROMPT},
-                    {"role": "user", "content": user_prompt},
-                ]
-                rows.append({"prompt": messages, "scenario_id": sid})
-    return Dataset.from_list(rows)
-def make_reward_fn(env):
-    """Reward function: parse model completion, step in env, return overall_score."""
-    state = {"calls": 0, "hist": Counter()}
-    def reward_fn(completions, scenario_id=None, **_):
-        texts = []
-        for c in completions:
-            if isinstance(c, list):
-                texts.append(c[-1].get("content", "") if c else "")
-            else:
-                texts.append(str(c))
-        sids = scenario_id if isinstance(scenario_id, list) else [scenario_id] * len(texts)
-        if len(sids) < len(texts):
-            n_gen = len(texts) // len(sids)
-            sids = [s for s in sids for _ in range(n_gen)]
-        rewards = []
-        for sid, text in zip(sids, texts):
-            obs = env.reset(scenario_id=sid)
-            action = normalize_action(parse_action(text) or {})
-            if action is None:
-                rewards.append(0.0)
-                state["hist"]["<unparseable>"] += 1
-                continue
-            obs, _, _ = env.step(action)
-            score = float(obs.score_breakdown.get("overall_score", 0.0))
-            rewards.append(score)
-            state["hist"][action.get("step_type", "?")] += 1
-        state["calls"] += 1
-        if state["calls"] % 5 == 0:
-            print(f"  [reward] call={state['calls']} types={dict(state['hist'])}")
-        return rewards
-    return reward_fn
-grpo_data = generate_grpo_prompts(env, list(DIFFICULTIES))
-print(f"GRPO training prompts: {len(grpo_data)}")
-FastLanguageModel.for_training(model)
-grpo_config = GRPOConfig(
-    output_dir="/tmp/vr_grpo_scratch",
-    num_train_epochs=GRPO_EPOCHS,
-    per_device_train_batch_size=2,
-    gradient_accumulation_steps=4,
-    num_generations=4,
-    max_completion_length=384,
-    learning_rate=1e-5,
-    lr_scheduler_type="cosine",
-    warmup_ratio=0.1,
-    beta=0.05,
-    max_grad_norm=0.5,
-    temperature=0.9,
-    logging_steps=1,
-    save_strategy="no",
-    bf16=True,
-    optim="adamw_8bit",
-    report_to="none",
-    remove_unused_columns=False,
-)
-class CurriculumGRPOTrainer(GRPOTrainer):
-    """Preserve easy -> expert ordering by disabling dataset shuffle."""
-    def _get_train_sampler(self, *_args, **_kwargs):
-        return SequentialSampler(self.train_dataset)
-grpo_trainer = CurriculumGRPOTrainer(
-    model=model,
-    processing_class=tokenizer,
-    args=grpo_config,
-    train_dataset=grpo_data,
-    reward_funcs=make_reward_fn(env),
-)
-print("Training GRPO...")
-grpo_trainer.train()
-print("GRPO complete!")
-# ── Section 13: Final Evaluation + Delta Report ───────────────────────────────
-final_results = evaluate(model, tokenizer, env, "Final (SFT + GRPO)")
-print(f"\n{'=' * 60}")
-print("  DELTA REPORT")
-print(f"{'=' * 60}")
-print(f"  {'Difficulty':<12} {'Baseline':>10} {'SFT':>10} {'SFT+GRPO':>10}")
-print(f"  {'-' * 12} {'-' * 10} {'-' * 10} {'-' * 10}")
-for diff in list(DIFFICULTIES) + ["overall"]:
-    b = baseline.get(diff, 0.0)
-    s = sft_results.get(diff, 0.0)
-    f = final_results.get(diff, 0.0)
-    label = diff.upper() if diff == "overall" else diff
-    print(f"  {label:<12} {b:>10.3f} {s:>10.3f} {f:>10.3f}")
-print(f"{'=' * 60}")
-print("\nDone. Model was NOT saved (in-memory only).")
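
Note on `reward_fn` in the deleted train.py above: the trainer may pass one `scenario_id` per prompt while `completions` holds several generations per prompt, so the ids are repeated to line up one-to-one with the completions. The sketch below isolates that alignment step for illustration. The helper name `expand_scenario_ids` is hypothetical and does not exist in the repository; it assumes, as the original code does, that completions are grouped by prompt in order.

```python
def expand_scenario_ids(scenario_ids, n_completions):
    """Return one scenario id per completion (hypothetical helper)."""
    # A single id (not a list) applies to every completion.
    if not isinstance(scenario_ids, list):
        return [scenario_ids] * n_completions
    # Fewer ids than completions: repeat each id for its group of generations.
    if scenario_ids and len(scenario_ids) < n_completions:
        per_prompt = n_completions // len(scenario_ids)
        return [sid for sid in scenario_ids for _ in range(per_prompt)]
    # Already aligned one-to-one.
    return list(scenario_ids)


# Two prompts with two generations each.
print(expand_scenario_ids(["easy_1", "easy_2"], 4))
```

Each id is repeated in place rather than tiling the whole list, which matches the assumption that all generations for a prompt are adjacent in `completions`.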

train_hf.py DELETED Viewed

@@ -1,771 +0,0 @@
-"""Visual Reasoning — HuggingFace Training Job
-Train Qwen3-8B to be an expert visual CS teacher via SFT warmup + staged GRPO.
-Run with: hf jobs run --flavor a100 train_hf.py
-Speedups over train.ipynb:
-  - NARRATION_SCORER=fallback → CPU heuristic, no GPU contention
-  - Batched generation → N episodes in one model.generate() call
-  - Early termination → kill episodes after 3 consecutive no-ops
-  - snapshot_download → faster than git clone for large repos
-"""
-# ── 0. Install dependencies before any imports ──────────────────────────────
-import subprocess, sys
-def pip_install(*packages):
-    subprocess.check_call(
-        [sys.executable, "-m", "pip", "install", "-q", *packages],
-        stdout=subprocess.DEVNULL,
-    )
-print("[0/7] Installing dependencies...")
-pip_install(
-    "unsloth", "trl", "datasets", "transformers", "accelerate",
-    "bitsandbytes", "peft", "torch",
-)
-pip_install(
-    "openenv-core", "fastapi", "uvicorn", "pydantic",
-)
-pip_install(
-    "python-dotenv", "networkx", "shapely", "sentence-transformers",
-    "rapidfuzz", "textstat", "sortedcontainers", "huggingface_hub",
-)
-print("[0/7] Dependencies installed.")
-# ── 1. Imports (unsloth first to patch transformers early) ──────────────────
-import os
-os.environ["NARRATION_SCORER"] = "fallback"  # CPU scorer, no GPU contention
-from unsloth import FastLanguageModel  # must be before transformers
-import json
-import time
-import torch
-from collections import Counter
-from datasets import Dataset
-from huggingface_hub import snapshot_download
-from torch.utils.data import SequentialSampler
-from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer
-# ── 2. Config ───────────────────────────────────────────────────────────────
-# Model
-MODEL_NAME        = "unsloth/Qwen3-8B-unsloth-bnb-4bit"
-MAX_SEQ_LENGTH    = 4096
-LORA_R            = 32
-LORA_ALPHA        = 64
-# SFT
-SFT_EPOCHS        = 3
-SFT_LR            = 2e-4
-SFT_BATCH_SIZE    = 2
-SFT_GRAD_ACCUM    = 4
-# GRPO
-GRPO_LR           = 5e-6
-GRPO_STAGE1_EPOCHS = 2
-GRPO_STAGE2_EPOCHS = 2
-GRPO_NUM_GENERATIONS = 4
-GRPO_BATCH_SIZE   = 2
-GRPO_GRAD_ACCUM   = 4
-GRPO_TEMPERATURE  = 0.9
-# Scenario counts for procedural generation
-GRPO_SCENARIOS_PER_DIFFICULTY = {"easy": 40, "medium": 30, "hard": 20, "expert": 10}
-DIFFICULTIES = ("easy", "medium", "hard", "expert")
-# Eval
-EVAL_BATCH_SIZE   = 8
-EVAL_MAX_STEPS    = 24
-NOOP_EARLY_STOP   = 3
-# Hub — HF_TOKEN is set by the HuggingFace jobs runtime
-HUB_REPO          = None  # set to "username/model-name" to push
-HF_TOKEN          = os.environ.get("HF_TOKEN")
-# Static scenarios for evaluation
-STATIC_SCENARIOS = {
-    "easy":   ["easy_1", "easy_2", "easy_3"],
-    "medium": ["medium_1", "medium_2"],
-    "hard":   ["hard_1", "hard_2"],
-    "expert": ["expert_1", "expert_2"],
-}
-# ── 3. Download environment repo ───────────────────────────────────────────
-print("[1/7] Downloading visual_reasoning environment...")
-env_path = snapshot_download(
-    repo_id="sreeramajay/visual_reasoning-env",
-    repo_type="space",
-    local_dir="visual_reasoning",
-    token=HF_TOKEN,
-)
-sys.path.insert(0, env_path)
-print(f"[1/7] Environment downloaded to {env_path}")
-from models import VisualReasoningAction, VisualReasoningObservation
-from server.visual_reasoning_environment import VisualReasoningEnvironment
-from server.scenario_generator import generate_scenario
-from inference import (
-    SYSTEM_PROMPT, build_user_prompt, parse_action,
-    normalize_action as inf_normalize_action,
-)
-# Smoke test
-_test_env = VisualReasoningEnvironment()
-_obs = _test_env.reset(scenario_id="easy_1")
-_obs = _test_env.step(VisualReasoningAction(
-    step_type="advance",
-    narration="Adding the first node with value 10.",
-    ops=[{"op": "add_node", "target_ids": ["n0"], "params": {"value": 10}}],
-    covered_concepts=["node_value"], intent="test",
-))
-assert _obs.reward != 0.0, "Environment smoke test failed"
-del _test_env, _obs
-print("[1/7] Environment smoke test passed.")
-# ── 4. Load model ──────────────────────────────────────────────────────────
-print("[2/7] Loading model...")
-t0 = time.time()
-model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name=MODEL_NAME,
-    max_seq_length=MAX_SEQ_LENGTH,
-    dtype=None,
-    load_in_4bit=True,
-)
-model = FastLanguageModel.get_peft_model(
-    model,
-    r=LORA_R,
-    lora_alpha=LORA_ALPHA,
-    lora_dropout=0,
-    target_modules=[
-        "q_proj", "k_proj", "v_proj", "o_proj",
-        "gate_proj", "up_proj", "down_proj",
-    ],
-    bias="none",
-    use_gradient_checkpointing="unsloth",
-    random_state=0,
-)
-if tokenizer.pad_token_id is None:
-    tokenizer.pad_token = tokenizer.eos_token
-model.generation_config.max_length = None
-trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
-total = sum(p.numel() for p in model.parameters())
-print(f"[2/7] Model loaded in {time.time() - t0:.0f}s — "
-      f"{total/1e6:.0f}M params, {trainable/1e6:.1f}M trainable ({trainable/total:.1%})")
-# ── 5. Batched evaluation engine ───────────────────────────────────────────
-FALLBACK_ACTION = {
-    "step_type": "complete", "narration": "Explanation complete.",
-    "ops": [], "covered_concepts": [], "intent": "finalize",
-}
-# Reusable env pool — avoids repeated __init__ / warmup_scorer overhead
-_env_pool = []
-def _get_env(idx):
-    while len(_env_pool) <= idx:
-        _env_pool.append(VisualReasoningEnvironment())
-    return _env_pool[idx]
-class EpisodeState:
-    """Tracks one in-flight episode for batched eval."""
-    def __init__(self, env, scenario_id):
-        self.env = env
-        self.scenario_id = scenario_id
-        self.obs = env.reset(scenario_id=scenario_id)
-        self.last_action = None
-        self.last_reward = 0.0
-        self.history = []
-        self.steps = 0
-        self.done = False
-        self.score = 0.0
-        self.consecutive_noops = 0
-    def build_prompt_text(self):
-        user_prompt = build_user_prompt(
-            self.obs, self.last_action, self.last_reward, self.history
-        )
-        messages = [
-            {"role": "system", "content": SYSTEM_PROMPT},
-            {"role": "user",   "content": user_prompt},
-        ]
-        return tokenizer.apply_chat_template(
-            messages, tokenize=False, add_generation_prompt=True
-        )
-    def apply_action(self, action_dict):
-        if action_dict is None:
-            action_dict = FALLBACK_ACTION
-        self.obs = self.env.step(VisualReasoningAction(**action_dict))
-        reward = float(self.obs.reward)
-        self.last_action = action_dict
-        self.last_reward = reward
-        self.steps += 1
-        self.history.append(f"Step {self.steps}: {action_dict.get('narration', '')}")
-        if reward <= -0.04:
-            self.consecutive_noops += 1
-        else:
-            self.consecutive_noops = 0
-        if self.obs.done or self.consecutive_noops >= NOOP_EARLY_STOP:
-            self.done = True
-        self.score = float(self.obs.score_breakdown.get("overall_score", 0.0))
-def batched_generate(prompt_texts, max_new_tokens=384):
-    """Single batched model.generate() call for all active episodes."""
-    FastLanguageModel.for_inference(model)
-    inputs = tokenizer(
-        prompt_texts,
-        return_tensors="pt",
-        padding=True,
-        truncation=True,
-        max_length=MAX_SEQ_LENGTH,
-    ).to(model.device)
-    with torch.no_grad():
-        outputs = model.generate(
-            **inputs,
-            max_new_tokens=max_new_tokens,
-            do_sample=False,
-            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
-        )
-    results = []
-    for i, out in enumerate(outputs):
-        input_len = inputs.input_ids.shape[1]  # generated tokens start after the padded prompt, for either padding side
-        results.append(tokenizer.decode(out[input_len:], skip_special_tokens=True))
-    return results
-def evaluate(label, stages=None, scenarios=None):
-    """Batched eval: runs EVAL_BATCH_SIZE episodes in parallel per generate call."""
-    stages = stages or list(DIFFICULTIES)
-    scenarios = scenarios or STATIC_SCENARIOS
-    print(f"\n{'=' * 60}")
-    print(f"  EVAL: {label}")
-    print(f"{'=' * 60}")
-    all_ids = []
-    for diff in stages:
-        all_ids.extend([(diff, sid) for sid in scenarios.get(diff, [])])
-    results_by_diff = {d: [] for d in stages}
-    t0 = time.time()
-    for batch_start in range(0, len(all_ids), EVAL_BATCH_SIZE):
-        batch = all_ids[batch_start : batch_start + EVAL_BATCH_SIZE]
-        episodes = [EpisodeState(_get_env(i), sid) for i, (_, sid) in enumerate(batch)]
-        for _ in range(EVAL_MAX_STEPS):
-            active = [ep for ep in episodes if not ep.done]
-            if not active:
-                break
-            prompts = [ep.build_prompt_text() for ep in active]
-            texts = batched_generate(prompts)
-            for ep, text in zip(active, texts):
-                parsed = parse_action(text)
-                action = inf_normalize_action(parsed) if parsed else None
-                ep.apply_action(action)
-        for ep, (diff, sid) in zip(episodes, batch):
-            results_by_diff[diff].append(ep.score)
-            print(f"  [{diff:6}] {sid:22} score={ep.score:.3f}  steps={ep.steps}")
-    overall_scores = {}
-    for diff in stages:
-        scores = results_by_diff[diff]
-        mean = sum(scores) / max(len(scores), 1)
-        overall_scores[diff] = mean
-        print(f"  [{diff:6}] MEAN = {mean:.3f}")
-    overall = sum(overall_scores.values()) / max(len(overall_scores), 1)
-    overall_scores["overall"] = overall
-    print(f"  OVERALL: {overall:.3f}  ({time.time() - t0:.1f}s)")
-    return overall_scores
-# ── 6. Gold demonstrations for SFT ─────────────────────────────────────────
-# Hand-crafted episodes that teach the model:
-#   - JSON action format (step_type, narration, ops, covered_concepts, intent)
-#   - Incremental drawing (Phase 1), algorithm walk-through (Phase 2), wrap-up (Phase 3)
-#   - Region vs container distinction, concept evidencing, pacing
-GOLD_EPISODES = {
-    # ── easy_1: linked_list_traversal ──
-    # input: {"values": [10, 20, 30]}, concepts: head_pointer, node_value, next_link, tail_marker
-    "easy_1": [
-        {
-            "step_type": "advance",
-            "narration": "Building a linked list — creating a centered layout region and the first two nodes with values 10 and 20.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["list"], "params": {"style": "array", "title": "Linked List", "position": "center"}},
-                {"op": "add_node", "target_ids": ["n1"], "params": {"value": 10, "region": "list"}},
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 20, "region": "list"}},
-            ],
-            "covered_concepts": ["node_value"],
-            "intent": "create_list_start",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding node 30 and connecting all nodes with next links to form the chain 10 -> 20 -> 30.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 30, "region": "list"}},
-                {"op": "add_edge", "target_ids": ["n1", "n2"], "params": {"kind": "directed", "label": "next"}},
-                {"op": "add_edge", "target_ids": ["n2", "n3"], "params": {"kind": "directed", "label": "next"}},
-            ],
-            "covered_concepts": ["next_link"],
-            "intent": "connect_nodes",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Placing a head pointer at node 10 because traversal always starts at the head, our only entry point into the list.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["head_ptr"], "params": {"region": "list"}},
-                {"op": "move_pointer", "target_ids": ["head_ptr"], "params": {"index": "n1"}},
-                {"op": "annotate", "target_ids": ["n1"], "params": {"text": "Head"}},
-                {"op": "set_role", "target_ids": ["n1"], "params": {"role": "current"}},
-            ],
-            "covered_concepts": ["head_pointer"],
-            "intent": "mark_head",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Following the next link from 10 to 20 — the pointer advances and we mark node 10 as visited.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n1"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["n2"], "params": {"role": "current"}},
-                {"op": "move_pointer", "target_ids": ["head_ptr"], "params": {"index": "n2"}},
-            ],
-            "covered_concepts": [],
-            "intent": "traverse_to_second",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Reaching node 30 — it has no next link, making it the tail that signals the end of traversal.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n2"], "params": {"role": "visited"}},
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "current"}},
-                {"op": "annotate", "target_ids": ["n3"], "params": {"text": "Tail"}},
-            ],
-            "covered_concepts": ["tail_marker"],
-            "intent": "reach_tail",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Traversal complete — visited every node from head to tail following next links, reading values 10, 20, 30 in order.",
-            "ops": [{"op": "set_role", "target_ids": ["n3"], "params": {"role": "done"}}],
-            "covered_concepts": [],
-            "intent": "summarize",
-        },
-    ],
-    # ── easy_2: stack_ops ──
-    # input: {"operations": ["push A", "push B", "pop", "push C"]}, concepts: top_pointer, push, pop, lifo_order
-    "easy_2": [
-        {
-            "step_type": "advance",
-            "narration": "Setting up a stack with a centered visual region and a container to track push and pop membership.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["stack_area"], "params": {"style": "stack", "title": "Stack", "position": "center"}},
-                {"op": "add_container", "target_ids": ["stk"], "params": {"region": "stack_area", "ordered": False, "title": "Stack"}},
-            ],
-            "covered_concepts": [],
-            "intent": "setup_stack",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing A onto the stack — A becomes the first element. Adding a top pointer to track the stack top.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["a"], "params": {"value": "A", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "a"], "params": {}},
-                {"op": "add_pointer", "target_ids": ["top"], "params": {"region": "stack_area"}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "a"}},
-            ],
-            "covered_concepts": ["push", "top_pointer"],
-            "intent": "push_a",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing B — B sits on top of A and the top pointer moves up to B.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["b"], "params": {"value": "B", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "b"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "b"}},
-            ],
-            "covered_concepts": [],
-            "intent": "push_b",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Popping from the stack — B was pushed last so B comes off first, demonstrating LIFO (last-in-first-out) order.",
-            "ops": [
-                {"op": "pop_from", "target_ids": ["stk"], "params": {}},
-                {"op": "set_role", "target_ids": ["b"], "params": {"role": "inactive"}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "a"}},
-            ],
-            "covered_concepts": ["pop", "lifo_order"],
-            "intent": "pop_b",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Pushing C onto the stack — C now sits on top of A, with B already removed.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["c"], "params": {"value": "C", "region": "stack_area"}},
-                {"op": "push_to", "target_ids": ["stk", "c"], "params": {}},
-                {"op": "move_pointer", "target_ids": ["top"], "params": {"index": "c"}},
-            ],
-            "covered_concepts": [],
-            "intent": "push_c",
-        },
-        {
-            "step_type": "complete",
-            "narration": "All four operations executed — stack holds A at bottom and C on top after push A, push B, pop, push C.",
-            "ops": [],
-            "covered_concepts": [],
-            "intent": "summarize",
-        },
-    ],
-    # ── easy_3: binary_search ──
-    # input: {"array": [1,3,5,7,9,11,13], "target": 7}, concepts: sorted_invariant, low/high/mid_pointer, comparison
-    "easy_3": [
-        {
-            "step_type": "advance",
-            "narration": "Creating the first three elements of a sorted array in a centered region; the sorted invariant is what makes binary search possible.",
-            "ops": [
-                {"op": "add_region", "target_ids": ["arr"], "params": {"style": "array", "title": "Sorted Array", "position": "center"}},
-                {"op": "add_node", "target_ids": ["n0"], "params": {"value": 1, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n1"], "params": {"value": 3, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n2"], "params": {"value": 5, "region": "arr"}},
-            ],
-            "covered_concepts": ["sorted_invariant"],
-            "intent": "create_array_part1",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Adding the remaining elements 7, 9, 11, 13 to complete all seven values of the sorted array.",
-            "ops": [
-                {"op": "add_node", "target_ids": ["n3"], "params": {"value": 7, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n4"], "params": {"value": 9, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n5"], "params": {"value": 11, "region": "arr"}},
-                {"op": "add_node", "target_ids": ["n6"], "params": {"value": 13, "region": "arr"}},
-            ],
-            "covered_concepts": [],
-            "intent": "create_array_part2",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Placing low pointer at index 0 (value 1) and high pointer at index 6 (value 13) to bracket the search range.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["low"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["low"], "params": {"index": "n0"}},
-                {"op": "add_pointer", "target_ids": ["high"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["high"], "params": {"index": "n6"}},
-            ],
-            "covered_concepts": ["low_pointer", "high_pointer"],
-            "intent": "init_pointers",
-        },
-        {
-            "step_type": "advance",
-            "narration": "Computing mid = (0+6)/2 = 3 — the mid pointer lands on value 7, which we compare against our target 7.",
-            "ops": [
-                {"op": "add_pointer", "target_ids": ["mid"], "params": {"region": "arr"}},
-                {"op": "move_pointer", "target_ids": ["mid"], "params": {"index": "n3"}},
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "current"}},
-                {"op": "highlight", "target_ids": ["n3"], "params": {}},
-            ],
-            "covered_concepts": ["mid_pointer", "comparison"],
-            "intent": "compute_mid_and_compare",
-        },
-        {
-            "step_type": "complete",
-            "narration": "Target 7 found at index 3 — binary search located it in one comparison because the sorted invariant halves the search space each step.",
-            "ops": [
-                {"op": "set_role", "target_ids": ["n3"], "params": {"role": "done"}},
-                {"op": "annotate", "target_ids": ["n3"], "params": {"text": "Found: 7"}},
-            ],
-            "covered_concepts": [],
-            "intent": "found_target",
-        },
-    ],
-}
-# ── 7. SFT data generation ─────────────────────────────────────────────────
-def generate_sft_data():
-    """Replay gold episodes through the live env to collect (observation, action) pairs."""
-    env = VisualReasoningEnvironment()
-    rows = []
-    for scenario_id, actions in GOLD_EPISODES.items():
-        obs = env.reset(scenario_id=scenario_id)
-        last_action, last_reward, history = None, 0.0, []
-        for i, action in enumerate(actions):
-            user_prompt = build_user_prompt(obs, last_action, last_reward, history)
-            messages = [
-                {"role": "system", "content": SYSTEM_PROMPT},
-                {"role": "user", "content": user_prompt},
-                {"role": "assistant", "content": json.dumps(action, separators=(",", ":"))},
-            ]
-            text = tokenizer.apply_chat_template(
-                messages, tokenize=False, add_generation_prompt=False
-            )
-            rows.append({"text": text})
-            obs = env.step(VisualReasoningAction(**action))
-            last_action, last_reward = action, float(obs.reward)
-            history.append(f"Step {i + 1}: {action.get('narration', '')}")
-            if obs.done:
-                break
-    return Dataset.from_list(rows)
-# ── 8. GRPO helpers ─────────────────────────────────────────────────────────
-def generate_grpo_scenarios():
-    """Procedurally generate diverse scenarios for each difficulty."""
-    out = {}
-    for diff, count in GRPO_SCENARIOS_PER_DIFFICULTY.items():
-        base_seed = {"easy": 10000, "medium": 20000, "hard": 30000, "expert": 40000}[diff]
-        scenarios = [generate_scenario(task_name=diff, seed=base_seed + i) for i in range(count)]
-        templates = Counter(s["template"] for s in scenarios)
-        print(f"    {diff}: {count} scenarios — {dict(templates)}")
-        out[diff] = scenarios
-    return out
-def build_grpo_prompts(scenarios_by_diff, stages, samples_per_scenario=2):
-    """Collect initial-state prompts by resetting the env for each scenario."""
-    env = VisualReasoningEnvironment()
-    rows = []
-    for stage in stages:
-        for scenario in scenarios_by_diff.get(stage, []):
-            sid = scenario["scenario_id"]
-            for _ in range(samples_per_scenario):
-                obs = env.reset(scenario_id=sid)
-                user_prompt = build_user_prompt(obs, None, 0.0, [])
-                messages = [
-                    {"role": "system", "content": SYSTEM_PROMPT},
-                    {"role": "user", "content": user_prompt},
-                ]
-                rows.append({"prompt": messages, "scenario_id": sid})
-    return Dataset.from_list(rows)
-def make_reward_fn():
-    """Reward function: parse completion, step in env, return overall_score."""
-    env = VisualReasoningEnvironment()
-    state = {"calls": 0, "hist": Counter()}
-    def reward_fn(completions, scenario_id=None, **_):
-        texts = []
-        for c in completions:
-            if isinstance(c, list):
-                texts.append(c[-1].get("content", "") if c else "")
-            else:
-                texts.append(str(c))
-        sids = scenario_id if isinstance(scenario_id, list) else [scenario_id] * len(texts)
-        if len(sids) < len(texts):
-            n_gen = len(texts) // len(sids)
-            sids = [s for s in sids for _ in range(n_gen)]
-        rewards = []
-        for sid, text in zip(sids, texts):
-            env.reset(scenario_id=sid)
-            parsed = parse_action(text)
-            action = inf_normalize_action(parsed) if parsed else None
-            if action is None:
-                rewards.append(0.0)
-                state["hist"]["<unparseable>"] += 1
-                continue
-            obs = env.step(VisualReasoningAction(**action))
-            rewards.append(float(obs.score_breakdown.get("overall_score", 0.0)))
-            state["hist"][action.get("step_type", "?")] += 1
-        state["calls"] += 1
-        if state["calls"] % 10 == 0:
-            print(f"    [reward] call={state['calls']}  types={dict(state['hist'])}")
-        return rewards
-    return reward_fn
-class CurriculumGRPOTrainer(GRPOTrainer):
-    """Sequential sampler preserves easy → expert curriculum ordering."""
-    def _get_train_sampler(self, *_args, **_kwargs):
-        return SequentialSampler(self.train_dataset)
-# ── 9. Main training loop ──────────────────────────────────────────────────
-def main():
-    job_start = time.time()
-    # ── Baseline ──
-    print("\n[3/7] Baseline evaluation...")
-    baseline = evaluate("Baseline (untrained LoRA)")
-    # ── SFT ──
-    print("\n[4/7] SFT warmup...")
-    sft_data = generate_sft_data()
-    print(f"  SFT examples: {len(sft_data)}")
-    FastLanguageModel.for_training(model)
-    sft_trainer = SFTTrainer(
-        model=model,
-        processing_class=tokenizer,
-        args=SFTConfig(
-            output_dir="/tmp/vr_sft",
-            num_train_epochs=SFT_EPOCHS,
-            per_device_train_batch_size=SFT_BATCH_SIZE,
-            gradient_accumulation_steps=SFT_GRAD_ACCUM,
-            learning_rate=SFT_LR,
-            lr_scheduler_type="cosine",
-            warmup_ratio=0.03,
-            logging_steps=5,
-            save_strategy="no",
-            bf16=True,
-            max_seq_length=MAX_SEQ_LENGTH,
-            dataset_text_field="text",
-            optim="adamw_8bit",
-            report_to="none",
-        ),
-        train_dataset=sft_data,
-    )
-    t0 = time.time()
-    sft_trainer.train()
-    print(f"  SFT done in {time.time() - t0:.0f}s")
-    sft_results = evaluate("Post-SFT")
-    # ── Generate GRPO scenarios ──
-    print("\n[5/7] Generating GRPO scenarios...")
-    grpo_scenarios = generate_grpo_scenarios()
-    total_scenarios = sum(len(v) for v in grpo_scenarios.values())
-    print(f"  Total: {total_scenarios} scenarios")
-    # ── GRPO Stage 1: easy + medium ──
-    print("\n[6/7] GRPO Stage 1 (easy + medium)...")
-    stage1_data = build_grpo_prompts(grpo_scenarios, ["easy", "medium"])
-    print(f"  Stage 1 prompts: {len(stage1_data)}")
-    FastLanguageModel.for_training(model)
-    grpo_s1 = CurriculumGRPOTrainer(
-        model=model,
-        processing_class=tokenizer,
-        args=GRPOConfig(
-            output_dir="/tmp/vr_grpo_s1",
-            num_train_epochs=GRPO_STAGE1_EPOCHS,
-            per_device_train_batch_size=GRPO_BATCH_SIZE,
-            gradient_accumulation_steps=GRPO_GRAD_ACCUM,
-            num_generations=GRPO_NUM_GENERATIONS,
-            max_completion_length=384,
-            learning_rate=GRPO_LR,
-            lr_scheduler_type="cosine",
-            warmup_ratio=0.1,
-            beta=0.05,
-            max_grad_norm=0.5,
-            temperature=GRPO_TEMPERATURE,
-            logging_steps=1,
-            save_strategy="no",
-            bf16=True,
-            optim="adamw_8bit",
-            report_to="none",
-            remove_unused_columns=False,
-        ),
-        train_dataset=stage1_data,
-        reward_funcs=make_reward_fn(),
-    )
-    t0 = time.time()
-    grpo_s1.train()
-    print(f"  Stage 1 done in {time.time() - t0:.0f}s")
-    stage1_results = evaluate("Post-GRPO Stage 1", stages=["easy", "medium"])
-    # ── GRPO Stage 2: all difficulties ──
-    print("\n[7/7] GRPO Stage 2 (all difficulties)...")
-    stage2_data = build_grpo_prompts(grpo_scenarios, list(DIFFICULTIES))
-    print(f"  Stage 2 prompts: {len(stage2_data)}")
-    FastLanguageModel.for_training(model)
-    grpo_s2 = CurriculumGRPOTrainer(
-        model=model,
-        processing_class=tokenizer,
-        args=GRPOConfig(
-            output_dir="/tmp/vr_grpo_s2",
-            num_train_epochs=GRPO_STAGE2_EPOCHS,
-            per_device_train_batch_size=GRPO_BATCH_SIZE,
-            gradient_accumulation_steps=GRPO_GRAD_ACCUM,
-            num_generations=GRPO_NUM_GENERATIONS,
-            max_completion_length=384,
-            learning_rate=GRPO_LR * 0.5,  # halved for stability with harder scenarios
-            lr_scheduler_type="cosine",
-            warmup_ratio=0.1,
-            beta=0.05,
-            max_grad_norm=0.5,
-            temperature=GRPO_TEMPERATURE,
-            logging_steps=1,
-            save_strategy="no",
-            bf16=True,
-            optim="adamw_8bit",
-            report_to="none",
-            remove_unused_columns=False,
-        ),
-        train_dataset=stage2_data,
-        reward_funcs=make_reward_fn(),
-    )
-    t0 = time.time()
-    grpo_s2.train()
-    print(f"  Stage 2 done in {time.time() - t0:.0f}s")
-    # ── Final eval + report ──
-    final_results = evaluate("Final (SFT + GRPO S1 + S2)")
-    print(f"\n{'=' * 72}")
-    print("  DELTA REPORT")
-    print(f"{'=' * 72}")
-    print(f"  {'Difficulty':<12} {'Baseline':>10} {'SFT':>10} {'GRPO-S1':>10} {'Final':>10} {'Δ':>10}")
-    print(f"  {'-'*12} {'-'*10} {'-'*10} {'-'*10} {'-'*10} {'-'*10}")
-    for diff in list(DIFFICULTIES) + ["overall"]:
-        b = baseline.get(diff, 0.0)
-        s = sft_results.get(diff, 0.0)
-        s1 = stage1_results.get(diff, 0.0)
-        f = final_results.get(diff, 0.0)
-        label = diff.upper() if diff == "overall" else diff
-        print(f"  {label:<12} {b:>10.3f} {s:>10.3f} {s1:>10.3f} {f:>10.3f} {f - b:>+10.3f}")
-    print(f"{'=' * 72}")
-    # ── Push to hub ──
-    if HUB_REPO:
-        print(f"\nPushing LoRA adapter to {HUB_REPO}...")
-        model.push_to_hub(HUB_REPO, token=HF_TOKEN)
-        tokenizer.push_to_hub(HUB_REPO, token=HF_TOKEN)
-        print(f"Pushed: https://huggingface.co/{HUB_REPO}")
-    else:
-        print("\nHUB_REPO not set — saving locally to /tmp/vr_qwen3_8b_lora")
-        model.save_pretrained("/tmp/vr_qwen3_8b_lora")
-        tokenizer.save_pretrained("/tmp/vr_qwen3_8b_lora")
-    total_mins = (time.time() - job_start) / 60
-    print(f"\nJob finished in {total_mins:.1f} minutes.")
-if __name__ == "__main__":
-    main()
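
The `batched_generate` helper in the removed `train_hf.py` decodes only the newly generated tokens for each prompt in a batch. The following is a minimal, dependency-free sketch of that slicing step, assuming left-padded prompts and a generate call that returns the padded prompt plus its continuation for every row (the usual behaviour of decoder-only Hugging Face models). The names `PAD`, `left_pad`, `fake_generate`, and `strip_prompt` are hypothetical stand-ins for illustration and are not part of the repository.

```python
PAD = 0  # hypothetical pad token id, for illustration only


def left_pad(prompts, pad=PAD):
    """Left-pad lists of token ids to a common width."""
    width = max(len(p) for p in prompts)
    return [[pad] * (width - len(p)) + p for p in prompts]


def fake_generate(batch, continuations):
    """Stand-in for model.generate(): each row is the padded prompt plus new tokens."""
    return [row + new for row, new in zip(batch, continuations)]


def strip_prompt(outputs, padded_width):
    """Drop the padded prompt so that only the generated tokens remain."""
    return [out[padded_width:] for out in outputs]


prompts = [[5, 6, 7], [8]]
batch = left_pad(prompts)                      # [[5, 6, 7], [0, 0, 8]]
outputs = fake_generate(batch, [[101, 102], [201, 202]])
generated = strip_prompt(outputs, padded_width=len(batch[0]))
print(generated)  # [[101, 102], [201, 202]]
```

Slicing every row at the shared padded width keeps the result independent of how much padding a given row received. Slicing each row at its own count of non-padding tokens would instead leave part of the prompt in the shorter row: the second output above would come back as `[0, 8, 201, 202]`.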

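The reward function built by `make_reward_fn` in the removed script pairs every completion with a scenario ID, repeating IDs when there are several generations per prompt. The sketch below isolates that alignment step. It assumes completions arrive grouped by prompt (all generations for the first prompt, then all for the second, and so on); `align_scenario_ids` is a hypothetical name introduced here, not a function from the repository.

```python
def align_scenario_ids(scenario_id, n_completions):
    """Return one scenario ID per completion.

    Assumes completions are grouped by prompt, so each ID is repeated
    consecutively for all generations sampled from that prompt.
    """
    if isinstance(scenario_id, list):
        sids = scenario_id
    else:
        sids = [scenario_id] * n_completions
    if sids and len(sids) < n_completions:
        per_prompt = n_completions // len(sids)
        sids = [s for s in sids for _ in range(per_prompt)]
    return sids


print(align_scenario_ids(["easy_1", "easy_2"], 4))
# ['easy_1', 'easy_1', 'easy_2', 'easy_2']
print(align_scenario_ids("easy_3", 2))
# ['easy_3', 'easy_3']
```

If the trainer already supplies one ID per completion, the list passes through unchanged, so the helper is safe to call in either case.
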
uv.lock DELETED Viewed

The diff for this file is too large to render.

viewer/audio_viewer.html DELETED Viewed

@@ -1,865 +0,0 @@
-<!doctype html>
-<html lang="en">
-<head>
-<meta charset="utf-8"/>
-<title>Visual Reasoning — Audio Viewer</title>
-<style>
-*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
-:root {
-    --bg-dark: #1e1e2e;
-    --bg-panel: #181825;
-    --bg-surface: #11111b;
-    --border: #313244;
-    --text: #cdd6f4;
-    --text-dim: #6c7086;
-    --accent: #89b4fa;
-    --green: #a6e3a1;
-    --red: #f38ba8;
-    --yellow: #f9e2af;
-    --peach: #fab387;
-    --mauve: #cba6f7;
-    --teal: #94e2d5;
-}
-html, body { height: 100%; font-family: 'Inter', -apple-system, system-ui, sans-serif; background: var(--bg-dark); color: var(--text); overflow: hidden; }
-#app { display: flex; flex-direction: column; height: 100vh; }
-/* ---- Header ---- */
-.header {
-    display: flex; align-items: center; gap: 12px;
-    padding: 6px 14px; background: var(--bg-panel); border-bottom: 1px solid var(--border);
-    font-size: 12px; flex-shrink: 0; min-height: 36px;
-}
-.header .badge { padding: 2px 8px; border-radius: 4px; font-weight: 600; font-size: 11px; }
-.header .badge-task { background: rgba(137,180,250,0.15); color: var(--accent); }
-.header .badge-scenario { color: var(--text-dim); font-size: 11px; }
-.header .step-counter { color: var(--text); font-size: 11px; }
-.header .score-display { font-weight: 700; color: var(--green); font-size: 14px; }
-.header .spacer { flex: 1; }
-.header .ws-dot { width: 7px; height: 7px; border-radius: 50%; }
-.header .ws-dot.on { background: var(--green); }
-.header .ws-dot.off { background: var(--red); }
-.audio-btn {
-    background: none; border: 1px solid var(--border); color: var(--text-dim);
-    border-radius: 4px; padding: 2px 8px; font-size: 14px; cursor: pointer;
-    transition: all .2s;
-}
-.audio-btn:hover { border-color: var(--accent); color: var(--accent); }
-.audio-btn.active { color: var(--accent); border-color: var(--accent); }
-/* ---- Main layout ---- */
-.main { display: flex; flex: 1; min-height: 0; overflow: hidden; }
-/* ---- Canvas ---- */
-.canvas-wrap {
-    flex: 1; min-width: 0; background: var(--bg-surface);
-    position: relative; overflow: hidden;
-}
-.canvas-wrap canvas { display: block; width: 100%; height: 100%; }
-/* ---- Side panel ---- */
-.side {
-    width: 270px; flex-shrink: 0; background: var(--bg-panel);
-    border-left: 1px solid var(--border);
-    display: flex; flex-direction: column; overflow: hidden;
-}
-.side-section {
-    padding: 8px 12px; border-bottom: 1px solid var(--border); flex-shrink: 0;
-}
-.side-section h3 {
-    font-size: 9px; text-transform: uppercase; letter-spacing: 1.2px;
-    color: var(--accent); margin-bottom: 4px; font-weight: 600;
-}
-/* Goal */
-.goal-text {
-    font-size: 11px; line-height: 1.4; color: var(--text);
-    max-height: 48px; overflow: hidden; text-overflow: ellipsis;
-}
-/* Checklist */
-.checklist-grid { display: flex; flex-wrap: wrap; gap: 2px 8px; }
-.concept { display: flex; align-items: center; gap: 4px; padding: 1px 0; font-size: 11px; }
-.concept .icon { width: 14px; text-align: center; font-size: 12px; }
-.concept.covered .icon { color: var(--green); }
-.concept.uncovered .icon { color: var(--text-dim); }
-.concept.covered { color: var(--green); }
-.concept.uncovered { color: var(--text-dim); }
-/* Score bars */
-.score-row { display: flex; align-items: center; gap: 4px; margin-bottom: 2px; font-size: 10px; }
-.score-label { width: 78px; flex-shrink: 0; color: var(--text-dim); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
-.score-track { flex: 1; height: 12px; background: var(--bg-surface); border-radius: 2px; overflow: hidden; }
-.score-fill { height: 100%; border-radius: 2px; display: flex; align-items: center; padding-left: 3px; font-size: 8px; font-weight: 600; color: #fff; min-width: 20px; transition: width .3s; }
-/* Narration log */
-.narration-log { flex: 1; overflow-y: auto; padding: 8px 12px; min-height: 0; }
-.narration-log h3 { font-size: 9px; text-transform: uppercase; letter-spacing: 1.2px; color: var(--accent); margin-bottom: 4px; font-weight: 600; }
-.narr-entry { padding: 4px 0; border-bottom: 1px solid rgba(49,50,68,.5); font-size: 11px; animation: fadeIn .3s; }
-.narr-entry .step-tag { font-weight: 600; color: var(--accent); font-size: 10px; }
-.narr-entry .reward { font-size: 9px; font-weight: 700; margin-left: 4px; }
-.narr-entry .reward.pos { color: var(--green); }
-.narr-entry .reward.neg { color: var(--red); }
-.narr-entry .text { color: var(--text); margin-top: 1px; line-height: 1.35; }
-/* ---- Bottom narration bar ---- */
-.narration-bar {
-    flex-shrink: 0; display: flex; align-items: center; gap: 10px;
-    padding: 8px 16px; background: var(--bg-panel); border-top: 1px solid var(--border);
-    min-height: 40px; max-height: 60px;
-}
-.narr-indicator {
-    width: 24px; height: 24px; display: flex; align-items: center; justify-content: center;
-    flex-shrink: 0; font-size: 16px; color: var(--text-dim); transition: color .3s;
-}
-.narr-indicator.speaking { color: var(--accent); }
-.narr-indicator.speaking .bars { display: flex; align-items: flex-end; gap: 2px; height: 16px; }
-.narr-indicator.speaking .bars span {
-    width: 3px; background: var(--accent); border-radius: 1px;
-    animation: barPulse .6s ease-in-out infinite alternate;
-}
-.narr-indicator.speaking .bars span:nth-child(2) { animation-delay: .15s; }
-.narr-indicator.speaking .bars span:nth-child(3) { animation-delay: .3s; }
-.narr-indicator.speaking .bars span:nth-child(4) { animation-delay: .45s; }
-.narr-text {
-    flex: 1; font-size: 13px; color: var(--text); font-style: italic;
-    white-space: nowrap; overflow: hidden; text-overflow: ellipsis;
-    transition: opacity .5s;
-}
-.narr-text.dim { opacity: .3; }
-.music-badge {
-    flex-shrink: 0; font-size: 10px; color: var(--mauve); display: flex; align-items: center; gap: 4px;
-    opacity: 0; transition: opacity .5s;
-}
-.music-badge.on { opacity: 1; }
-@keyframes barPulse {
-    0% { height: 4px; }
-    100% { height: 14px; }
-}
-/* ---- End overlay ---- */
-.overlay {
-    position: fixed; inset: 0; background: rgba(17,17,27,.92);
-    display: flex; flex-direction: column; align-items: center; justify-content: center;
-    z-index: 1000; cursor: pointer;
-}
-.overlay.hidden { display: none; }
-.overlay h1 { font-size: 24px; margin-bottom: 6px; }
-.overlay h1.ok { color: var(--green); }
-.overlay h1.fail { color: var(--red); }
-.overlay .final { font-size: 16px; color: var(--text-dim); margin-bottom: 16px; }
-.reward-chart { display: flex; align-items: flex-end; gap: 4px; height: 100px; }
-.reward-chart .bar { width: 22px; border-radius: 3px 3px 0 0; min-height: 2px; position: relative; }
-.reward-chart .bar .lbl { position: absolute; bottom: -14px; left: 50%; transform: translateX(-50%); font-size: 8px; color: var(--text-dim); }
-/* Audio hint banner */
-.audio-hint {
-    position: fixed; bottom: 60px; left: 50%; transform: translateX(-50%);
-    background: rgba(137,180,250,0.15); color: var(--accent); padding: 8px 20px;
-    border-radius: 8px; font-size: 13px; z-index: 2000; cursor: pointer;
-    border: 1px solid rgba(137,180,250,0.3); backdrop-filter: blur(8px);
-    animation: fadeIn .5s;
-}
-.audio-hint.hidden { display: none; }
-/* Error toast */
-.toast {
-    position: fixed; top: 46px; left: 50%; transform: translateX(-50%);
-    background: var(--red); color: #11111b; padding: 5px 16px;
-    border-radius: 6px; font-size: 11px; font-weight: 600; z-index: 999; display: none;
-}
-@keyframes fadeIn { from { opacity: 0; } to { opacity: 1; } }
-</style>
-</head>
-<body>
-<div id="app">
-    <div class="header">
-        <span class="badge badge-task" id="h-task">--</span>
-        <span class="badge-scenario" id="h-scenario">--</span>
-        <span class="step-counter" id="h-step">Step 0 / 0</span>
-        <span class="spacer"></span>
-        <span class="score-display" id="h-score">0.000</span>
-        <button class="audio-btn active" id="btn-mute" title="Toggle audio">&#x1f50a;</button>
-        <span class="ws-dot off" id="h-ws"></span>
-    </div>
-    <div class="main">
-        <div class="canvas-wrap">
-            <canvas id="canvas"></canvas>
-        </div>
-        <div class="side">
-            <div class="side-section" id="sec-goal"><h3>Goal</h3><div class="goal-text" id="goal-text">--</div></div>
-            <div class="side-section" id="sec-checklist"><h3>Concept Checklist</h3><div class="checklist-grid" id="checklist"></div></div>
-            <div class="side-section" id="sec-scores"><h3>Score Breakdown</h3><div id="scores"></div></div>
-            <div class="narration-log"><h3>Narration</h3><div id="narrations"></div></div>
-        </div>
-    </div>
-    <div class="narration-bar">
-        <div class="narr-indicator" id="narr-ind">
-            <div class="bars"><span></span><span></span><span></span><span></span></div>
-        </div>
-        <div class="narr-text dim" id="narr-text"></div>
-        <div class="music-badge" id="music-badge">&#x266b; music</div>
-    </div>
-</div>
-<!-- End overlay -->
-<div class="overlay hidden" id="overlay">
-    <h1 id="ov-title"></h1>
-    <div class="final" id="ov-score"></div>
-    <div class="reward-chart" id="ov-chart"></div>
-</div>
-<div class="audio-hint" id="audio-hint">Click here to enable audio narration</div>
-<div class="toast" id="toast"></div>
-<script>
-/* ================================================================
-   Audio Viewer — Canvas2D renderer + TTS + background music
-   ================================================================ */
-const ROLE_COLORS = {
-    default:   { fill: '#45475a', stroke: '#585b70', text: '#cdd6f4' },
-    current:   { fill: '#89b4fa', stroke: '#74c7ec', text: '#1e1e2e' },
-    visited:   { fill: '#a6e3a1', stroke: '#94e2d5', text: '#1e1e2e' },
-    frontier:  { fill: '#fab387', stroke: '#f9e2af', text: '#1e1e2e' },
-    done:      { fill: '#40a02b', stroke: '#a6e3a1', text: '#fff' },
-    pivot:     { fill: '#f38ba8', stroke: '#eba0ac', text: '#1e1e2e' },
-    root:      { fill: '#cba6f7', stroke: '#b4befe', text: '#1e1e2e' },
-    error:     { fill: '#f38ba8', stroke: '#f38ba8', text: '#fff' },
-    inactive:  { fill: '#313244', stroke: '#45475a', text: '#6c7086' },
-    comparing: { fill: '#f9e2af', stroke: '#fab387', text: '#1e1e2e' },
-};
-const GRID = 70;
-const NODE_R = 26;
-const ARROW_LEN = 10;
-const FONT = '13px Inter, system-ui, sans-serif';
-const FONT_BOLD = 'bold 14px Inter, system-ui, sans-serif';
-const FONT_SMALL = '10px Inter, system-ui, sans-serif';
-const FONT_ANN = '11px Inter, system-ui, sans-serif';
-let S = {
-    entities: {}, relations: [], layout: {}, annotations: [], notes: [],
-    taskName: '', scenarioId: '', goal: '', checklist: [], coverage: [],
-    maxSteps: 0, step: 0, score: 0, breakdown: {}, narrations: [],
-};
-let cam = { ox: 40, oy: 30, scale: 1 };
-let canvas, ctx;
-let toastTimer;
-const $ = id => document.getElementById(id);
-/* ---- Audio system ---- */
-let audioCtx = null;
-let bgMusic = null;
-let currentTTS = null;
-let isMuted = false;
-let audioUnlocked = false;
-let bgMusicWanted = false;
-let pendingAudioQueue = [];
-const BG_VOLUME = 0.06;
-const BG_DUCK_VOLUME = 0.02;
-// Create AudioContext eagerly (starts suspended until user gesture)
-try { audioCtx = new (window.AudioContext || window.webkitAudioContext)(); } catch {}
-bgMusic = new Audio('/audio/background.mp3');
-bgMusic.loop = true;
-bgMusic.volume = BG_VOLUME;
-bgMusic.addEventListener('error', () => console.warn('Background music not available'));
-function unlockAudio() {
-    if (audioUnlocked) return;
-    audioUnlocked = true;
-    $('audio-hint').classList.add('hidden');
-    if (audioCtx && audioCtx.state === 'suspended') audioCtx.resume();
-    // play+pause to unlock the HTML Audio element for later use
-    bgMusic.play().then(() => {
-        if (!bgMusicWanted) { bgMusic.pause(); bgMusic.currentTime = 0; }
-        else { bgMusic.volume = isMuted ? 0 : BG_VOLUME; $('music-badge').classList.add('on'); }
-    }).catch(() => {});
-    // flush queued TTS
-    if (pendingAudioQueue.length) {
-        for (const b64 of pendingAudioQueue) playTTS(b64);
-        pendingAudioQueue.length = 0;
-    }
-}
-// Unlock on any user interaction (click, touch, keydown)
-document.addEventListener('click', unlockAudio, { once: true });
-document.addEventListener('touchstart', unlockAudio, { once: true });
-document.addEventListener('keydown', unlockAudio, { once: true });
-function startBgMusic() {
-    bgMusicWanted = true;
-    if (!audioUnlocked || isMuted) return;
-    bgMusic.currentTime = 0;
-    bgMusic.volume = BG_VOLUME;
-    bgMusic.play().catch(() => {});
-    $('music-badge').classList.add('on');
-}
-function stopBgMusic() {
-    bgMusicWanted = false;
-    bgMusic.pause();
-    bgMusic.currentTime = 0;
-    $('music-badge').classList.remove('on');
-}
-function duckBgMusic() {
-    bgMusic.volume = isMuted ? 0 : BG_DUCK_VOLUME;
-}
-function unduckBgMusic() {
-    bgMusic.volume = isMuted ? 0 : BG_VOLUME;
-}
-async function playTTS(base64Wav) {
-    if (!base64Wav || !audioCtx || isMuted) return;
-    if (!audioUnlocked) { pendingAudioQueue.push(base64Wav); return; }
-    if (currentTTS) { try { currentTTS.stop(); } catch {} currentTTS = null; }
-    try {
-        if (audioCtx.state === 'suspended') await audioCtx.resume();
-        const binary = atob(base64Wav);
-        const buffer = new ArrayBuffer(binary.length);
-        const view = new Uint8Array(buffer);
-        for (let i = 0; i < binary.length; i++) view[i] = binary.charCodeAt(i);
-        const audioBuffer = await audioCtx.decodeAudioData(buffer);
-        const source = audioCtx.createBufferSource();
-        const gain = audioCtx.createGain();
-        gain.gain.value = 1.0;
-        source.buffer = audioBuffer;
-        source.connect(gain).connect(audioCtx.destination);
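-        // A clip requested while this one was decoding may already be playing; stop it so clips never overlap
-        if (currentTTS) { try { currentTTS.stop(); } catch {} }
-        source.start();
-        currentTTS = source;
-        duckBgMusic();
-        $('narr-ind').classList.add('speaking');
-        source.onended = () => {
-            // A superseded clip fires 'ended' late; it must not reset the state of the clip that replaced it
-            if (currentTTS && currentTTS !== source) return;
-            currentTTS = null;
-            unduckBgMusic();
-            $('narr-ind').classList.remove('speaking');
-            $('narr-text').classList.add('dim');
-        };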
-    } catch (e) {
-        console.warn('TTS playback failed:', e);
-        $('narr-ind').classList.remove('speaking');
-    }
-}
-function toggleMute() {
-    isMuted = !isMuted;
-    const btn = $('btn-mute');
-    btn.textContent = isMuted ? '\u{1f507}' : '\u{1f50a}';
-    btn.classList.toggle('active', !isMuted);
-    bgMusic.volume = isMuted ? 0 : BG_VOLUME;
-    if (currentTTS && isMuted) { try { currentTTS.stop(); } catch {} currentTTS = null; }
-}
-/* ---- Coordinate helpers ---- */
-function wx(ex) { return cam.ox + ex * GRID * cam.scale; }
-function wy(ey) { return cam.oy + ey * GRID * cam.scale; }
-function pos(eid) {
-    const p = S.layout[eid];
-    if (!p) return null;
-    return { x: wx(Number(p.x)), y: wy(Number(p.y)) };
-}
-/* ---- Auto-fit camera ---- */
-function autoFit() {
-    const wrap = document.querySelector('.canvas-wrap');
-    const cw = wrap.clientWidth;
-    const ch = wrap.clientHeight;
-    let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
-    let hasPoints = false;
-    for (const [eid, ent] of Object.entries(S.entities)) {
-        if (ent && ent.entity_type === 'region' && ent.bounds) {
-            minX = Math.min(minX, Number(ent.bounds.x0));
-            maxX = Math.max(maxX, Number(ent.bounds.x1));
-            minY = Math.min(minY, Number(ent.bounds.y0));
-            maxY = Math.max(maxY, Number(ent.bounds.y1));
-            hasPoints = true;
-        }
-        const p = S.layout[eid];
-        if (p) {
-            minX = Math.min(minX, Number(p.x));
-            maxX = Math.max(maxX, Number(p.x));
-            minY = Math.min(minY, Number(p.y));
-            maxY = Math.max(maxY, Number(p.y));
-            hasPoints = true;
-        }
-    }
-    if (!hasPoints) { cam = { ox: cw / 2, oy: ch / 2, scale: 1 }; return; }
-    if (minX === maxX && minY === maxY) {
-        cam.scale = 1.5;
-        cam.ox = cw / 2 - minX * GRID * cam.scale;
-        cam.oy = ch / 2 - minY * GRID * cam.scale;
-        return;
-    }
-    const spanX = (maxX - minX) || 1;
-    const spanY = (maxY - minY) || 1;
-    const pad = 50;
-    const sx = (cw - pad * 2) / (spanX * GRID);
-    const sy = (ch - pad * 2) / (spanY * GRID);
-    cam.scale = Math.min(sx, sy, 2.2);
-    const midX = (minX + maxX) / 2;
-    const midY = (minY + maxY) / 2;
-    cam.ox = cw / 2 - midX * GRID * cam.scale;
-    cam.oy = ch / 2 - midY * GRID * cam.scale;
-}
-/* ---- Drawing primitives ---- */
-function drawRoundRect(x, y, w, h, r) {
-    ctx.beginPath();
-    ctx.moveTo(x + r, y);
-    ctx.lineTo(x + w - r, y);
-    ctx.quadraticCurveTo(x + w, y, x + w, y + r);
-    ctx.lineTo(x + w, y + h - r);
-    ctx.quadraticCurveTo(x + w, y + h, x + w - r, y + h);
-    ctx.lineTo(x + r, y + h);
-    ctx.quadraticCurveTo(x, y + h, x, y + h - r);
-    ctx.lineTo(x, y + r);
-    ctx.quadraticCurveTo(x, y, x + r, y);
-    ctx.closePath();
-}
-function drawArrowhead(fx, fy, tx, ty) {
-    const angle = Math.atan2(ty - fy, tx - fx);
-    const s = ARROW_LEN * cam.scale;
-    ctx.beginPath();
-    ctx.moveTo(tx, ty);
-    ctx.lineTo(tx - s * Math.cos(angle - Math.PI / 7), ty - s * Math.sin(angle - Math.PI / 7));
-    ctx.lineTo(tx - s * Math.cos(angle + Math.PI / 7), ty - s * Math.sin(angle + Math.PI / 7));
-    ctx.closePath();
-    ctx.fill();
-}
-/* ---- Main render ---- */
-function render() {
-    const dpr = window.devicePixelRatio || 1;
-    const rect = canvas.getBoundingClientRect();
-    canvas.width = rect.width * dpr;
-    canvas.height = rect.height * dpr;
-    ctx.setTransform(dpr, 0, 0, dpr, 0, 0);
-    ctx.clearRect(0, 0, rect.width, rect.height);
-    const highlighted = new Set();
-    for (const a of S.annotations) {
-        if (a && a.text === '[highlight]') highlighted.add(a.target_id);
-    }
-    // Regions
-    for (const [eid, ent] of Object.entries(S.entities)) {
-        if (!ent || ent.entity_type !== 'region') continue;
-        const b = ent.bounds;
-        if (!b) continue;
-        const x0 = wx(Number(b.x0)), y0 = wy(Number(b.y0));
-        const x1 = wx(Number(b.x1)), y1 = wy(Number(b.y1));
-        ctx.save();
-        ctx.strokeStyle = 'rgba(137,180,250,0.2)';
-        ctx.lineWidth = 1;
-        ctx.setLineDash([6, 4]);
-        drawRoundRect(x0, y0, x1 - x0, y1 - y0, 8);
-        ctx.stroke();
-        ctx.setLineDash([]);
-        ctx.font = FONT_SMALL;
-        ctx.fillStyle = 'rgba(137,180,250,0.45)';
-        ctx.fillText(ent.title || eid, x0 + 6, y0 + 13);
-        ctx.restore();
-    }
-    // Edges
-    for (const rel of S.relations) {
-        if (!rel || !rel.src || !rel.dst) continue;
-        const sp = pos(rel.src), dp = pos(rel.dst);
-        if (!sp || !dp) continue;
-        const r = NODE_R * cam.scale;
-        const dx = dp.x - sp.x, dy = dp.y - sp.y;
-        const dist = Math.hypot(dx, dy) || 1;
-        const ux = dx / dist, uy = dy / dist;
-        const sx = sp.x + ux * r, sy = sp.y + uy * r;
-        const ex = dp.x - ux * r, ey = dp.y - uy * r;
-        ctx.save();
-        ctx.strokeStyle = '#585b70';
-        ctx.lineWidth = 1.5 * cam.scale;
-        ctx.beginPath();
-        ctx.moveTo(sx, sy);
-        ctx.lineTo(ex, ey);
-        ctx.stroke();
-        ctx.fillStyle = '#585b70';
-        drawArrowhead(sx, sy, ex, ey);
-        if (rel.label) {
-            const mx = (sx + ex) / 2, my = (sy + ey) / 2;
-            ctx.font = FONT_SMALL;
-            ctx.fillStyle = '#bac2de';
-            ctx.textAlign = 'center';
-            ctx.fillText(rel.label, mx, my - 6 * cam.scale);
-        }
-        ctx.restore();
-    }
-    // Entities
-    for (const [eid, ent] of Object.entries(S.entities)) {
-        if (!ent) continue;
-        const type = ent.entity_type || 'node';
-        if (type === 'region') continue;
-        const p = pos(eid);
-        if (!p) continue;
-        const role = ent.role || 'default';
-        const colors = ROLE_COLORS[role] || ROLE_COLORS.default;
-        const r = NODE_R * cam.scale;
-        const isHl = highlighted.has(eid);
-        ctx.save();
-        if (isHl) { ctx.shadowColor = '#f9e2af'; ctx.shadowBlur = 18 * cam.scale; }
-        if (type === 'pointer') {
-            ctx.beginPath();
-            ctx.moveTo(p.x - 8 * cam.scale, p.y - 10 * cam.scale);
-            ctx.lineTo(p.x + 12 * cam.scale, p.y);
-            ctx.lineTo(p.x - 8 * cam.scale, p.y + 10 * cam.scale);
-            ctx.closePath();
-            ctx.fillStyle = colors.fill;
-            ctx.fill();
-            ctx.strokeStyle = colors.stroke;
-            ctx.lineWidth = 1.5 * cam.scale;
-            ctx.stroke();
-            ctx.shadowBlur = 0;
-            ctx.font = FONT_SMALL;
-            ctx.fillStyle = '#cdd6f4';
-            ctx.textAlign = 'left';
-            const label = eid;
-            const val = ent.value != null ? ` -> ${ent.value}` : '';
-            ctx.fillText(label + val, p.x + 16 * cam.scale, p.y + 4 * cam.scale);
-        } else if (type === 'container') {
-            const w = 90 * cam.scale, h = 44 * cam.scale;
-            drawRoundRect(p.x - w / 2, p.y - h / 2, w, h, 6 * cam.scale);
-            ctx.fillStyle = 'rgba(69,90,100,0.25)';
-            ctx.fill();
-            ctx.strokeStyle = colors.stroke;
-            ctx.lineWidth = 1.5 * cam.scale;
-            ctx.setLineDash([4, 3]);
-            ctx.stroke();
-            ctx.setLineDash([]);
-            ctx.shadowBlur = 0;
-            ctx.font = FONT_SMALL;
-            ctx.fillStyle = '#89b4fa';
-            ctx.textAlign = 'center';
-            ctx.fillText(ent.title || eid, p.x, p.y - h / 2 - 5 * cam.scale);
-            const contents = (ent.contents || []).join(', ');
-            if (contents) {
-                ctx.font = FONT_SMALL;
-                ctx.fillStyle = '#cdd6f4';
-                ctx.fillText(contents, p.x, p.y + 3 * cam.scale);
-            }
-        } else {
-            ctx.beginPath();
-            ctx.arc(p.x, p.y, r, 0, Math.PI * 2);
-            ctx.fillStyle = colors.fill;
-            ctx.fill();
-            ctx.strokeStyle = colors.stroke;
-            ctx.lineWidth = 2 * cam.scale;
-            ctx.stroke();
-            if (role === 'error') {
-                ctx.beginPath();
-                ctx.moveTo(p.x - r * .6, p.y - r * .6);
-                ctx.lineTo(p.x + r * .6, p.y + r * .6);
-                ctx.strokeStyle = '#fff';
-                ctx.lineWidth = 2 * cam.scale;
-                ctx.stroke();
-            }
-            ctx.shadowBlur = 0;
-            const valText = ent.value != null ? String(ent.value) : eid;
-            ctx.font = FONT_BOLD;
-            ctx.fillStyle = colors.text;
-            ctx.textAlign = 'center';
-            ctx.textBaseline = 'middle';
-            ctx.fillText(valText.length > 6 ? valText.slice(0, 6) : valText, p.x, p.y);
-            ctx.textBaseline = 'alphabetic';
-            if (role !== 'default') {
-                ctx.font = FONT_SMALL;
-                ctx.fillStyle = colors.stroke;
-                ctx.fillText(role, p.x, p.y + r + 12 * cam.scale);
-            }
-        }
-        ctx.restore();
-    }
-    // Annotations
-    const annMap = new Map();
-    for (const a of S.annotations) {
-        if (!a || a.text === '[highlight]' || a.text === '[popped]') continue;
-        annMap.set(a.target_id, a.text);
-    }
-    for (const [tid, text] of annMap) {
-        const p = pos(tid);
-        if (!p) continue;
-        const r = NODE_R * cam.scale;
-        ctx.save();
-        ctx.font = FONT_ANN;
-        const tw = ctx.measureText(text).width;
-        const px = p.x - tw / 2 - 4, py = p.y - r - 14 * cam.scale;
-        drawRoundRect(px, py - 10, tw + 8, 16, 4);
-        ctx.fillStyle = 'rgba(250,179,135,0.15)';
-        ctx.fill();
-        ctx.strokeStyle = 'rgba(250,179,135,0.4)';
-        ctx.lineWidth = 0.5;
-        ctx.stroke();
-        ctx.fillStyle = '#fab387';
-        ctx.textAlign = 'center';
-        ctx.fillText(text, p.x, py);
-        ctx.restore();
-    }
-    // Notes
-    for (const note of S.notes) {
-        if (!note || !note.text) continue;
-        let nx = 16, ny = rect.height - 24;
-        const regionEnt = note.region && S.entities[note.region];
-        if (regionEnt && regionEnt.bounds) {
-            nx = wx(Number(regionEnt.bounds.x0)) + 6;
-            ny = wy(Number(regionEnt.bounds.y1)) - 6;
-        }
-        ctx.save();
-        ctx.font = FONT_SMALL;
-        const t = note.text.length > 50 ? note.text.slice(0, 50) + '…' : note.text;
-        ctx.fillStyle = 'rgba(108,112,134,0.7)';
-        ctx.textAlign = 'left';
-        ctx.fillText(t, nx, ny);
-        ctx.restore();
-    }
-}
-/* ---- UI updates ---- */
-function barColor(v) { return v >= .7 ? 'var(--green)' : v >= .4 ? 'var(--yellow)' : 'var(--red)'; }
-function penaltyColor(v) { return v <= 0 ? 'var(--green)' : v < .1 ? 'var(--yellow)' : 'var(--red)'; }
-function updateHeader() {
-    $('h-task').textContent = S.taskName || '--';
-    $('h-scenario').textContent = S.scenarioId || '';
-    $('h-step').textContent = `Step ${S.step} / ${S.maxSteps}`;
-    $('h-score').textContent = S.score.toFixed(3);
-}
-function updateChecklist() {
-    const el = $('checklist');
-    el.innerHTML = '';
-    for (const c of S.checklist) {
-        const ok = S.coverage.includes(c);
-        el.innerHTML += `<div class="concept ${ok ? 'covered' : 'uncovered'}"><span class="icon">${ok ? '✓' : '○'}</span>${c}</div>`;
-    }
-}
-function updateScores() {
-    const el = $('scores');
-    el.innerHTML = '';
-    const bd = S.breakdown || {};
-    const allKeys = Object.keys(bd).filter(k => k !== 'overall_score' && k !== 'phase' && typeof bd[k] === 'number');
-    const subKeys = allKeys.filter(k => !k.startsWith('penalty_'));
-    const penKeys = allKeys.filter(k => k.startsWith('penalty_'));
-    for (const k of subKeys) {
-        const v = Number(bd[k]) || 0;
-        const pct = Math.max(0, Math.min(100, v * 100));
-        const label = k.replace(/_score$/, '').replace(/_/g, ' ');
-        el.innerHTML += `<div class="score-row"><span class="score-label">${label}</span><div class="score-track"><div class="score-fill" style="width:${pct}%;background:${barColor(v)}">${v.toFixed(2)}</div></div></div>`;
-    }
-    if (penKeys.length) {
-        el.innerHTML += `<div style="margin-top:4px;margin-bottom:3px;font-size:9px;text-transform:uppercase;letter-spacing:1px;color:var(--red);font-weight:600">Penalties (lower = better)</div>`;
-        for (const k of penKeys) {
-            const v = Number(bd[k]) || 0;
-            const pct = Math.max(0, Math.min(100, v * 100));
-            const label = k.replace(/^penalty_/, '').replace(/_/g, ' ');
-            el.innerHTML += `<div class="score-row"><span class="score-label">${label}</span><div class="score-track"><div class="score-fill" style="width:${pct}%;background:${penaltyColor(v)}">${v.toFixed(2)}</div></div></div>`;
-        }
-    }
-    if (bd.overall_score != null) {
-        const v = Number(bd.overall_score);
-        const pct = Math.max(0, Math.min(100, v * 100));
-        el.innerHTML += `<div class="score-row" style="margin-top:3px;border-top:1px solid var(--border);padding-top:3px"><span class="score-label" style="font-weight:700">overall</span><div class="score-track"><div class="score-fill" style="width:${pct}%;background:${barColor(v)}">${v.toFixed(3)}</div></div></div>`;
-    }
-}
-function addNarration(step, text, reward) {
-    const el = $('narrations');
-    const cls = reward >= 0 ? 'pos' : 'neg';
-    const sign = reward >= 0 ? '+' : '';
-    el.innerHTML += `<div class="narr-entry"><span class="step-tag">Step ${step}</span><span class="reward ${cls}">${sign}${reward.toFixed(2)}</span><div class="text">${text || ''}</div></div>`;
-    el.parentElement.scrollTop = el.parentElement.scrollHeight;
-}
-function showNarration(text) {
-    const el = $('narr-text');
-    el.textContent = text || '';
-    el.classList.remove('dim');
-}
-function showToast(text) {
-    const el = $('toast');
-    el.textContent = text;
-    el.style.display = 'block';
-    clearTimeout(toastTimer);
-    toastTimer = setTimeout(() => el.style.display = 'none', 4000);
-}
-/* ---- Message handlers ---- */
-function onClear() {
-    S.entities = {}; S.relations = []; S.layout = {};
-    S.annotations = []; S.notes = [];
-    S.taskName = ''; S.scenarioId = ''; S.goal = '';
-    S.checklist = []; S.coverage = [];
-    S.maxSteps = 0; S.step = 0; S.score = 0;
-    S.breakdown = {}; S.narrations = [];
-    updateHeader(); updateChecklist(); updateScores();
-    $('narrations').innerHTML = '';
-    $('goal-text').textContent = '--';
-    $('overlay').classList.add('hidden');
-    showNarration('');
-    stopBgMusic();
-    autoFit(); render();
-}
-function onReset(m) {
-    $('overlay').classList.add('hidden');
-    S.taskName = m.task_name || '';
-    S.scenarioId = m.scenario_id || '';
-    S.goal = m.goal || '';
-    S.checklist = m.checklist || [];
-    S.coverage = m.coverage || [];
-    S.maxSteps = m.max_steps || 0;
-    S.step = 0;
-    S.score = 0;
-    S.breakdown = m.score_breakdown || {};
-    S.entities = m.entities || {};
-    S.relations = m.relations || [];
-    S.layout = m.layout || {};
-    S.annotations = m.annotations || [];
-    S.notes = m.notes || [];
-    S.narrations = [];
-    updateHeader();
-    updateChecklist();
-    updateScores();
-    $('narrations').innerHTML = '';
-    $('goal-text').textContent = S.goal;
-    showNarration(S.goal);
-    autoFit();
-    render();
-    startBgMusic();
-    if (m.audio) playTTS(m.audio);
-}
-function onStep(m) {
-    S.step = m.step || S.step;
-    S.score = m.score != null ? m.score : S.score;
-    S.breakdown = m.score_breakdown || S.breakdown;
-    S.coverage = m.coverage || S.coverage;
-    S.entities = m.entities || S.entities;
-    S.relations = m.relations || S.relations;
-    S.layout = m.layout || S.layout;
-    S.annotations = m.annotations || S.annotations;
-    S.notes = m.notes || S.notes;
-    S.maxSteps = S.step + (m.remaining_step_budget || 0);
-    updateHeader();
-    updateChecklist();
-    updateScores();
-    autoFit();
-    render();
-    addNarration(m.step, m.narration || '', m.reward || 0);
-    showNarration(m.narration || '');
-    if (m.error) showToast(m.error);
-    if (m.audio) playTTS(m.audio);
-}
-function onEnd(m) {
-    stopBgMusic();
-    const ov = $('overlay');
-    ov.classList.remove('hidden');
-    $('ov-title').textContent = `Episode Done — ${m.task_name || ''}`;
-    $('ov-title').className = m.success ? 'ok' : 'fail';
-    $('ov-score').textContent = `Score: ${(m.score || 0).toFixed(3)}  |  Steps: ${m.steps || 0}`;
-    const chart = $('ov-chart');
-    chart.innerHTML = '';
-    const rr = m.rewards || [];
-    const mx = Math.max(0.01, ...rr.map(r => Math.abs(r)));
-    for (let i = 0; i < rr.length; i++) {
-        const r = rr[i];
-        const h = Math.max(2, (Math.abs(r) / mx) * 100);
-        chart.innerHTML += `<div class="bar" style="height:${h}px;background:${r >= 0 ? 'var(--accent)' : 'var(--red)'}"><span class="lbl">${i + 1}</span></div>`;
-    }
-    ov.onclick = () => ov.classList.add('hidden');
-}
-/* ---- Long-poll (single pending request, no spam) ---- */
-let pollCursor = 0;
-let pollRunning = false;
-async function pollLoop() {
-    if (pollRunning) return;
-    pollRunning = true;
-    while (true) {
-        try {
-            const resp = await fetch(`/poll?since=${pollCursor}`);
-            if (!resp.ok) {
-                $('h-ws').className = 'ws-dot off';
-                await new Promise(r => setTimeout(r, 2000));
-                continue;
-            }
-            $('h-ws').className = 'ws-dot on';
-            const data = await resp.json();
-            pollCursor = data.next;
-            for (const m of (data.messages || [])) {
-                if (m.type === 'clear') onClear();
-                else if (m.type === 'reset') onReset(m);
-                else if (m.type === 'step') onStep(m);
-                else if (m.type === 'end') onEnd(m);
-                else if (m.type === 'shutdown') { stopBgMusic(); showNarration('All episodes complete.'); }
-            }
-        } catch {
-            $('h-ws').className = 'ws-dot off';
-            await new Promise(r => setTimeout(r, 2000));
-        }
-    }
-}
-function connect() {
-    pollCursor = 0;
-    pollLoop();
-}
-/* ---- Mute button ---- */
-$('btn-mute').addEventListener('click', toggleMute);
-/* ---- Init ---- */
-function init() {
-    canvas = $('canvas');
-    ctx = canvas.getContext('2d');
-    window.addEventListener('resize', () => { autoFit(); render(); });
-    autoFit();
-    render();
-    connect();
-}
-init();
-</script>
-</body>
-</html>