abrown31/open-range

Aaron Brown committed on Mar 8

Commit

3ea4118

1 Parent(s): f549fda

Cleanup: fix bugs, remove dead code, add missing packages

- Add openenv fallback stubs in client.py (matches models.py pattern)
- Fix auth command parsing with maxsplit=3 (passwords with spaces)
- Fix reward exception silencing: log at ERROR with traceback
- Add snapshot validation: default flags/topology/task if None
- Fix shell injection in file deploy with shlex.quote()
- Remove async anti-pattern in rollout.py
- Remove duplicate src/open_range/server/Dockerfile
- Remove unused requests dependency
- Remove redundant uv install check from Dockerfile
- Add missing packages: open_range.agents, open_range.validator

Files changed (19)

AGENTS.md +722 -0
README.md +9 -1
openenv.yaml +0 -2
pyproject.toml +25 -11
server/Dockerfile +33 -14
server/__init__.py +2 -2
server/app.py +9 -3
src/open_range/builder/builder.py +366 -77
src/open_range/cli.py +438 -0
src/open_range/client/client.py +30 -3
src/open_range/server/Dockerfile +0 -44
src/open_range/server/app.py +3 -0
src/open_range/server/environment.py +39 -29
src/open_range/training/rollout.py +5 -7
tests/test_apply_snapshot.py +457 -0
tests/test_console.py +40 -26
tests/test_parse_llm_response.py +1075 -0
tests/test_renderer_integration.py +373 -0
uv.lock +48 -46

AGENTS.md ADDED

	@@ -0,0 +1,722 @@

+# AGENTS.md
+Guidance for Codex when working on OpenRange.
+## What Is OpenRange
+OpenRange is a **multi-agent cybersecurity gymnasium** built on OpenEnv 0.2.1. It is the first cybersecurity environment in the OpenEnv ecosystem.
+Three LLM roles operate on real Docker infrastructure:
+| Role | Entry Point | What It Does |
+|------|-------------|--------------|
+| **Builder** (`pi_build`) | YAML manifest | Generates Dockerfiles, docker-compose, configs with planted vulns. Runs NPC traffic. Evolves range via curriculum. |
+| **Red** (`pi_red`) | External (no access) | Attacks live containers. Rewards: flag capture, efficiency, stealth, evidence quality, anti-hallucination. |
+| **Blue** (`pi_blue`) | Internal (monitor host) | Defends via log analysis, patching, firewalling. Rewards: detection rate, patch validity, availability, FP penalty. |
+Red and Blue train **in tandem** — both agents active on the same range simultaneously.
+Red's stealth reward is coupled to Blue's detection, creating adversarial co-evolution.
+A **golden path** (the answer key) validates every generated range before training begins.
+The golden path is generated by the Builder LLM and reviewed by the Validator LLM.
+## Architecture (5 Layers)
+```
+Layer 1: YAML Manifest (human-authored topology, vulns, golden path, escalation rules)
+    |
+Layer 2: Builder Agent (YAML -> Dockerfiles, compose, configs, NPC scripts -> docker compose up)
+    |
+Layer 3: Validator (10-check admission pipeline: 8 mechanical + 2 LLM advisory)
+    |
+Layer 4: OpenEnv Server (FastAPI on HF Spaces: /reset, /step, /state) + Red/Blue Operators
+    |
+Layer 5: Training (TRL GRPOTrainer + Unsloth QLoRA) + Curriculum (escalate -> mutated YAML' -> back to Layer 1)
+```
+## Reset = Mutation (Critical Design)
+**`reset()` does NOT restart the same environment.** It selects a different pre-validated
+snapshot with different vulnerabilities. Example: a web app had XSS on episode N; after reset,
+episode N+1 uses a snapshot with IDOR instead. The topology stays the same but the planted
+vulnerabilities, flags, and golden path change.
+This means the agent **cannot memorize** a fixed exploit chain. It must learn to **generalize**
+across vulnerability classes.
+### Snapshot Generation (Async, Between Episodes)
+```
+Builder LLM called asynchronously (background queue, NOT in reset() hot path)
+    |
+    v
+Builder LLM generates new snapshot as STRUCTURED JSON (not prose — SWE-RL lesson):
+  - Same SnapshotBuilder protocol, different BuildContext (episode history, solve rates, weak areas)
+  - Outputs formal spec: {topology, truth_graph, vulns, golden_path, evidence_spec,
+    npc_personas, task briefings}
+  - Thin template layer renders JSON spec → actual config files (PHP, nginx.conf, etc.)
+  - This separates LLM reasoning (creative) from file formatting (mechanical)
+    |
+    v
+Partial container restart (hot-swap modified files, restart affected services)
+    |
+    v
+10-Check Validator Admission Pipeline (per R2E-Gym + SWE-RL lessons):
+  Mechanical checks (deterministic, no LLM):
+    1. Build + boot: docker compose up + healthchecks (all containers, all ports)
+    2. Exploitability: golden path end-to-end (each step produces expect_stdout)
+    3. Patchability: inverse mutation test — revert each vuln, its golden path step MUST fail
+    4. Evidence sufficiency: logs + SIEM alerts exist for Blue investigation
+    5. Reward grounding: rubrics produce valid scores against known scenarios
+    6. Isolation + leakage: zones enforced, no flag values in briefings
+    7. Task feasibility: tasks reference real reachable hosts, services, logs
+    8. Difficulty calibration: golden path steps within ±20% of tier target
+  LLM checks (configurable, removable):
+    9. NPC consistency: personas respond per security_awareness (LLM tests NPCs)
+   10. Realism review: scenario plausibility + briefing leakage (LLM advisory only)
+    |
+    v
+PASS -> store in Snapshot Store (frozen, immutable, ready for reset())
+FAIL -> Builder LLM receives error context, retries (max 3)
+```
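Review note: the admission pipeline above could be wired up roughly as follows. This is a minimal sketch, not the repository's validator: the `ValidatorCheck` protocol members (`name`, `advisory`, `run`), the `CheckResult` dataclass, the `admit` function, and the snapshot dict layout are all assumptions made for illustration. It only shows that a failing mechanical check blocks admission while a failing advisory LLM check is recorded without blocking.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


class ValidatorCheck(Protocol):
    # Hypothetical protocol; the real one in protocols.py may differ.
    name: str
    advisory: bool  # True for LLM checks that only warn

    def run(self, snapshot: dict) -> CheckResult: ...


def admit(snapshot: dict, checks: list[ValidatorCheck]) -> tuple[bool, list[CheckResult]]:
    """Run checks in order; stop at the first failing non-advisory check."""
    results: list[CheckResult] = []
    for check in checks:
        result = check.run(snapshot)
        results.append(result)
        if not result.passed and not check.advisory:
            return False, results  # FAIL: results become error context for a retry
    return True, results


class DifficultyCheck:
    """Check 8: golden path length within +/-20% of the tier target."""
    name = "difficulty"
    advisory = False

    def __init__(self, target_steps: int, tolerance: float = 0.20):
        self.target_steps = target_steps
        self.tolerance = tolerance

    def run(self, snapshot: dict) -> CheckResult:
        steps = len(snapshot["golden_path"])  # assumed snapshot layout
        ok = abs(steps - self.target_steps) <= self.tolerance * self.target_steps
        return CheckResult(self.name, ok, f"{steps} steps, target {self.target_steps}")
```

Stopping at the first mechanical failure keeps the retry loop cheap, since later checks (especially the LLM ones) never run on a range that already failed to boot or exploit.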
+### Reset Flow (Fast — Draws From Pool)
+```
+reset() called by training orchestration
+    |
+    v
+Select pre-validated snapshot from Snapshot Store
+  (strategy: latest, random, or curriculum_weighted)
+    |
+    v
+Boot or restore snapshot containers from frozen Docker artifacts
+    |
+    v
+Return initial RangeObservation with challenge briefing
+  (Red briefing: tiered by difficulty. Blue briefing: always minimal.)
+```
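Review note: the three selection strategies named above could look something like this. It is a sketch under stated assumptions: the pool is a list of dicts ordered oldest to newest, each snapshot may carry a `solve_rate` field, and the curriculum weighting (favoring solve rates near 0.5) is a guess at the intent rather than the project's actual rule. `select_snapshot` is a hypothetical name.

```python
import random
from typing import Optional, Sequence


def select_snapshot(pool: Sequence[dict], strategy: str = "latest",
                    rng: Optional[random.Random] = None) -> dict:
    """Pick one pre-validated snapshot; no LLM call happens here."""
    if not pool:
        raise ValueError("snapshot store is empty")
    rng = rng or random.Random()
    if strategy == "latest":
        return pool[-1]  # assumes the pool is ordered oldest -> newest
    if strategy == "random":
        return rng.choice(pool)
    if strategy == "curriculum_weighted":
        # Assumed weighting: favor snapshots whose solve rate is near 0.5.
        weights = [1.0 - abs(s.get("solve_rate", 0.5) - 0.5) for s in pool]
        return rng.choices(pool, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

Because the function only reads from an already-validated pool, its cost is independent of Builder or Validator latency.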
+### Why LLM-Based (Not Templates)
+Templates produce predictable, shallow vulnerabilities. An LLM Builder can:
+- **Compose novel vuln chains**: SSRF to access internal DB, then SQLi on internal endpoint
+- **Vary attack surfaces creatively**: Different URL structures, parameter names, auth flows each episode
+- **Generate realistic code**: Vulnerable PHP/Python/Node apps that look like real software, not CTF toy examples
+- **Adapt to agent behavior**: If Red consistently solves SQLi easily, Builder can plant harder variants or combine with WAF rules
+The Validator LLM closes the loop: it reviews the Builder's output to ensure the challenge is
+actually solvable, properly difficult, and that the description doesn't leak the answer.
+### Verifiable Rewards
+Despite LLM-based generation, **all rewards are grounded in verifiable container state**:
+- Flag: `docker exec cat <path>` — binary match, not LLM-judged
+- Patch validity: Re-run golden path exploit command — if it fails, patch worked
+- Service availability: HTTP healthcheck — binary up/down
+- Hallucination: Compare submitted flag against manifest-defined flags — exact string match
+The LLM generates the challenge. The rewards are verified by code execution against real infrastructure.
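Review note: the flag and hallucination checks described above reduce to an exact string comparison. The sketch below assumes flags are held as a set of strings and that a repeated submission earns no extra credit; the function name and the no-double-credit rule are illustrative assumptions. The -0.3 penalty is the value this document specifies for fake flags.

```python
def score_flag_submission(submitted: str, manifest_flags: set[str],
                          already_captured: set[str]) -> float:
    """Exact string match against manifest flags; no LLM judgment involved."""
    if submitted in manifest_flags:
        if submitted in already_captured:
            return 0.0  # assumed: no double credit for the same flag
        already_captured.add(submitted)
        return 1.0
    return -0.3  # hallucinated-flag penalty stated in this document
```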
+### Challenge Diversity (Black-Box Agents)
+Agents operate **black-box** — they see briefings and environment outputs, never the truth graph.
+**Red briefing** is tiered by difficulty:
+- Tier 1: topology + vague hint ("web application with database backend, find vulnerabilities")
+- Tier 2: topology only ("corporate network, 10 hosts, find and exploit")
+- Tier 3+: minimal ("enterprise network, go") — forces pure recon
+**Blue briefing** is always minimal: "monitor SIEM for suspicious activity, investigate, respond."
+Blue never knows what vulnerability class was planted.
+**Episode diversity** prevents memorization:
+- Must NOT repeat same vuln class within last 3 episodes
+- Must NOT reuse same injection point within last 5 episodes
+- Vary approach even within same vuln class (SQLi in search vs login vs API)
+- Language/framing of briefings varies each episode
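Review note: the first two diversity rules above (no repeated vuln class within 3 episodes, no reused injection point within 5) map naturally onto bounded queues. `DiversityGuard` is a hypothetical helper, and it simplifies by treating each episode as having a single vuln class and injection point.

```python
from collections import deque


class DiversityGuard:
    """Rejects candidates that repeat recent vuln classes or injection points."""

    def __init__(self, class_window: int = 3, point_window: int = 5):
        # deque(maxlen=n) silently drops the oldest entry once n is exceeded.
        self.recent_classes = deque(maxlen=class_window)
        self.recent_points = deque(maxlen=point_window)

    def allows(self, vuln_class: str, injection_point: str) -> bool:
        return (vuln_class not in self.recent_classes
                and injection_point not in self.recent_points)

    def record(self, vuln_class: str, injection_point: str) -> None:
        self.recent_classes.append(vuln_class)
        self.recent_points.append(injection_point)
```

A Builder loop would call `allows()` on each candidate spec before validation and `record()` once the snapshot is actually used for an episode.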
+**Progression** builds naturally via curriculum:
+- Early: single-vuln, direct exploit (SQLi → flag)
+- Mid: multi-vuln chains (IDOR → cred leak → DB access)
+- Late: multi-host pivots (web → internal → management → flag)
+- Driven by solve rates, not hardcoded episode numbers
+### Red + Blue Tandem RL (Core Design)
+**Both offensive and defensive agents train in tandem, not sequentially.**
+```
+Episode N:
+  Builder LLM generates mutated range (new vulns, new golden path)
+  Validator LLM + scripted checks confirm range is valid
+  |
+  Red acts: nmap -> discover services -> exploit vuln -> capture flag
+  |  (Red's actions appear in container logs in real time)
+  |
+  Blue observes: log stream = NPC noise + Red's real attack actions
+  Blue acts: analyze logs -> identify attack -> patch/block -> submit findings
+  |
+  Rewards computed:
+    Red: flag + efficiency + stealth(did Blue detect?) + anti-hallucination
+    Blue: detection(did Blue catch Red?) + patch(did patch block exploit?) + availability + FP penalty
+  |
+  Both rewards feed back to their respective GRPO trainers
+```
+**Key coupling**: Red's stealth reward depends on Blue's detection. Blue's detection reward
+depends on Red's actions. This creates an adversarial co-evolution:
+- Red learns to be stealthier -> Blue must learn better detection
+- Blue learns to detect faster -> Red must learn new evasion techniques
+This is NOT self-play (single model playing both roles). It's **two separate policies** trained
+against shared infrastructure with coupled reward signals.
+### Vulnerability Classes (Examples)
+| OWASP | Class | Example | Scope |
+|-------|-------|---------|-------|
+| A01 | IDOR | Sequential user IDs without authz | web API |
+| A01 | Path Traversal | `file=` param without sanitization | web |
+| A01 | LFI | `include($_GET['page'])` → server files | web |
+| A01 | RFI | Remote file include → code execution | web (Tier 2+) |
+| A01 | Missing Authz | Unprotected admin endpoint | web |
+| A03 | SQLi | Unsanitized query parameter | web → db |
+| A03 | XSS | Comment form → admin session hijack | web |
+| A03 | Command Injection | User input to `os.system()` | web → shell |
+| A03 | LDAP Injection | Unsanitized LDAP bind/search | web → ldap |
+| A03 | SSTI | Template injection → RCE | web |
+| A03 | XXE | XML external entity → file read / SSRF | web |
+| A04 | File Upload | Unrestricted upload → webshell | web |
+| A05 | Service Misconfig | Debug endpoints, default configs | any host |
+| A07 | Weak Creds | Default passwords | SSH, DB, LDAP, SMB |
+| A07 | Broken Auth | JWT `alg:none`, session fixation | web |
+| A07 | Credential Reuse | Same password → lateral movement | cross-service |
+| A07 | Kerberoasting | Kerberos ticket attacks | ldap (Tier 3+) |
+| A08 | RCE | `eval()`, pickle, code injection | web → shell |
+| A08 | Deserialization | Insecure deserialization | web |
+| A10 | SSRF | URL fetch hitting internal services | web → internal |
+| Infra | SMB Misconfig | Guest access, null sessions | files |
+| Infra | Mail Misconfig | Open relay, missing SPF/DKIM | mail |
+| Infra | Firewall Bypass | Zone traversal, rule gaps | firewall |
+| Infra | SSH Key Exposure | Private keys readable | any host (Tier 2+) |
+| Ops | Config Drift | Stale config diverged from intended | any host |
+| Ops | Orphaned Access | Departed staff accounts | ldap |
+| Ops | Data Exposure | Creds in backups, logs, configs | any host |
+| T3+ | CI/CD Poisoning | Pipeline injection | ci_cd (Tier 3+) |
+| T3+ | Supply Chain | Dependency confusion | ci_cd (Tier 3+) |
+| Chain | Multi-host | SSRF → internal SQLi → flag in DB | cross-zone |
+| Chain | Lateral | Credential reuse → SSH pivot → LDAP dump | cross-service |
+### Implications for Training
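+- **Snapshot generation latency**: LLM generation (~10-20s) + container update (~10-20s) + LLM validation (~10-15s) + scripted validation (~5-10s) = ~35-65s per new snapshot. This cost is paid in the background queue, not in `reset()`, which only selects and boots a pre-validated snapshot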
+- **GRPO batching**: All `num_generations` in a batch share the SAME mutated range (reset once per batch, not per generation)
+- **Episode diversity**: LLM generates genuinely novel challenges each reset — not cycling through fixed templates
+- **Container cleanup**: After each episode, dirty state cleaned by restarting affected service containers
+- **Tandem training**: Red and Blue GRPO trainers can run on same or different GPUs, sharing environment
+- **Curriculum**: As both agents improve, Builder LLM generates harder challenges (more hosts, chained vulns, stealthier golden paths)
+## Lessons from Research (R2E-Gym, Self-Play SWE-RL)
+These papers directly inform OpenRange's design. Violating these lessons risks repeating known failures.
+### From R2E-Gym (Procedural Environments + Hybrid Verifiers)
+1. **Hybrid verification is non-negotiable.** Execution-based verification alone plateaus at ~43%. LLM-based verification alone plateaus at ~43%. Combined: 51%. OpenRange's Validator MUST use both LLM review AND scripted golden-path execution.
+2. **Synthetic task generation equals human quality.** LLM-generated task descriptions perform identically to human-written ones (27.8% vs 28.0%). Builder LLM generating cyber challenges from vulnerability catalogs is a validated approach.
+3. **Toxic tests are real.** Up to 10% of generated validations incorrectly favor wrong solutions. Track Validator false-positive rate (accepting broken ranges) and false-negative rate (rejecting valid ranges).
+4. **Include reasoning traces in training data.** SFT with agent thought processes improves downstream performance by +3.8%. Red and Blue training trajectories MUST include structured reasoning (recon plan → vuln hypothesis → exploit attempt → verification), not just raw commands.
+5. **Build environment creation is the hardest part.** Docker dependency resolution, service connectivity, and reproducibility dominate engineering effort. Pre-build base images extensively.
+### From Self-Play SWE-RL (Adversarial Self-Improvement)
+6. **Formal specifications beat natural language.** Their biggest failed experiment: generating NL issue descriptions. A 32B model produced incoherent, repetitive text. They succeeded with formal test specs. **Builder LLM should output structured JSON specs** (vuln_type, injection_point, golden_path_commands, flag_location), NOT prose. The challenge description for the AGENT can be NL, but the Builder's internal output must be formal.
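+7. **Builder reward: `r_inject = 1 - (1+α)·s`** for 0 < s < 1, where s = solve rate and α = 0.8. The linear term penalizes too-easy challenges (s→1). On its own it is largest as s→0, so impossible challenges (s = 0) need a separate explicit penalty. With both in place, the reward favors hard-but-solvable challenges at the frontier of the agent's current ability. This naturally creates curriculum without manual difficulty design.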
+8. **7-check consistency validation with inverse mutation testing.** Every generated range must pass:
+   - Services exist and respond
+   - Flags are accessible at expected locations
+   - Vulnerability is actually exploitable (golden path succeeds)
+   - Network isolation holds
+   - Difficulty matches target
+   - Challenge description doesn't leak the answer
+   - **Inverse mutation test**: for each planted vuln, removing ONLY that vuln must cause the golden path to fail at the corresponding step. This verifies each vuln actually contributes to the challenge.
+9. **Higher-order challenges from failed attempts.** When Blue fails to patch a vuln, the resulting state (partial patch + remaining vuln) becomes a harder challenge for the next episode. When Red fails to exploit, the failed attempt reveals what didn't work, informing the Builder to create challenges that specifically test that weakness.
+10. **Collapse risks in adversarial training.** A sufficiently capable Red agent can learn dominant strategies (e.g., obfuscation, always-same-attack) that stall Blue learning. Mitigations: ground in real-world data (real CVE patterns), limit divergence from realistic attack patterns, don't let Red game the reward through unrealistic strategies.
+11. **SFT before RL is critical.** Both papers use SFT on expert trajectories first, then RL. Never start GRPO from a cold model — always warm-start with supervised fine-tuning on successful attack/defense traces.
+12. **Binary reward for solver, nuanced reward for generator.** Red/Blue can use binary rewards (flag found or not, attack detected or not). The Builder needs the frontier-calibrating `r_inject` reward to learn optimal difficulty.
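Review note: lesson 7's Builder reward can be written down directly. The linear term `1 - (1+α)·s` and α = 0.8 come from the text above; the fixed penalty of `-alpha` at the endpoints s = 0 and s = 1 is an assumption added here so that unsolvable and trivial challenges are both penalized, as the text says they should be.

```python
def r_inject(solve_rate: float, alpha: float = 0.8) -> float:
    """Builder reward as a function of the solver's solve rate s."""
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be in [0, 1]")
    if solve_rate in (0.0, 1.0):
        return -alpha  # assumed endpoint penalty: unsolvable or trivial
    return 1.0 - (1.0 + alpha) * solve_rate
```

With α = 0.8 the linear term crosses zero at s = 1/1.8 ≈ 0.56, so challenges the solver beats more than about half the time already earn the Builder a negative reward.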
+## Key Invariants
+- **Golden path gates training**: No episode runs on unvalidated infrastructure. Validator must PASS all 10 admission checks (8 mechanical + 2 LLM).
+- **Rewards are grounded**: Every reward signal verified against golden-path-validated container state (flags via `docker exec`, patches via re-running exploit chain).
+- **Anti-hallucination**: Flag submissions checked against manifest-defined flags. Fake flags penalized at -0.3.
+- **Agents cannot reset**: Only training orchestration controls episode lifecycle (inherited from OpenEnv).
+- **Horizontal growth, not vertical**: Difficulty increases by adding hosts/networks/services, not just harder passwords.
+- **NPC noise is mandatory for Blue**: Without background traffic, detection is trivial and stealth is meaningless. NPCs evolve from shell-script noise (Level 0) to LLM-driven personas with susceptibility profiles (Level 1+), creating a social engineering attack surface.
+- **Client-server separation**: Follows OpenEnv pattern — clients never import from `server/`.
+## Directory Structure
+```
+open-range/
+├── AGENTS.md                    # This file
+├── IMPLEMENTATION_PLAN.md       # Build plan, testing, open questions
+├── manifests/                   # YAML range manifests (human-authored)
+│   ├── schema.yaml              # JSON Schema for manifest validation
+│   ├── tier1_basic.yaml         # 8-host enterprise, ~8 golden path steps
+│   ├── tier2_corporate.yaml     # 10-12 host, ~15 golden path steps
+│   └── tier3_enterprise.yaml    # 14-18 host, ~25 golden path steps
+├── protocols.py                 # Agent protocols (SnapshotBuilder, NPCBehavior, ValidatorCheck)
+├── resolve.py                   # Dynamic component resolution (importlib + Protocol check)
+├── builder/                     # Builder agent (Layer 2)
+│   ├── builder.py               # LLMSnapshotBuilder + TemplateOnlyBuilder + FileBuilder
+│   ├── mutator.py               # Vuln mutation logic (swap vulns between resets)
+│   ├── templates/               # Jinja2 templates for Dockerfiles, configs
+│   └── npc/                     # NPC system (Level 0: shell scripts, Level 1: LLM personas)
+│       ├── npc_manager.py       # Orchestrator: starts scripts + LLM agents per snapshot
+│       ├── persona.py           # Pydantic NPC persona model (security_awareness, susceptibility)
+│       ├── npc_agent.py         # Async LLM NPC agent loop (email check, decide, act)
+│       ├── http_traffic.sh      # Level 0: curl loops
+│       ├── smtp_traffic.sh      # Level 0: email noise
+│       └── *.sh                 # Level 0: other service traffic scripts
+├── validator/                   # Golden path validator (Layer 3) — 10-check admission pipeline
+│   ├── validator.py             # Validator pipeline (runs list of ValidatorCheck protocols)
+│   ├── build_boot.py            # Check 1: docker compose up + healthchecks (mechanical)
+│   ├── exploitability.py        # Check 2: golden path end-to-end (mechanical)
+│   ├── patchability.py          # Check 3: inverse mutation test (mechanical)
+│   ├── evidence.py              # Check 4: logs + alerts exist (mechanical)
+│   ├── reward_grounding.py      # Check 5: rubrics produce valid scores (mechanical)
+│   ├── isolation.py             # Check 6: zones enforced, no leaks (mechanical)
+│   ├── task_feasibility.py      # Check 7: tasks reference real reachable hosts/services/logs (mechanical)
+│   ├── difficulty.py            # Check 8: golden path steps within ±20% of tier target (mechanical)
+│   ├── npc_consistency.py       # Check 9: NPC personas respond per security_awareness (LLM)
+│   └── realism_review.py        # Check 10: scenario plausibility + briefing leakage (LLM, advisory)
+├── server/                      # OpenEnv server (Layer 4)
+│   ├── app.py                   # FastAPI application (create_app)
+│   ├── environment.py           # CyberRange Environment subclass
+│   ├── models.py                # RangeAction, RangeObservation, RangeState
+│   ├── rewards.py               # Reward components (flag, stealth, detect, etc.)
+│   ├── Dockerfile               # Container for HF Spaces deployment
+│   └── requirements.txt
+├── client/                      # OpenEnv client (typed)
+│   ├── __init__.py
+│   └── client.py                # OpenRangeEnv(EnvClient) or MCPToolClient
+├── training/                    # Training scripts (DEFERRED — environment-first)
+│   ├── rollout.py               # rollout_func for GRPOTrainer (OpenEnv integration point)
+│   └── curriculum.py            # Phi: escalation logic, YAML mutation
+├── scripts/                     # Utility scripts
+│   ├── deploy_hf.sh             # Deploy to HF Spaces
+│   └── run_local.sh             # Local development runner
+├── tests/                       # Test suite
+│   ├── test_manifest.py         # Schema validation tests
+│   ├── test_validator.py        # Golden path validation tests
+│   ├── test_environment.py      # OpenEnv server tests
+│   ├── test_rewards.py          # Reward component tests
+│   └── test_integration.py      # End-to-end integration tests
+├── pyproject.toml
+└── README.md
+```
+## OpenEnv Compatibility (EXACT API Contract)
+OpenRange follows the OpenEnv 0.2.x environment pattern. Reference implementations:
+`envs/coding_env/` (command execution) and `envs/echo_env/` (MCP tools).
+### Base Classes (from `openenv.core.env_server.types`)
+```python
+from typing import Any, Dict, Optional
+from pydantic import BaseModel
+# Action base: extra="forbid" (rejects unknown fields)
+class Action(BaseModel):
+    metadata: Dict[str, Any] = {}
+# Observation base: extra="forbid", already has done + reward
+class Observation(BaseModel):
+    done: bool = False
+    reward: bool | int | float | None = None
+    metadata: Dict[str, Any] = {}
+# State base: extra="allow" (allows additional fields)
+class State(BaseModel):
+    episode_id: Optional[str] = None
+    step_count: int = 0
+```
+### OpenRange Models (`server/models.py`)
+```python
+from typing import Literal
+from openenv.core.env_server.types import Action, Observation, State
+class RangeAction(Action):
+    command: str                        # Shell command or tool invocation
+    mode: Literal["red", "blue"]        # Which operator is acting
+class RangeObservation(Observation):
+    # NOTE: done and reward are INHERITED from Observation base — do NOT redeclare
+    stdout: str = ""                    # Command output
+    stderr: str = ""                    # Error output
+    flags_captured: list[str] = []
+    alerts: list[str] = []             # Blue: IDS/log alerts
+class RangeState(State):
+    # NOTE: episode_id and step_count are INHERITED from State base
+    mode: str = ""                      # Current active mode (red/blue)
+    flags_found: list[str] = []
+    services_status: dict = {}
+    tier: int = 1
+```
+### Environment (`server/environment.py`)
+```python
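+from typing import Optional
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+from server.models import RangeAction, RangeObservation, RangeState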
+class RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState]):
+    SUPPORTS_CONCURRENT_SESSIONS = False  # One episode per range instance
+    def __init__(self):
+        super().__init__()  # Can pass transform= and rubric= here
+        self._state = RangeState()
+    def reset(self, seed: Optional[int] = None,
+              episode_id: Optional[str] = None, **kwargs) -> RangeObservation:
+        # Select a pre-validated snapshot from the Snapshot Store
+        # (Builder LLM + Validator run in the background, not in reset())
+        # Clear episode state
+        self._state = RangeState(episode_id=episode_id or str(uuid4()))
+        return RangeObservation(stdout="Range ready. Begin reconnaissance.")
+    def step(self, action: RangeAction,
+             timeout_s: Optional[float] = None, **kwargs) -> RangeObservation:
+        # Route action.command to container via docker exec
+        # (placeholder values; replace with the real docker exec output)
+        result, err = "", ""
+        # Compute reward via rubric
+        self._state.step_count += 1
+        obs = RangeObservation(stdout=result, stderr=err)
+        obs.reward = self._apply_rubric(action, obs)  # Uses Rubric if set
+        return obs
+    @property
+    def state(self) -> RangeState:
+        return self._state
+```
+### App Factory (`server/app.py`)
+```python
+from openenv.core.env_server import create_app
+from server.models import RangeAction, RangeObservation
+from server.environment import RangeEnvironment
+# MUST pass CLASS (not instance) — enables WebSocket session isolation
+app = create_app(RangeEnvironment, RangeAction, RangeObservation,
+                 env_name="open_range")
+```
+### Client (`client/client.py`)
+```python
+from openenv.core.env_client import EnvClient
+from openenv.core.client_types import StepResult
+class OpenRangeEnv(EnvClient[RangeAction, RangeObservation, RangeState]):
+    def _step_payload(self, action: RangeAction) -> dict:
+        return {"command": action.command, "mode": action.mode}
+    def _parse_result(self, payload: dict) -> StepResult[RangeObservation]:
+        obs = RangeObservation(**payload["observation"])
+        return StepResult(
+            observation=obs,
+            reward=payload.get("reward"),
+            done=bool(payload.get("done", False)),
+        )
+    def _parse_state(self, payload: dict) -> RangeState:
+        return RangeState(**payload)
+```
+### Endpoints (auto-provided by `create_app`)
+```
+GET  /health          → {"status": "healthy"}
+GET  /metadata        → environment name, version, description
+GET  /schema          → JSON schemas for action, observation, state
+POST /reset           → initial RangeObservation
+POST /step            → RangeObservation after executing action
+GET  /state           → current RangeState
+WS   /ws              → persistent WebSocket session
+```
+### MCP Alternative
+For richer tool discovery, OpenRange can also use `MCPEnvironment` with FastMCP tools:
+```python
+class RangeEnvironment(MCPEnvironment):
+    def __init__(self):
+        mcp = FastMCP("open_range")
+        @mcp.tool
+        def nmap_scan(target: str, flags: str = "-sV") -> str:
+            """Run nmap port scan against target"""
+            return docker_exec(f"nmap {flags} {target}")
+        @mcp.tool
+        def submit_flag(flag: str) -> str:
+            """Submit a captured flag for verification"""
+            return verify_flag(flag)
+        super().__init__(mcp)
+```
+This exposes tools via `ListToolsAction`/`CallToolAction` + MCP JSON-RPC at `/mcp`.
+## Reward Components
+### Red Operator
+| Signal | Value | Source |
+|--------|--------|--------|
+| `r_flag` | Binary | `docker exec cat <path>` at golden-path location |
+| `r_efficiency` | `gamma^t` | Step count discount |
+| `r_stealth` | Negative | IDS log entry count |
+| `r_evidence` | 0-1 | Documentation quality of findings |
+| `r_halluc` | -0.3 | Per fake flag (compared against manifest) |
+| `r_social` | Binary | NPC fell for phish/pretext (Level 1+ only) |
+| `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |
+### Blue Operator
+| Signal | Value | Source |
+|--------|--------|--------|
+| `r_detect` | 0-1 | TP rate (Red golden-path steps in logs) |
+| `r_patch` | Binary | Validator re-runs Red exploit -> fails = valid patch |
+| `r_avail` | 0-1 | Services still responding (healthchecks) |
+| `r_FP` | -0.2 | Per false alarm (NPC traffic flagged as attack) |
+| `r_phish_detect` | 0-1 | Correctly identified social engineering in logs (Level 1+ only) |
+| `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |
+## Tier System (Horizontal Growth)
+Each tier is a **fully integrated network** — services connect to each other, web apps talk to
+databases, auth systems protect resources, logs flow to monitoring. Not isolated containers.
+| Tier | Hosts | Networks | Integrated Services | Identity/Auth | Golden Steps |
+|------|-------|----------|---------------------|---------------|--------------|
+| 1 | attacker, firewall, web, mail, db, files, ldap, siem (8) | external, dmz, internal, mgmt | nginx+PHP web app → MySQL, postfix/dovecot, samba, OpenLDAP, rsyslog SIEM, iptables firewall | DB + LDAP user auth, session cookies | ~8 |
+| 2 | + jumpbox, vpn (10-12) | + guest, vpn | + SSH bastion, OpenVPN, cron jobs | + SSH key auth, VPN cert auth, email-based password reset | ~15 |
+| 3 | + CI/CD, dev-tools (14-18) | + partner, dev | + Jenkins/GitLab runner, dev endpoints | + AD/LDAP auth, Kerberos tickets, service accounts | ~25 |
+| 4 | + OT/SCADA, cloud-proxy (20-25) | + OT, cloud | + Modbus/OPC-UA simulators, cloud gateway | + jump host required for OT, credential rotation, MFA | ~35 |
+| 5 | + honeypots, WAF (30+) | + trap net | + decoy services, WAF, IDS, threat intel | + honeypot tokens, rate limiting, cert-based auth | ~50 |
+### How Services Integrate (Tier 1 — 8 Containers)
+```
+[attacker] (external zone)
+    |
+    | port 80, 443, 25 only via firewall
+    v
+[firewall] (perimeter) — iptables, NAT, zone enforcement, logs to siem
+    |
+    v
+[web.corp.local] (DMZ 10.0.1.0/24) nginx + PHP web app
+    |  - Login form -> authenticates against ldap (LDAP bind)
+    |  - Product search -> SQL query to db (vuln injection point)
+    |  - File upload -> stored on disk (vuln injection point)
+    |  - All access logged to /var/log/nginx/access.log -> siem
+    |
+    ├──> [mail.corp.local] (DMZ) postfix + dovecot
+    |       - User lookup against ldap
+    |       - NPC email traffic + social engineering surface
+    |       - Logs to siem
+    |
+    | port 3306 (internal only)
+    v
+[db.corp.local] (internal 10.0.2.0/24) MySQL
+    |  - users, products, flags tables
+    |  - Query logs -> siem
+    |
+[files.corp.local] (internal) samba
+    |  - SMB shares, access via ldap auth
+    |  - Logs to siem
+    |
+[ldap.corp.local] (mgmt 10.0.3.0/24) OpenLDAP + Kerberos
+    |  - Central auth for all services
+    |  - Audit replication to siem
+    |
+[siem.corp.local] (mgmt) rsyslog + log aggregation
+    - Blue's entry point — reads ALL logs here
+    - NPC traffic mixed with real attack traffic
+    - Blue reads logs, never touches web/db/files directly
+```
+## Agent Tool Philosophy: Container-as-Constraint
+**No artificial allowlists.** Agents can run ANY command available in their container.
+The Docker image defines what's possible — not code-level filtering.
+### How Commands Execute
+```
+Agent sends: RangeAction(command="nmap -sV 10.0.1.0/24", mode="red")
+    ↓
+environment.step() routes by mode:
+  Red → docker exec open-range-attacker-1 sh -c "nmap -sV 10.0.1.0/24"
+  Blue → docker exec open-range-siem-1 sh -c "..."
+    ↓
+Raw stdout/stderr returned as RangeObservation
+```
+No validation, sanitization, or allowlisting. The command string goes straight to `sh -c`.
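Review note: the routing step above can be sketched with `subprocess`. The container names are copied from the example in this section; `CONTAINER_BY_MODE`, `build_exec_argv`, `run_in_container`, and the 30-second default timeout are hypothetical and not taken from `environment.py`.

```python
import subprocess

# Container names follow the example above; a real deployment may differ.
CONTAINER_BY_MODE = {
    "red": "open-range-attacker-1",
    "blue": "open-range-siem-1",
}


def build_exec_argv(command: str, mode: str) -> list[str]:
    """Build the docker exec argv; the command reaches sh -c unmodified."""
    container = CONTAINER_BY_MODE[mode]  # KeyError on an unknown mode
    return ["docker", "exec", container, "sh", "-c", command]


def run_in_container(command: str, mode: str, timeout_s: float = 30.0) -> tuple[str, str]:
    """Execute the command and return raw (stdout, stderr)."""
    proc = subprocess.run(build_exec_argv(command, mode),
                          capture_output=True, text=True, timeout=timeout_s)
    return proc.stdout, proc.stderr
```

Passing an argv list rather than a single string means the host shell never interprets the agent's command; only the shell inside the container does, which matches the container-as-constraint idea.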
+### What's Installed (Tier 1)
+**Red (Kali)**: nmap, sqlmap, hydra, nikto, smbclient, curl, wget, netcat, ssh,
+dnsutils, tcpdump, python3+pip. Plus all standard Kali/Debian tools. Agents can
+`pip install` or `apt install` additional tools at runtime.
+**Blue (SIEM)**: rsyslog, grep/awk/sed, jq, curl, ssh. All logs aggregated at
+`/var/log/siem/consolidated/all.log`. Agents can write custom scripts, parse JSON,
+correlate events — whatever Unix tools allow.
+### Meta-Commands (Handled by Environment, Not Containers)
+These are intercepted before docker exec:
+| Command | Role | Effect |
+|---------|------|--------|
+| `submit_flag <value>` | Red | Validates against snapshot flags; -0.3 penalty per hallucinated flag |
+| `submit_evidence <json>` | Red | Logs findings for evidence reward scoring |
+| `submit_finding <desc>` | Blue | Logs attack detection for accuracy scoring |
+| `auth <host> <user> <pass>` | Both | Validates creds against snapshot topology |
+| `logout <host>` | Both | Terminates active session |
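A minimal sketch of how the interception in the table above might look, assuming a simple first-token check. The function names (`is_meta_command`, `parse_auth`, `score_flag`) are hypothetical; only the command names, the -0.3 penalty, and the `maxsplit=3` parsing mentioned in the commit message come from the source.

```python
# Hypothetical sketch of meta-command interception; names are illustrative.
META_COMMANDS = {"submit_flag", "submit_evidence", "submit_finding", "auth", "logout"}
HALLUCINATED_FLAG_PENALTY = -0.3  # value from the table above


def is_meta_command(command: str) -> bool:
    """True if the first token names a command handled by the environment."""
    parts = command.split(maxsplit=1)
    return bool(parts) and parts[0] in META_COMMANDS


def parse_auth(command: str) -> tuple:
    """Split `auth <host> <user> <pass>`, keeping spaces inside the password."""
    parts = command.split(maxsplit=3)
    if len(parts) != 4 or parts[0] != "auth":
        raise ValueError("usage: auth <host> <user> <pass>")
    _, host, user, password = parts
    return host, user, password


def score_flag(submitted: str, valid_flags: set) -> float:
    """Return the penalty for an unknown flag, else 0.0 (capture reward omitted)."""
    return 0.0 if submitted in valid_flags else HALLUCINATED_FLAG_PENALTY
```

With `maxsplit=3`, everything after the third separator stays in the final field, so a password containing spaces is not truncated.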
+### What Agents Should NOT Be Told
+Agent prompts should NOT enumerate allowed tools. Instead:
+- Red: "You have a Kali workstation. Run any command."
+- Blue: "You have the SIEM console. Use any tool to investigate."
+The agent discovers what's available through reconnaissance (e.g., `which sqlmap`,
+`ls /usr/bin/`, `pip list`). This mirrors real pentesting and SOC work.
+### Docker Network Topology (Tier 1)
+```
+attacker (10.0.0.2) → firewall (10.0.0.3/10.0.1.2) → web (10.0.1.4)
+                         NAT + iptables                → mail (10.0.1.3)
+                                                       → db (10.0.2.x)
+                                                       → files (10.0.2.x)
+                                                       → ldap (10.0.3.x)
+                                                       → siem (10.0.3.x)
+```
+The attacker reaches the DMZ, internal, and mgmt zones through the firewall. Only ports 80, 443, and 25 pass
+from external→DMZ. The firewall enforces zone segmentation per manifest rules.
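The external→DMZ rule can be illustrated with a small check. This is a sketch under assumptions: the /24 boundaries for the external and internal zones are inferred from the addresses in the diagram (only the DMZ scan range and the mgmt subnet are written out as /24 in this document), and the helper names are hypothetical. The real policy lives in iptables rules generated from the manifest.

```python
import ipaddress
from typing import Optional

# Assumed /24 zones inferred from the addresses above; the manifest is authoritative.
ZONES = {
    "external": ipaddress.ip_network("10.0.0.0/24"),
    "dmz": ipaddress.ip_network("10.0.1.0/24"),
    "internal": ipaddress.ip_network("10.0.2.0/24"),
    "mgmt": ipaddress.ip_network("10.0.3.0/24"),
}
EXTERNAL_TO_DMZ_PORTS = {80, 443, 25}


def zone_of(addr: str) -> Optional[str]:
    """Return the zone name containing addr, or None if it is in no zone."""
    ip = ipaddress.ip_address(addr)
    for name, net in ZONES.items():
        if ip in net:
            return name
    return None


def external_to_dmz_allowed(src: str, dst: str, port: int) -> bool:
    """Check only the external->DMZ rule; other zone pairs are out of scope."""
    return (
        zone_of(src) == "external"
        and zone_of(dst) == "dmz"
        and port in EXTERNAL_TO_DMZ_PORTS
    )
```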
+## Builder LLM Schema Alignment (IMPORTANT)
+The Builder prompt schema and the Pydantic models MUST use the same field names.
+Any mismatch raises a `ValidationError` at parse time. The known mappings below are
+handled by `_parse_llm_response()` in `builder/builder.py`:
+| Prompt Schema | Pydantic Model | Parser Handles |
+|---------------|----------------|----------------|
+| `exploit_chain[].vuln` | `ExploitStep.vuln_id` | Yes |
+| `exploit_chain[].action` | `ExploitStep.command` | Yes |
+| `exploit_chain[].yields` | `ExploitStep.description` | Yes |
+| `golden_path[].cmd` | `GoldenPathStep.command` | Yes |
+| `golden_path[].expect_stdout` | `GoldenPathStep.expect_in_stdout` | Yes |
+| `accounts.smb_shares` (list) | `NPCPersona.accounts` (dict[str, Any]) | Yes |
+| `evidence_spec` (dict) | `list[EvidenceItem]` | Yes |
+**Rule**: When adding new fields to SnapshotSpec or its children, update BOTH
+the builder prompt schema AND the `_parse_llm_response()` mapper. If the LLM
+returns a different field name, add a fallback in the parser like
+`ec.get("vuln_id", ec.get("vuln", ""))`.
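The fallback rule can be sketched as a small normalizer. This is illustrative only (the function `normalize_exploit_step` is hypothetical, not the real parser); the field pairs come from the mapping table above.

```python
# Hypothetical normalizer showing the fallback pattern; not the real parser.
def normalize_exploit_step(ec: dict) -> dict:
    """Map prompt-schema names (vuln/action/yields) onto model names."""
    return {
        # `or` also falls back when the canonical key is present but empty,
        # which a nested .get() with a default would not do.
        "vuln_id": ec.get("vuln_id") or ec.get("vuln", ""),
        "command": ec.get("command") or ec.get("action", ""),
        "description": ec.get("description") or ec.get("yields", ""),
    }
```

The `or` form is a design choice: `ec.get("vuln_id", ec.get("vuln", ""))` returns an empty string when `vuln_id` is present but empty, whereas `or` falls through to the alternate name in that case.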
+## Azure OpenAI Configuration
+For LLM builder/validator, set these env vars:
+```bash
+export AZURE_API_KEY="..."
+export AZURE_API_BASE="https://<endpoint>.cognitiveservices.azure.com"
+export AZURE_API_VERSION="2025-04-01-preview"
+export OPENRANGE_BUILDER_MODEL="azure/gpt-5.2"  # or any azure/<deployment>
+```
+LiteLLM reads these automatically. Model format: `azure/<deployment_name>`.
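Before invoking the builder it can help to fail fast on missing configuration. The check below is a minimal sketch; the helper `missing_azure_vars` is hypothetical and simply looks for the three variables listed above.

```python
import os

# Variable names are the ones exported in the snippet above.
REQUIRED_AZURE_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_azure_vars(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.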
+## Build & Development Commands
+```bash
+# Install dependencies
+uv sync --all-extras
+# Run tests (549 tests)
+uv run pytest tests/ -v --tb=short
+# Run OpenEnv server locally (mock mode, no Docker needed)
+uv run uvicorn open_range.server.app:app --host 0.0.0.0 --port 8000
+# Run demo episode (no Docker, no LLM)
+uv run python examples/demo.py
+# Build and start full Docker range stack (9 containers)
+docker compose build && docker compose up -d
+# Test LLM builder with Azure creds
+uv run python scripts/test_tier1_llm.py
+# Deploy to HF Spaces
+bash scripts/deploy_hf.sh
+```
+### Docker Gotchas (Apple Silicon / ARM64)
+- MySQL 5.7 has NO ARM64 images. Use `mysql:8.0` in docker-compose.yml.
+- PHP-FPM socket: Ubuntu 22.04 installs as `php8.1-fpm`, socket at
+  `/run/php/php8.1-fpm.sock` (not generic `/run/php/php-fpm.sock`).
+- Attacker container needs `cap_add: [NET_ADMIN]` + `iproute2` to add
+  routes to DMZ/internal/mgmt subnets via the firewall gateway.
+- Container names follow Docker Compose convention: `open-range-<service>-1`.
+  The environment resolves these via `_container_name()` discovery.
+## Key References
+- **OpenEnv**: `../References/OpenEnv/` (full reference repo)
+- **OpenEnv coding_env**: Pattern to follow for server/client structure
+- **OpenEnv RFC 001**: Agent vs Environment boundary (MCP + HTTP duality)
+- **OpenEnv RFC 004**: Rubric system for composable rewards
+- **R2E-Gym**: `../References/R2E-Gym/` (full codebase) + `../2504.07164v1.pdf` (paper). Procedural env generation via backtranslation, hybrid verifiers (execution + LLM), 8.1K executable tasks. Key lesson: hybrid verification breaks through single-method plateaus.
+- **Self-Play SWE-RL**: `../2512.18552v1.pdf`. Bug-injector + bug-solver self-play with shared weights. Key lessons: formal specs > NL, 7-check consistency validation, inverse mutation testing, frontier-calibrating Builder reward `r_inject = 1-(1+α)s`, higher-order challenges from failed attempts.
+- **CyBench** (ICLR'25): CTF benchmark (saturating, static)
+- **CVE-Bench** (ICML'25): Reward hacking lesson (agents gamed shortcuts)
+- **CybORG CAGE 4**: Red/Blue/Green agent model
+## Hackathon Scope & Priority
+### CORE (must ship — the OpenEnv environment)
+1. **Manifest schema** + example YAML manifests with golden paths
+2. **Builder LLM** — generates/mutates range infrastructure from manifest (structured JSON → templates → Docker)
+3. **Validator** — hybrid LLM review + 7-check scripted execution (including inverse mutation test)
+4. **OpenEnv server** — `RangeEnvironment(Environment)` with `reset()`, `step()`, `state`, deployed on HF Spaces
+5. **Rewards** — `Rubric` subclasses for Red and Blue, all verifiable against container state
+6. **Client** — `OpenRangeEnv(EnvClient)` with typed parsing
+7. **NPC traffic** — background noise for Blue
+### DEFERRED (training is downstream of the environment)
+Training scripts (GRPO, SFT, curriculum) are **out of scope for hackathon core**. The environment
+must work first — anyone can plug in TRL/Unsloth/SkyRL later via `rollout_func`. We demonstrate
+the environment with scripted or manual agents, not trained ones.
+### Constraints
+- **OpenEnv 0.2.x** on HF Spaces (FastAPI server with typed Pydantic models)
+- **Infra**: HF Spaces (OpenEnv server) + Docker host (range containers)
+- **Demo**: 1-minute YouTube video showing YAML → Builder generates range → Validator confirms → Red agent exploits → Blue agent defends → Builder mutates → new challenge
+- **License**: Apache 2.0

README.md CHANGED

@@ -1,7 +1,15 @@
 ---
-title: OpenRange
 sdk: docker
 app_port: 8000
 ---
 # OpenRange

 ---
+title: OpenRange Environment Server
+emoji: 🎯
+colorFrom: red
+colorTo: blue
 sdk: docker
+pinned: false
 app_port: 8000
+base_path: /web
+tags:
+  - openenv
+  - rl-environment
 ---
 # OpenRange

openenv.yaml CHANGED

@@ -4,5 +4,3 @@ type: space
 runtime: fastapi
 app: server.app:app
 port: 8000
-version: 0.1.0
-description: "Multi-agent cybersecurity gymnasium built on OpenEnv"

 runtime: fastapi
 app: server.app:app
 port: 8000

pyproject.toml CHANGED

@@ -1,17 +1,22 @@
 [project]
-name = "open-range"
 version = "0.1.0"
 description = "Multi-agent cybersecurity gymnasium built on OpenEnv"
 requires-python = ">=3.11"
 license = "Apache-2.0"
 dependencies = [
     "openenv-core[core]>=0.2.1",
-    "fastapi>=0.115",
-    "pydantic>=2.0",
     "pyyaml>=6.0",
     "docker>=7.0",
     "jinja2>=3.1",
-    "uvicorn>=0.27",
 ]
 [project.optional-dependencies]
@@ -19,15 +24,24 @@ dev = ["pytest>=8.0", "pytest-asyncio>=0.23", "httpx>=0.27"]
 training = ["trl>=0.8", "unsloth"]
 builder = ["litellm>=1.30"]
-[build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
-[tool.hatch.build.targets.wheel]
-packages = ["src/open_range"]
 [project.scripts]
 server = "open_range.server.app:main"
 [tool.pytest.ini_options]
 asyncio_mode = "auto"

+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
 [project]
+name = "openenv-open-range"
 version = "0.1.0"
 description = "Multi-agent cybersecurity gymnasium built on OpenEnv"
 requires-python = ">=3.11"
 license = "Apache-2.0"
 dependencies = [
     "openenv-core[core]>=0.2.1",
+    "click>=8.1",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
     "pyyaml>=6.0",
     "docker>=7.0",
     "jinja2>=3.1",
+    "uvicorn>=0.24.0",
 ]
 [project.optional-dependencies]
 training = ["trl>=0.8", "unsloth"]
 builder = ["litellm>=1.30"]
 [project.scripts]
+openrange = "open_range.cli:cli"
 server = "open_range.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = [
+    "open_range",
+    "open_range.agents",
+    "open_range.builder",
+    "open_range.builder.npc",
+    "open_range.client",
+    "open_range.server",
+    "open_range.training",
+    "open_range.validator",
+]
+package-dir = { "" = "src" }
+package-data = { "open_range" = ["**/*.yaml", "**/*.yml"] }
 [tool.pytest.ini_options]
 asyncio_mode = "auto"

server/Dockerfile CHANGED

@@ -1,23 +1,42 @@
-FROM python:3.11-slim
 WORKDIR /app
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        docker.io \
-        curl \
     && rm -rf /var/lib/apt/lists/*
-COPY pyproject.toml .
-COPY openenv.yaml .
-COPY server/ server/
-COPY src/ src/
-RUN pip install --no-cache-dir -e .
-EXPOSE 8000
-HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
-    CMD curl -f http://localhost:8000/health || exit 1
-CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
 WORKDIR /app
+COPY . /app/env
+WORKDIR /app/env
+# Install git for git+ dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends git \
     && rm -rf /var/lib/apt/lists/*
+# Two-pass install for better layer caching
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+COPY --from=builder /app/env/.venv /app/.venv
+COPY --from=builder /app/env /app/env
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONPATH="/app/env/src:/app/env:$PYTHONPATH"
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

server/__init__.py CHANGED

@@ -1,6 +1,6 @@
 """Repository-level OpenEnv server entrypoints."""
-from .app import app, create_app
 from .environment import RangeEnvironment
-__all__ = ["RangeEnvironment", "app", "create_app"]

 """Repository-level OpenEnv server entrypoints."""
+from .app import app, main
 from .environment import RangeEnvironment
+__all__ = ["RangeEnvironment", "app", "main"]

server/app.py CHANGED

@@ -1,10 +1,16 @@
-"""OpenEnv app entrypoint expected by ``openenv.yaml``."""
 from __future__ import annotations
-from open_range.server.app import app, create_app
-__all__ = ["app", "create_app"]
 def main() -> None:

+"""OpenEnv app entrypoint expected by ``openenv.yaml``.
+Thin wrapper that delegates to the real app factory in
+``open_range.server.app``. This file lives at the repo root
+so the Dockerfile CMD ``cd /app/env && uvicorn server.app:app``
+resolves correctly inside HF Spaces.
+"""
 from __future__ import annotations
+from open_range.server.app import create_app as _create_app
+app = _create_app()
 def main() -> None:

src/open_range/builder/builder.py CHANGED

@@ -3,6 +3,9 @@
 - LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
 - TemplateOnlyBuilder: testing -- deterministic, no LLM calls
 - FileBuilder: demos -- loads a pre-built snapshot from a JSON file
 """
 from __future__ import annotations
@@ -12,7 +15,9 @@ import logging
 import os
 import random
 from pathlib import Path
-from typing import Any
 try:
     import litellm
@@ -38,6 +43,106 @@ from open_range.builder.prompts import BUILDER_SYSTEM_PROMPT
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
 # LLM-based builder (production)
 # ---------------------------------------------------------------------------
@@ -57,7 +162,18 @@ class LLMSnapshotBuilder:
         temperature: float = 0.7,
         max_retries: int = 3,
         max_tokens: int = 32768,
     ) -> None:
         self.model = model or os.environ.get(
             "OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
         )
@@ -65,13 +181,18 @@ class LLMSnapshotBuilder:
         self.temperature = temperature
         self.max_retries = max_retries
         self.max_tokens = max_tokens
     async def build(
         self,
         manifest: dict,
         context: BuildContext,
     ) -> SnapshotSpec:
-        """Call LLM to generate a candidate snapshot spec."""
         if litellm is None:
             raise RuntimeError(
                 "LLMSnapshotBuilder requires the optional builder extra. "
@@ -89,23 +210,29 @@ class LLMSnapshotBuilder:
             )
         )
         last_error: Exception | None = None
         for attempt in range(1, self.max_retries + 1):
             try:
                 messages: list[dict[str, str]] = [
                     {"role": "system", "content": self.prompt_template},
                     {"role": "user", "content": user_payload},
                 ]
-                # If retrying after a validation error, append error context
-                error = getattr(context, "error", None)
-                if error and attempt > 1:
                     messages.append(
                         {
                             "role": "user",
                             "content": (
-                                "Previous attempt failed validation. "
-                                f"Error: {json.dumps(error)}\n"
-                                "Please fix and regenerate."
                             ),
                         }
                     )
@@ -114,6 +241,7 @@ class LLMSnapshotBuilder:
                     "model": self.model,
                     "messages": messages,
                     "max_tokens": self.max_tokens,
                 }
                 # Codex models don't support temperature
                 if self.temperature is not None:
@@ -121,24 +249,56 @@ class LLMSnapshotBuilder:
                 # Request JSON output; some models need the word "json"
                 # in messages to use json_object format
                 kwargs["response_format"] = {"type": "json_object"}
                 response = await litellm.acompletion(**kwargs)
                 raw = response.choices[0].message.content
                 spec = _parse_llm_response(raw)
                 logger.info(
-                    "LLMSnapshotBuilder: generated snapshot %s (attempt %d)",
-                    spec.topology.get("hosts", [])[:3],
                     attempt,
                 )
                 return spec
-            except Exception as exc:
                 last_error = exc
                 logger.warning(
                     "LLMSnapshotBuilder attempt %d/%d failed: %s",
                     attempt,
                     self.max_retries,
-                    exc,
                 )
         raise RuntimeError(
@@ -147,76 +307,182 @@ class LLMSnapshotBuilder:
         )
 def _parse_llm_response(raw_json: str) -> SnapshotSpec:
     """Parse raw JSON from LLM into a validated SnapshotSpec.
-    Handles the fact that the LLM output schema (from docs/builder-validator.md)
-    differs slightly from the SnapshotSpec Pydantic model in protocols.py.
     """
-    data = json.loads(raw_json)
     # Map truth_graph vulns
     vulns = []
-    for v in data.get("truth_graph", {}).get("vulns", []):
-        vulns.append(
-            Vulnerability(
-                id=v.get("id", ""),
-                type=v.get("type", ""),
-                host=v.get("host", ""),
-                service=v.get("service", ""),
-                injection_point=v.get("injection_point", ""),
-                vulnerable_code=v.get("vulnerable_code", ""),
-                root_cause=v.get("root_cause", ""),
-                blast_radius=v.get("blast_radius", ""),
-                remediation=v.get("remediation", ""),
             )
-        )
     # Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
     exploit_chain = []
-    for ec in data.get("truth_graph", {}).get("exploit_chain", []):
-        exploit_chain.append(
-            ExploitStep(
-                vuln_id=ec.get("vuln_id", ec.get("vuln", "")),
-                command=ec.get("command", ec.get("action", "")),
-                description=ec.get("description", ec.get("yields", "")),
             )
-        )
     truth_graph = TruthGraph(
         vulns=vulns,
         exploit_chain=exploit_chain,
     )
-    # Map golden_path -- LLM uses "expect_stdout", protocol uses "expect_in_stdout"
     golden_path = []
-    for step in data.get("golden_path", []):
         golden_path.append(
             GoldenPathStep(
-                step=step.get("step", 0),
-                command=step.get("cmd", step.get("command", "")),
-                expect_in_stdout=step.get(
-                    "expect_stdout", step.get("expect_in_stdout", "")
-                ),
-                description=step.get("description", ""),
             )
         )
     # Map flags
-    flags = [
-        FlagSpec(
-            id=f.get("id", ""),
-            value=f.get("value", ""),
-            path=f.get("path", ""),
-            host=f.get("host", ""),
-        )
-        for f in data.get("flags", [])
-    ]
-    # Map evidence_spec -- LLM returns dict, protocol expects list[EvidenceItem]
-    evidence_raw = data.get("evidence_spec", {})
     evidence_spec: list[EvidenceItem] = []
     if isinstance(evidence_raw, dict):
         for key, val in evidence_raw.items():
             if isinstance(val, list):
                 for item in val:
@@ -234,23 +500,31 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
     # Map NPC personas
     npc_personas = []
-    for p in data.get("npc_personas", []):
-        npc_personas.append(
-            NPCPersona(
-                name=p.get("name", ""),
-                role=p.get("role", ""),
-                department=p.get("department", ""),
-                reports_to=p.get("reports_to", ""),
-                communication_style=p.get("communication_style", ""),
-                security_awareness=p.get("security_awareness", 0.5),
-                susceptibility=p.get("susceptibility", {}),
-                routine=p.get("routine", {}),
-                accounts=p.get("accounts", {}),
             )
-        )
     # Map NPC traffic
-    npc_raw = data.get("npc_traffic", {})
     npc_traffic = NPCTrafficSpec(
         level=0,
         rate_lambda=npc_raw.get("http_rate", 10),
@@ -258,19 +532,17 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
     )
     # Map task
-    task_raw = data.get("task", {})
     task = TaskSpec(
-        red_briefing=task_raw.get("red_briefing", ""),
-        blue_briefing=task_raw.get("blue_briefing", ""),
     )
     # Map files -- explicit files from LLM + extract from vulnerable_code
     files: dict[str, str] = {}
     # 1. Explicit files field from LLM output
-    files_raw = data.get("files", {})
-    if isinstance(files_raw, dict):
-        for key, content in files_raw.items():
             if isinstance(content, str):
                 files[key] = content
@@ -289,8 +561,16 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
                 if container_key not in files:
                     files[container_key] = vc
     return SnapshotSpec(
-        topology=data.get("topology", {}),
         truth_graph=truth_graph,
         golden_path=golden_path,
         flags=flags,
@@ -629,6 +909,7 @@ class TemplateOnlyBuilder:
     """
     def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
         self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
     async def build(
@@ -765,6 +1046,12 @@ class TemplateOnlyBuilder:
             scripts=["http_traffic.sh", "db_traffic.sh"],
         )
         return SnapshotSpec(
             topology=topology,
             truth_graph=truth_graph,
@@ -790,6 +1077,7 @@ class FileBuilder:
     """
     def __init__(self, snapshot_dir: str = "snapshots") -> None:
         self.snapshot_dir = Path(snapshot_dir)
     async def build(
@@ -797,7 +1085,7 @@ class FileBuilder:
         manifest: dict,
         context: BuildContext,
     ) -> SnapshotSpec:
-        """Load the snapshot JSON, optionally picking by seed."""
         if not self.snapshot_dir.exists():
             raise FileNotFoundError(
                 f"Snapshot directory not found: {self.snapshot_dir}"
@@ -817,5 +1105,6 @@ class FileBuilder:
         else:
             chosen = files[0]
         raw = json.loads(chosen.read_text())
         return _parse_llm_response(json.dumps(raw))

 - LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
 - TemplateOnlyBuilder: testing -- deterministic, no LLM calls
 - FileBuilder: demos -- loads a pre-built snapshot from a JSON file
+Each builder implements the SnapshotBuilder protocol and returns a validated
+SnapshotSpec that can be rendered into Docker artifacts by the SnapshotRenderer.
 """
 from __future__ import annotations
 import os
 import random
 from pathlib import Path
+from typing import Any, Optional
+from pydantic import BaseModel, Field
 try:
     import litellm
 logger = logging.getLogger(__name__)
+# ---------------------------------------------------------------------------
+# LLM raw output model -- matches the LLM's JSON schema exactly
+# ---------------------------------------------------------------------------
+class _LLMVulnerability(BaseModel):
+    """Raw vulnerability as returned by the LLM."""
+    id: str = ""
+    type: str = ""
+    host: str = ""
+    service: str = ""
+    injection_point: str = ""
+    vulnerable_code: str | dict[str, str] = ""
+    root_cause: str = ""
+    blast_radius: str = ""
+    remediation: str = ""
+class _LLMExploitStep(BaseModel):
+    """Raw exploit step -- LLM uses 'vuln'/'action'/'yields' field names."""
+    vuln: str = ""
+    vuln_id: str = ""
+    action: str = ""
+    command: str = ""
+    yields: str = ""
+    description: str = ""
+class _LLMGoldenPathStep(BaseModel):
+    """Raw golden path step -- LLM uses 'cmd' and 'expect_stdout'."""
+    step: int = 0
+    cmd: str = ""
+    command: str = ""
+    expect_stdout: str = ""
+    expect_in_stdout: str = ""
+    description: str = ""
+    host: str = "attacker"
+class _LLMFlag(BaseModel):
+    """Raw flag definition from LLM output."""
+    id: str = ""
+    value: str = ""
+    path: str = ""
+    host: str = ""
+class _LLMNPCPersona(BaseModel):
+    """Raw NPC persona from LLM output."""
+    name: str = ""
+    role: str = ""
+    department: str = ""
+    reports_to: str = ""
+    communication_style: str = ""
+    security_awareness: float = 0.5
+    susceptibility: dict[str, Any] = Field(default_factory=dict)
+    routine: dict[str, Any] = Field(default_factory=dict)
+    accounts: dict[str, Any] = Field(default_factory=dict)
+class _LLMTruthGraph(BaseModel):
+    """Raw truth graph from LLM output."""
+    vulns: list[_LLMVulnerability] = Field(default_factory=list)
+    exploit_chain: list[_LLMExploitStep] = Field(default_factory=list)
+class _LLMTask(BaseModel):
+    """Raw task specification from LLM output."""
+    red_briefing: str = ""
+    blue_briefing: str = ""
+class LLMSnapshotOutput(BaseModel):
+    """Intermediate model matching the LLM's raw JSON schema.
+    This captures the exact field names the LLM produces, including
+    known mismatches like 'vuln' vs 'vuln_id', 'cmd' vs 'command',
+    and 'expect_stdout' vs 'expect_in_stdout'. Parsing into this model
+    first makes schema mismatches explicit and testable before mapping
+    to the canonical SnapshotSpec.
+    """
+    topology: dict[str, Any] = Field(default_factory=dict)
+    truth_graph: _LLMTruthGraph = Field(default_factory=_LLMTruthGraph)
+    golden_path: list[_LLMGoldenPathStep] = Field(default_factory=list)
+    flags: list[_LLMFlag] = Field(default_factory=list)
+    evidence_spec: dict[str, Any] | list[dict[str, Any]] = Field(default_factory=dict)
+    npc_personas: list[_LLMNPCPersona] = Field(default_factory=list)
+    npc_traffic: dict[str, Any] = Field(default_factory=dict)
+    task: _LLMTask = Field(default_factory=_LLMTask)
+    files: dict[str, str] = Field(default_factory=dict)
 # ---------------------------------------------------------------------------
 # LLM-based builder (production)
 # ---------------------------------------------------------------------------
         temperature: float = 0.7,
         max_retries: int = 3,
         max_tokens: int = 32768,
+        timeout: float = 120.0,
     ) -> None:
+        """Initialize the LLM-based snapshot builder.
+        Args:
+            model: LiteLLM model identifier (e.g. 'azure/gpt-5.2').
+            prompt_template: System prompt override.
+            temperature: Sampling temperature for LLM calls.
+            max_retries: Maximum number of LLM call + parse attempts.
+            max_tokens: Maximum tokens in LLM response.
+            timeout: Timeout in seconds for each LLM call.
+        """
         self.model = model or os.environ.get(
             "OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
         )
         self.temperature = temperature
         self.max_retries = max_retries
         self.max_tokens = max_tokens
+        self.timeout = timeout
     async def build(
         self,
         manifest: dict,
         context: BuildContext,
     ) -> SnapshotSpec:
+        """Call LLM to generate a candidate snapshot spec.
+        Retries on LLM or parse failures, appending error context to each
+        subsequent attempt so the LLM can self-correct.
+        """
         if litellm is None:
             raise RuntimeError(
                 "LLMSnapshotBuilder requires the optional builder extra. "
             )
         )
+        logger.info(
+            "LLMSnapshotBuilder: starting build (model=%s, tier=%d)",
+            self.model,
+            context.tier,
+        )
         last_error: Exception | None = None
+        last_error_msg: str = ""
         for attempt in range(1, self.max_retries + 1):
             try:
                 messages: list[dict[str, str]] = [
                     {"role": "system", "content": self.prompt_template},
                     {"role": "user", "content": user_payload},
                 ]
+                # If retrying after a failure, append error context so LLM can fix
+                if attempt > 1 and last_error_msg:
                     messages.append(
                         {
                             "role": "user",
                             "content": (
+                                "Previous attempt failed. "
+                                f"Error: {last_error_msg}\n"
+                                "Please fix and regenerate the complete JSON."
                             ),
                         }
                     )
                     "model": self.model,
                     "messages": messages,
                     "max_tokens": self.max_tokens,
+                    "timeout": self.timeout,
                 }
                 # Codex models don't support temperature
                 if self.temperature is not None:
                 # Request JSON output; some models need the word "json"
                 # in messages to use json_object format
                 kwargs["response_format"] = {"type": "json_object"}
+                logger.debug(
+                    "LLMSnapshotBuilder: sending request (attempt %d/%d, timeout=%.0fs)",
+                    attempt,
+                    self.max_retries,
+                    self.timeout,
+                )
                 response = await litellm.acompletion(**kwargs)
                 raw = response.choices[0].message.content
+                logger.debug(
+                    "LLMSnapshotBuilder: received response (%d chars)",
+                    len(raw) if raw else 0,
+                )
                 spec = _parse_llm_response(raw)
                 logger.info(
+                    "LLMSnapshotBuilder: build completed (attempt %d/%d, %d vulns, %d golden path steps)",
                     attempt,
+                    self.max_retries,
+                    len(spec.truth_graph.vulns),
+                    len(spec.golden_path),
                 )
                 return spec
+            except json.JSONDecodeError as exc:
+                last_error = exc
+                last_error_msg = f"JSON parse error at position {exc.pos}: {exc.msg}"
+                logger.warning(
+                    "LLMSnapshotBuilder attempt %d/%d: JSON parse failed: %s",
+                    attempt,
+                    self.max_retries,
+                    last_error_msg,
+                )
+            except SnapshotParseError as exc:
                 last_error = exc
+                last_error_msg = str(exc)
                 logger.warning(
+                    "LLMSnapshotBuilder attempt %d/%d: snapshot parse failed: %s",
+                    attempt,
+                    self.max_retries,
+                    last_error_msg,
+                )
+            except Exception as exc:
+                last_error = exc
+                last_error_msg = f"{type(exc).__name__}: {exc}"
+                logger.error(
                     "LLMSnapshotBuilder attempt %d/%d failed: %s",
                     attempt,
                     self.max_retries,
+                    last_error_msg,
                 )
         raise RuntimeError(
         )
+# ---------------------------------------------------------------------------
+# Parse error with context
+# ---------------------------------------------------------------------------
+class SnapshotParseError(Exception):
+    """Raised when LLM output cannot be parsed into a valid SnapshotSpec.
+    Includes the field that failed, received value, expected format,
+    and a truncated snippet of the raw JSON for debugging.
+    """
+    def __init__(
+        self,
+        message: str,
+        field: str = "",
+        received: Any = None,
+        expected: str = "",
+        raw_json_snippet: str = "",
+    ) -> None:
+        self.field = field
+        self.received = received
+        self.expected = expected
+        self.raw_json_snippet = raw_json_snippet
+        parts = [message]
+        if field:
+            parts.append(f"field={field!r}")
+        if received is not None:
+            recv_str = repr(received)
+            if len(recv_str) > 200:
+                recv_str = recv_str[:200] + "..."
+            parts.append(f"received={recv_str}")
+        if expected:
+            parts.append(f"expected={expected}")
+        if raw_json_snippet:
+            parts.append(f"raw_json_start={raw_json_snippet!r}")
+        super().__init__(" | ".join(parts))
+# ---------------------------------------------------------------------------
+# LLM response parser
+# ---------------------------------------------------------------------------
 def _parse_llm_response(raw_json: str) -> SnapshotSpec:
     """Parse raw JSON from LLM into a validated SnapshotSpec.
+    First parses into LLMSnapshotOutput (which matches the LLM's field names),
+    then maps to the canonical SnapshotSpec models. Handles known field-name
+    mismatches between the LLM prompt schema and Pydantic models.
     """
+    raw_snippet = raw_json[:500] if raw_json else ""
+    try:
+        data = json.loads(raw_json)
+    except json.JSONDecodeError:
+        raise
+    logger.debug("_parse_llm_response: parsing %d-char JSON response", len(raw_json))
+    # Parse into intermediate model first for early validation
+    try:
+        llm_output = LLMSnapshotOutput.model_validate(data)
+    except Exception as exc:
+        raise SnapshotParseError(
+            "Failed to parse LLM output into LLMSnapshotOutput",
+            field="root",
+            received=type(exc).__name__,
+            expected="valid LLMSnapshotOutput JSON",
+            raw_json_snippet=raw_snippet,
+        ) from exc
     # Map truth_graph vulns
     vulns = []
+    for i, v in enumerate(llm_output.truth_graph.vulns):
+        try:
+            vulns.append(
+                Vulnerability(
+                    id=v.id,
+                    type=v.type,
+                    host=v.host,
+                    service=v.service,
+                    injection_point=v.injection_point,
+                    vulnerable_code=v.vulnerable_code,
+                    root_cause=v.root_cause,
+                    blast_radius=v.blast_radius,
+                    remediation=v.remediation,
+                )
             )
+        except Exception as exc:
+            raise SnapshotParseError(
+                f"Failed to map vulnerability at index {i}",
+                field=f"truth_graph.vulns[{i}]",
+                received=v.model_dump(),
+                expected="valid Vulnerability fields",
+                raw_json_snippet=raw_snippet,
+            ) from exc
     # Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
     exploit_chain = []
+    for i, ec in enumerate(llm_output.truth_graph.exploit_chain):
+        vuln_id = ec.vuln_id or ec.vuln
+        command = ec.command or ec.action
+        description = ec.description or ec.yields
+        if vuln_id or command:
+            used_fallback = (not ec.vuln_id and ec.vuln) or (not ec.command and ec.action)
+            if used_fallback:
+                logger.warning(
+                    "exploit_chain[%d]: used fallback field names (vuln=%r -> vuln_id, action=%r -> command)",
+                    i,
+                    ec.vuln,
+                    ec.action,
+                )
+            exploit_chain.append(
+                ExploitStep(
+                    vuln_id=vuln_id,
+                    command=command,
+                    description=description,
+                )
             )
     truth_graph = TruthGraph(
         vulns=vulns,
         exploit_chain=exploit_chain,
     )
+    # Map golden_path -- LLM uses "cmd"/"expect_stdout", protocol uses "command"/"expect_in_stdout"
     golden_path = []
+    for i, step in enumerate(llm_output.golden_path):
+        command = step.command or step.cmd
+        expect = step.expect_in_stdout or step.expect_stdout
+        if not step.command and step.cmd:
+            logger.warning(
+                "golden_path[%d]: used 'cmd' fallback for 'command'",
+                i,
+            )
+        if not step.expect_in_stdout and step.expect_stdout:
+            logger.warning(
+                "golden_path[%d]: used 'expect_stdout' fallback for 'expect_in_stdout'",
+                i,
+            )
         golden_path.append(
             GoldenPathStep(
+                step=step.step,
+                command=command,
+                expect_in_stdout=expect,
+                description=step.description,
             )
         )
     # Map flags
+    flags = []
+    for i, f in enumerate(llm_output.flags):
+        try:
+            flags.append(
+                FlagSpec(
+                    id=f.id,
+                    value=f.value,
+                    path=f.path,
+                    host=f.host,
+                )
+            )
+        except Exception as exc:
+            raise SnapshotParseError(
+                f"Failed to map flag at index {i}",
+                field=f"flags[{i}]",
+                received=f.model_dump(),
+                expected="valid FlagSpec (id, value, path, host)",
+                raw_json_snippet=raw_snippet,
+            ) from exc
+    # Map evidence_spec -- LLM returns dict or list, protocol expects list[EvidenceItem]
     evidence_spec: list[EvidenceItem] = []
+    evidence_raw = llm_output.evidence_spec
     if isinstance(evidence_raw, dict):
+        logger.debug("evidence_spec: converting dict format to list[EvidenceItem]")
         for key, val in evidence_raw.items():
             if isinstance(val, list):
                 for item in val:
     # Map NPC personas
     npc_personas = []
+    for i, p in enumerate(llm_output.npc_personas):
+        try:
+            npc_personas.append(
+                NPCPersona(
+                    name=p.name,
+                    role=p.role,
+                    department=p.department,
+                    reports_to=p.reports_to,
+                    communication_style=p.communication_style,
+                    security_awareness=p.security_awareness,
+                    susceptibility=p.susceptibility,
+                    routine=p.routine,
+                    accounts=p.accounts,
+                )
+            )
+        except Exception as exc:
+            logger.warning(
+                "npc_personas[%d]: failed to map persona %r: %s",
+                i,
+                p.name,
+                exc,
             )
     # Map NPC traffic
+    npc_raw = llm_output.npc_traffic
     npc_traffic = NPCTrafficSpec(
         level=0,
         rate_lambda=npc_raw.get("http_rate", 10),
     )
     # Map task
     task = TaskSpec(
+        red_briefing=llm_output.task.red_briefing,
+        blue_briefing=llm_output.task.blue_briefing,
     )
     # Map files -- explicit files from LLM + extract from vulnerable_code
     files: dict[str, str] = {}
     # 1. Explicit files field from LLM output
+    if isinstance(llm_output.files, dict):
+        for key, content in llm_output.files.items():
             if isinstance(content, str):
                 files[key] = content
                 if container_key not in files:
                     files[container_key] = vc
+    logger.debug(
+        "_parse_llm_response: mapped %d vulns, %d golden path steps, %d flags, %d files",
+        len(vulns),
+        len(golden_path),
+        len(flags),
+        len(files),
+    )
     return SnapshotSpec(
+        topology=llm_output.topology,
         truth_graph=truth_graph,
         golden_path=golden_path,
         flags=flags,
     """
     def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
+        """Initialize with an optional custom vulnerability pool."""
         self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
     async def build(
             scripts=["http_traffic.sh", "db_traffic.sh"],
         )
+        logger.info(
+            "TemplateOnlyBuilder: built snapshot with %d vulns (seed=%s)",
+            len(vulns),
+            context.seed,
+        )
         return SnapshotSpec(
             topology=topology,
             truth_graph=truth_graph,
     """
     def __init__(self, snapshot_dir: str = "snapshots") -> None:
+        """Initialize with the directory containing snapshot JSON files."""
         self.snapshot_dir = Path(snapshot_dir)
     async def build(
         manifest: dict,
         context: BuildContext,
     ) -> SnapshotSpec:
+        """Load a snapshot JSON file, optionally picking by seed."""
         if not self.snapshot_dir.exists():
             raise FileNotFoundError(
                 f"Snapshot directory not found: {self.snapshot_dir}"
         else:
             chosen = files[0]
+        logger.info("FileBuilder: loading snapshot from %s", chosen)
         raw = json.loads(chosen.read_text())
         return _parse_llm_response(json.dumps(raw))
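
The field-name fallbacks in `_parse_llm_response` (for example `cmd` for `command`, `expect_stdout` for `expect_in_stdout`) can be sketched in isolation as below. This is a minimal illustration, not the project's code: `RawGoldenStep` and `normalize_step` are hypothetical names standing in for the LLM-side model and the mapping loop, and the sketch returns the fallback names instead of logging them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RawGoldenStep:
    """Hypothetical stand-in for the LLM-side golden path step model."""

    step: int
    command: str = ""
    cmd: str = ""  # alternate name the LLM may emit for `command`
    expect_in_stdout: str = ""
    expect_stdout: str = ""  # alternate name for `expect_in_stdout`
    description: str = ""


def normalize_step(raw: RawGoldenStep) -> tuple[dict, list[str]]:
    """Return the canonical fields plus the fallback names that were used."""
    fallbacks: list[str] = []
    # Test the canonical attribute itself, not the merged value: the merged
    # value is already truthy whenever the fallback supplied it.
    if not raw.command and raw.cmd:
        fallbacks.append("cmd")
    if not raw.expect_in_stdout and raw.expect_stdout:
        fallbacks.append("expect_stdout")
    canonical = {
        "step": raw.step,
        "command": raw.command or raw.cmd,
        "expect_in_stdout": raw.expect_in_stdout or raw.expect_stdout,
        "description": raw.description,
    }
    return canonical, fallbacks


canonical, used = normalize_step(
    RawGoldenStep(step=1, cmd="nmap -sV web", expect_stdout="80/tcp")
)
print(canonical["command"], used)
```

Returning the list of fallbacks keeps the mapping pure, so a caller can decide whether to log a warning or reject the input.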

src/open_range/cli.py ADDED

	@@ -0,0 +1,438 @@

+"""OpenRange CLI -- production command-line interface for the cybersecurity gymnasium.
+Usage::
+    openrange build -m manifests/tier1_basic.yaml
+    openrange render -s snapshots/spec.json -o output/
+    openrange validate -s snapshots/spec.json
+    openrange deploy -s snapshots/spec.json
+    openrange server --port 8000
+"""
+from __future__ import annotations
+import asyncio
+import json
+import logging
+import os
+import sys
+import time
+from pathlib import Path
+from typing import Any
+import click
+import yaml
+# ---------------------------------------------------------------------------
+# Logging setup
+# ---------------------------------------------------------------------------
+LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
+LOG_DATE_FORMAT = "%H:%M:%S"
+def _configure_logging(verbose: bool) -> None:
+    level = logging.DEBUG if verbose else logging.INFO
+    logging.basicConfig(
+        level=level,
+        format=LOG_FORMAT,
+        datefmt=LOG_DATE_FORMAT,
+        stream=sys.stderr,
+    )
+    # Quiet noisy third-party loggers unless in verbose mode
+    if not verbose:
+        for name in ("httpx", "httpcore", "litellm", "urllib3", "docker"):
+            logging.getLogger(name).setLevel(logging.WARNING)
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def _run_async(coro: Any) -> Any:
+    """Run an async coroutine from synchronous Click context."""
+    try:
+        loop = asyncio.get_running_loop()
+    except RuntimeError:
+        loop = None
+    if loop and loop.is_running():
+        # Shouldn't happen in a CLI, but be safe.
+        import concurrent.futures
+        with concurrent.futures.ThreadPoolExecutor() as pool:
+            return pool.submit(asyncio.run, coro).result()
+    return asyncio.run(coro)
+def _load_manifest(path: str) -> dict[str, Any]:
+    """Load and return a YAML manifest as a dict."""
+    p = Path(path)
+    if not p.exists():
+        click.echo(f"Error: manifest not found: {p}", err=True)
+        sys.exit(1)
+    with open(p) as f:
+        data = yaml.safe_load(f)
+    if not isinstance(data, dict):
+        click.echo(f"Error: manifest must be a YAML mapping, got {type(data).__name__}", err=True)
+        sys.exit(1)
+    return data
+def _load_snapshot(path: str) -> "SnapshotSpec":
+    """Load a snapshot JSON file into a SnapshotSpec."""
+    from open_range.protocols import SnapshotSpec
+    p = Path(path)
+    if not p.exists():
+        click.echo(f"Error: snapshot not found: {p}", err=True)
+        sys.exit(1)
+    with open(p) as f:
+        data = json.load(f)
+    try:
+        return SnapshotSpec.model_validate(data)
+    except Exception as exc:
+        click.echo(f"Error: invalid snapshot JSON: {exc}", err=True)
+        sys.exit(1)
+def _write_snapshot(spec: "SnapshotSpec", output_dir: Path) -> Path:
+    """Write a SnapshotSpec to spec.json inside output_dir. Returns the file path."""
+    output_dir.mkdir(parents=True, exist_ok=True)
+    dest = output_dir / "spec.json"
+    dest.write_text(json.dumps(spec.model_dump(), indent=2, default=str))
+    return dest
+# ---------------------------------------------------------------------------
+# CLI group
+# ---------------------------------------------------------------------------
+@click.group()
+@click.option("-v", "--verbose", is_flag=True, default=False, help="Enable debug logging.")
+@click.version_option(package_name="openenv-open-range", prog_name="openrange")
+def cli(verbose: bool) -> None:
+    """OpenRange -- multi-agent cybersecurity gymnasium.
+    Generate, validate, deploy, and serve Docker-based cyber ranges
+    for adversarial Red/Blue agent training.
+    """
+    _configure_logging(verbose)
+# ---------------------------------------------------------------------------
+# build
+# ---------------------------------------------------------------------------
+@cli.command()
+@click.option("-m", "--manifest", required=True, type=click.Path(exists=True), help="Path to manifest YAML.")
+@click.option("-o", "--output", default="./snapshots", type=click.Path(), help="Output directory for snapshot.")
+@click.option("--model", default=None, help="LLM model (default: $OPENRANGE_BUILDER_MODEL or azure/gpt-5.2).")
+@click.option("--tier", default=1, type=click.IntRange(1, 5), help="Tier level 1-5.")
+@click.option("--seed", default=None, type=int, help="Random seed for reproducibility.")
+@click.option("--template-only", is_flag=True, default=False, help="Skip LLM, use deterministic template builder.")
+@click.option("--max-tokens", default=16384, type=int, help="Max tokens for LLM generation.")
+def build(
+    manifest: str,
+    output: str,
+    model: str | None,
+    tier: int,
+    seed: int | None,
+    template_only: bool,
+    max_tokens: int,
+) -> None:
+    """Generate a snapshot from a manifest YAML.
+    Uses the LLM builder by default. Pass --template-only for a deterministic
+    snapshot without any LLM calls (useful for testing).
+    """
+    from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
+    from open_range.protocols import BuildContext
+    manifest_data = _load_manifest(manifest)
+    context = BuildContext(seed=seed, tier=tier)
+    if template_only:
+        builder = TemplateOnlyBuilder()
+        click.echo(f"Building snapshot (template-only, tier {tier}) ...")
+    else:
+        resolved_model = model or os.environ.get("OPENRANGE_BUILDER_MODEL", "azure/gpt-5.2")
+        builder = LLMSnapshotBuilder(model=resolved_model, max_tokens=max_tokens)
+        click.echo(f"Building snapshot (model={resolved_model}, tier {tier}) ...")
+    t0 = time.monotonic()
+    try:
+        spec = _run_async(builder.build(manifest_data, context))
+    except Exception as exc:
+        click.echo(f"Error: build failed: {exc}", err=True)
+        sys.exit(1)
+    elapsed = time.monotonic() - t0
+    output_path = Path(output)
+    dest = _write_snapshot(spec, output_path)
+    n_vulns = len(spec.truth_graph.vulns)
+    n_steps = len(spec.golden_path)
+    n_flags = len(spec.flags)
+    click.echo(f"Snapshot written to {dest}")
+    click.echo(f"  Vulnerabilities: {n_vulns}")
+    click.echo(f"  Golden path steps: {n_steps}")
+    click.echo(f"  Flags: {n_flags}")
+    click.echo(f"  Elapsed: {elapsed:.1f}s")
+# ---------------------------------------------------------------------------
+# render
+# ---------------------------------------------------------------------------
+@cli.command()
+@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
+@click.option("-o", "--output", required=True, type=click.Path(), help="Output directory for Docker artifacts.")
+def render(snapshot: str, output: str) -> None:
+    """Render a snapshot JSON into Docker artifacts (Dockerfiles, compose, configs)."""
+    from open_range.builder.renderer import SnapshotRenderer
+    spec = _load_snapshot(snapshot)
+    renderer = SnapshotRenderer()
+    output_path = Path(output)
+    click.echo(f"Rendering snapshot to {output_path} ...")
+    try:
+        renderer.render(spec, output_path)
+    except Exception as exc:
+        click.echo(f"Error: render failed: {exc}", err=True)
+        sys.exit(1)
+    # List produced files
+    if output_path.exists():
+        artifacts = sorted(p.name for p in output_path.iterdir() if p.is_file())
+        click.echo(f"Produced {len(artifacts)} artifacts:")
+        for name in artifacts:
+            click.echo(f"  {name}")
+# ---------------------------------------------------------------------------
+# validate
+# ---------------------------------------------------------------------------
+# Canonical name -> check class. The order matches the 10-check pipeline.
+_CHECK_REGISTRY: dict[str, str] = {
+    "build_boot": "open_range.validator.build_boot.BuildBootCheck",
+    "exploitability": "open_range.validator.exploitability.ExploitabilityCheck",
+    "patchability": "open_range.validator.patchability.PatchabilityCheck",
+    "evidence": "open_range.validator.evidence.EvidenceCheck",
+    "reward_grounding": "open_range.validator.reward_grounding.RewardGroundingCheck",
+    "isolation": "open_range.validator.isolation.IsolationCheck",
+    "task_feasibility": "open_range.validator.task_feasibility.TaskFeasibilityCheck",
+    "difficulty": "open_range.validator.difficulty.DifficultyCheck",
+    "npc_consistency": "open_range.validator.npc_consistency.NPCConsistencyCheck",
+    "realism_review": "open_range.validator.realism_review.RealismReviewCheck",
+}
+# Checks that require running Docker containers.
+_DOCKER_CHECKS = {"build_boot", "exploitability", "patchability", "evidence"}
+def _import_check(dotted: str) -> Any:
+    """Import a check class by dotted path."""
+    module_path, class_name = dotted.rsplit(".", 1)
+    import importlib
+    mod = importlib.import_module(module_path)
+    return getattr(mod, class_name)
+@cli.command()
+@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
+@click.option("--checks", default=None, help="Comma-separated check names (default: all applicable).")
+@click.option("--docker/--no-docker", default=False, help="Include Docker-dependent checks (requires running containers).")
+def validate(snapshot: str, checks: str | None, docker: bool) -> None:
+    """Run validator checks against a snapshot.
+    By default runs only offline checks (no Docker required). Use --docker
+    to include checks that need live containers.
+    Available checks: build_boot, exploitability, patchability, evidence,
+    reward_grounding, isolation, task_feasibility, difficulty,
+    npc_consistency, realism_review.
+    """
+    from open_range.protocols import ContainerSet
+    from open_range.validator.validator import ValidatorGate
+    spec = _load_snapshot(snapshot)
+    # Determine which checks to run
+    if checks:
+        names = [n.strip() for n in checks.split(",")]
+        unknown = [n for n in names if n not in _CHECK_REGISTRY]
+        if unknown:
+            click.echo(f"Error: unknown checks: {', '.join(unknown)}", err=True)
+            click.echo(f"Available: {', '.join(_CHECK_REGISTRY)}", err=True)
+            sys.exit(1)
+    else:
+        if docker:
+            names = list(_CHECK_REGISTRY)
+        else:
+            names = [n for n in _CHECK_REGISTRY if n not in _DOCKER_CHECKS]
+    if not names:
+        click.echo("No checks selected.")
+        sys.exit(0)
+    # Instantiate checks
+    check_instances = []
+    for name in names:
+        cls = _import_check(_CHECK_REGISTRY[name])
+        check_instances.append(cls())
+    # Empty ContainerSet stub; no container discovery happens here, even with --docker.
+    containers = ContainerSet()
+    gate = ValidatorGate(check_instances)
+    click.echo(f"Running {len(check_instances)} checks ...")
+    result = _run_async(gate.validate(spec, containers))
+    # Print results
+    for cr in result.checks:
+        status = "PASS" if cr.passed else ("ADVISORY" if cr.advisory else "FAIL")
+        line = f"  [{status}] {cr.name}"
+        if cr.time_s > 0:
+            line += f" ({cr.time_s:.2f}s)"
+        click.echo(line)
+        if cr.error:
+            click.echo(f"         {cr.error}")
+    click.echo("")
+    if result.passed:
+        click.echo(f"Validation PASSED ({result.total_time_s:.2f}s)")
+    else:
+        click.echo(f"Validation FAILED ({result.total_time_s:.2f}s)")
+        sys.exit(1)
+# ---------------------------------------------------------------------------
+# deploy
+# ---------------------------------------------------------------------------
+@cli.command()
+@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
+@click.option("--compose-dir", default=None, type=click.Path(), help="Directory to render into and run docker compose from (default: deploy/ next to the snapshot).")
+def deploy(snapshot: str, compose_dir: str | None) -> None:
+    """Deploy a snapshot to running Docker containers.
+    Renders the snapshot into Docker artifacts and runs docker compose up.
+    If --compose-dir is given, renders into that directory; otherwise renders
+    into a deploy/ directory alongside the snapshot.
+    """
+    import subprocess
+    from open_range.builder.renderer import SnapshotRenderer
+    spec = _load_snapshot(snapshot)
+    if compose_dir:
+        target = Path(compose_dir)
+    else:
+        target = Path(snapshot).parent / "deploy"
+    # Render artifacts
+    renderer = SnapshotRenderer()
+    click.echo(f"Rendering Docker artifacts to {target} ...")
+    try:
+        renderer.render(spec, target)
+    except Exception as exc:
+        click.echo(f"Error: render failed: {exc}", err=True)
+        sys.exit(1)
+    compose_file = target / "docker-compose.yml"
+    if not compose_file.exists():
+        click.echo(f"Error: no docker-compose.yml found in {target}", err=True)
+        sys.exit(1)
+    click.echo("Starting containers with docker compose ...")
+    try:
+        proc = subprocess.run(
+            ["docker", "compose", "-f", str(compose_file), "up", "-d", "--build"],
+            cwd=str(target),
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+    except FileNotFoundError:
+        click.echo("Error: docker command not found. Is Docker installed and in PATH?", err=True)
+        sys.exit(1)
+    except subprocess.TimeoutExpired:
+        click.echo("Error: docker compose up timed out after 300s.", err=True)
+        sys.exit(1)
+    if proc.returncode != 0:
+        click.echo(f"Error: docker compose up failed (exit {proc.returncode}):", err=True)
+        if proc.stderr:
+            click.echo(proc.stderr, err=True)
+        sys.exit(1)
+    click.echo("Containers started.")
+    # Show running container status
+    try:
+        ps = subprocess.run(
+            ["docker", "compose", "-f", str(compose_file), "ps", "--format", "table"],
+            cwd=str(target),
+            capture_output=True,
+            text=True,
+            timeout=30,
+        )
+        if ps.stdout:
+            click.echo(ps.stdout)
+    except Exception:
+        pass  # Non-critical
+# ---------------------------------------------------------------------------
+# server
+# ---------------------------------------------------------------------------
+@cli.command()
+@click.option("--host", default="0.0.0.0", help="Host to bind.")
+@click.option("--port", default=8000, type=int, help="Port to listen on.")
+@click.option("--mock/--no-mock", default=False, help="Use mock mode (no Docker required).")
+def server(host: str, port: int, mock: bool) -> None:
+    """Start the OpenEnv server.
+    In mock mode, the environment simulates container interactions without
+    requiring a running Docker stack.
+    """
+    import uvicorn
+    if mock:
+        os.environ["OPENRANGE_MOCK"] = "1"
+        click.echo(f"Starting OpenRange server in MOCK mode on {host}:{port} ...")
+    else:
+        click.echo(f"Starting OpenRange server on {host}:{port} ...")
+    try:
+        uvicorn.run(
+            "open_range.server.app:app",
+            host=host,
+            port=port,
+            log_level="info",
+        )
+    except Exception as exc:
+        click.echo(f"Error: server failed: {exc}", err=True)
+        sys.exit(1)
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+if __name__ == "__main__":
+    cli()
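
The check-selection rules of the `validate` command can be exercised without Click or the validator package. The sketch below copies the check names and the Docker-dependent set from the diff. `select_checks` is a hypothetical helper, and it raises `ValueError` where the CLI prints an error and exits.

```python
from __future__ import annotations

# Names in pipeline order, as listed in _CHECK_REGISTRY.
ALL_CHECKS = [
    "build_boot",
    "exploitability",
    "patchability",
    "evidence",
    "reward_grounding",
    "isolation",
    "task_feasibility",
    "difficulty",
    "npc_consistency",
    "realism_review",
]
# Checks that need running containers, as listed in _DOCKER_CHECKS.
DOCKER_CHECKS = {"build_boot", "exploitability", "patchability", "evidence"}


def select_checks(requested: str | None, docker: bool) -> list[str]:
    """Resolve which checks run for a given --checks / --docker combination."""
    if requested:
        # An explicit list wins and is taken as given, in the order supplied.
        names = [n.strip() for n in requested.split(",")]
        unknown = [n for n in names if n not in ALL_CHECKS]
        if unknown:
            raise ValueError(f"unknown checks: {', '.join(unknown)}")
        return names
    if docker:
        return list(ALL_CHECKS)
    # Default: offline checks only.
    return [n for n in ALL_CHECKS if n not in DOCKER_CHECKS]


print(select_checks(None, docker=False))
```

Note that an explicit `--checks` list is not filtered by `--no-docker`, so naming a Docker-dependent check selects it regardless of the flag.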

src/open_range/client/client.py CHANGED

@@ -1,9 +1,36 @@
-"""Typed OpenEnv client for OpenRange."""
 from __future__ import annotations
-from openenv.core.client_types import StepResult
-from openenv.core.env_client import EnvClient
 from open_range.server.models import RangeAction, RangeObservation, RangeState

+"""Typed OpenEnv client for OpenRange.
+Falls back to lightweight stubs if openenv is not installed.
+"""
 from __future__ import annotations
+from typing import Any, Generic, TypeVar
+try:
+    from openenv.core.client_types import StepResult
+    from openenv.core.env_client import EnvClient
+except ImportError:
+    from dataclasses import dataclass, field
+    _A = TypeVar("_A")
+    _O = TypeVar("_O")
+    _S = TypeVar("_S")
+    @dataclass
+    class StepResult(Generic[_O]):  # type: ignore[no-redef]
+        """Minimal stub matching openenv.core.client_types.StepResult."""
+        observation: Any = None
+        reward: float | int | None = None
+        done: bool = False
+        metadata: dict[str, Any] = field(default_factory=dict)
+    class EnvClient(Generic[_A, _O, _S]):  # type: ignore[no-redef]
+        """Minimal stub matching openenv.core.env_client.EnvClient."""
+        def __init__(self, *args: Any, **kwargs: Any) -> None:
+            pass
 from open_range.server.models import RangeAction, RangeObservation, RangeState
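
The optional-dependency fallback used in `client.py` follows a common pattern: try the real import, and define a lightweight stub on `ImportError`. A minimal sketch follows. The module name `openenv_not_installed_demo` is deliberately fake (an assumption for this demo) so that the fallback branch always runs, and `USING_STUB` is a hypothetical flag added only to make the outcome visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

try:
    # Fake module name, chosen so the import fails and the stub is used.
    from openenv_not_installed_demo import StepResult  # type: ignore

    USING_STUB = False
except ImportError:
    USING_STUB = True

    @dataclass
    class StepResult:  # type: ignore[no-redef]
        """Stub with the same field names as the stub in the diff."""

        observation: Any = None
        reward: float | int | None = None
        done: bool = False
        # default_factory gives each instance its own dict.
        metadata: dict[str, Any] = field(default_factory=dict)


result = StepResult(observation="ok", reward=1.0)
print(USING_STUB, result.done, result.metadata)
```

A stub like this only keeps imports and type annotations working. It does not provide the behaviour of the real client classes.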

src/open_range/server/Dockerfile DELETED

@@ -1,44 +0,0 @@
-FROM python:3.11-slim AS builder
-WORKDIR /app
-# Install uv for fast dependency resolution
-RUN pip install --no-cache-dir uv
-# Copy project files
-COPY pyproject.toml uv.lock* ./
-COPY src/ src/
-COPY openenv.yaml .
-COPY manifests/ manifests/
-# Install dependencies
-RUN uv sync --frozen --no-editable 2>/dev/null || uv sync --no-editable
-# --- Runtime stage ---
-FROM python:3.11-slim
-WORKDIR /app
-# Runtime system deps: Docker CLI (for controlling range containers) + curl
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        docker.io \
-        curl \
-    && rm -rf /var/lib/apt/lists/*
-COPY --from=builder /app/.venv /app/.venv
-COPY --from=builder /app/src /app/src
-COPY --from=builder /app/pyproject.toml /app/pyproject.toml
-COPY --from=builder /app/openenv.yaml /app/openenv.yaml
-COPY --from=builder /app/manifests /app/manifests
-COPY server/ server/
-ENV PATH="/app/.venv/bin:$PATH"
-ENV PYTHONPATH="/app/src:$PYTHONPATH"
-EXPOSE 8000
-HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
-    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
-CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

src/open_range/server/app.py CHANGED

	@@ -39,3 +39,6 @@ def main() -> None:

 app = create_app()
+if __name__ == "__main__":
+    main()

src/open_range/server/environment.py CHANGED

@@ -16,6 +16,7 @@ Design:
 from __future__ import annotations
 import logging
 import time
 from typing import TYPE_CHECKING, Any
 from uuid import uuid4
@@ -248,11 +249,11 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
                 parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
                 self._exec_in_container(
-                    container_name, f"mkdir -p '{parent_dir}'"
                 )
                 b64 = base64.b64encode(content.encode()).decode()
-                cmd = f"echo '{b64}' | base64 -d > '{path}'"
                 _, stderr = self._exec_in_container(container_name, cmd)
                 if stderr and "Error" in stderr:
                     logger.warning(
@@ -284,32 +285,41 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         """
         if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
             self._snapshot_id = kwargs.get("snapshot_id")
-            return kwargs["snapshot"]
-        if self._runtime is not None:
             if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
                 admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
             else:
                 admitted = self._runtime.acquire_snapshot()
             self._snapshot_id = admitted.snapshot_id
-            return admitted.snapshot
-        self._snapshot_id = None
-        return SnapshotSpec(
-            topology={"hosts": []},
-            flags=[],
-            golden_path=[],
-            task={
-                "red_briefing": (
-                    "Target network detected. Begin reconnaissance and "
-                    "identify vulnerabilities. Capture all flags."
-                ),
-                "blue_briefing": (
-                    "Monitor SIEM for suspicious activity. Investigate "
-                    "alerts, patch vulnerabilities, and report findings."
-                ),
-            },
-        )
     # -----------------------------------------------------------------
     # Special command handling
@@ -328,13 +338,13 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
                 done=True,
             )
-        valid_flags = {f.value for f in self._snapshot.flags}
         already_found = set(self._state.flags_found)
         if submitted in valid_flags and submitted not in already_found:
             self._state.flags_found.append(submitted)
             # Check if all flags captured
-            all_captured = set(self._state.flags_found) == valid_flags
             return RangeObservation(
                 stdout=f"Correct! Flag accepted: {submitted}",
                 flags_captured=[submitted],
@@ -395,7 +405,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         Checks credentials against the topology user list in the snapshot.
         Successful auth is recorded in ``state.active_sessions``.
         """
-        parts = action.command.strip().split()
         if len(parts) < 4:
             return RangeObservation(
                 stdout="",
@@ -615,8 +625,8 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
             "Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
             eid,
             self._state.tier,
-            len(self._snapshot.flags),
-            len(self._snapshot.golden_path),
         )
         return RangeObservation(stdout=briefing)
@@ -774,7 +784,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
                     action, obs, self._state, self._snapshot, reward_ctx
                 )
         except Exception as exc:
-            logger.warning("Reward computation failed: %s", exc)
             obs.reward = 0.0
         return obs

 from __future__ import annotations
 import logging
+import shlex
 import time
 from typing import TYPE_CHECKING, Any
 from uuid import uuid4
                 parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
                 self._exec_in_container(
+                    container_name, f"mkdir -p {shlex.quote(parent_dir)}"
                 )
                 b64 = base64.b64encode(content.encode()).decode()
+                cmd = f"echo '{b64}' | base64 -d > {shlex.quote(path)}"
                 _, stderr = self._exec_in_container(container_name, cmd)
                 if stderr and "Error" in stderr:
                     logger.warning(
         """
         if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
             self._snapshot_id = kwargs.get("snapshot_id")
+            snap = kwargs["snapshot"]
+        elif self._runtime is not None:
             if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
                 admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
             else:
                 admitted = self._runtime.acquire_snapshot()
             self._snapshot_id = admitted.snapshot_id
+            snap = admitted.snapshot
+        else:
+            self._snapshot_id = None
+            snap = SnapshotSpec(
+                topology={"hosts": []},
+                flags=[],
+                golden_path=[],
+                task={
+                    "red_briefing": (
+                        "Target network detected. Begin reconnaissance and "
+                        "identify vulnerabilities. Capture all flags."
+                    ),
+                    "blue_briefing": (
+                        "Monitor SIEM for suspicious activity. Investigate "
+                        "alerts, patch vulnerabilities, and report findings."
+                    ),
+                },
+            )
+        # Defensive: ensure required fields are not None
+        if snap.flags is None:
+            snap.flags = []
+        if snap.topology is None:
+            snap.topology = {}
+        if snap.task is None:
+            snap.task = {}
+        return snap
     # -----------------------------------------------------------------
     # Special command handling
                 done=True,
             )
+        valid_flags = {f.value for f in self._snapshot.flags} if self._snapshot.flags else set()
         already_found = set(self._state.flags_found)
         if submitted in valid_flags and submitted not in already_found:
             self._state.flags_found.append(submitted)
             # Check if all flags captured
+            all_captured = bool(valid_flags) and set(self._state.flags_found) == valid_flags
             return RangeObservation(
                 stdout=f"Correct! Flag accepted: {submitted}",
                 flags_captured=[submitted],
         Checks credentials against the topology user list in the snapshot.
         Successful auth is recorded in ``state.active_sessions``.
         """
+        parts = action.command.strip().split(maxsplit=3)
         if len(parts) < 4:
             return RangeObservation(
                 stdout="",
             "Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
             eid,
             self._state.tier,
+            len(self._snapshot.flags or []),
+            len(self._snapshot.golden_path or []),
         )
         return RangeObservation(stdout=briefing)
                     action, obs, self._state, self._snapshot, reward_ctx
                 )
         except Exception as exc:
+            logger.error("Reward computation failed: %s", exc, exc_info=True)
             obs.reward = 0.0
         return obs
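
Two of the fixes in this file, the `shlex.quote()` file deployment and the `maxsplit=3` auth parsing, can be tried outside the environment. The following is a minimal sketch, not the project's implementation: `build_deploy_commands` and `parse_auth` are hypothetical helper names, the `auth <host> <user> <password>` command layout is an assumption, and the `or "/"` fallback for root-level files is an illustrative choice.

```python
from __future__ import annotations

import base64
import shlex


def build_deploy_commands(path: str, content: str) -> list[str]:
    """Build the two shell commands used to place one file in a container."""
    # Assumption: fall back to "/" when the file sits directly under the root.
    parent_dir = (path.rsplit("/", 1)[0] or "/") if "/" in path else "/"
    # Base64 output only contains [A-Za-z0-9+/=], so single quotes are safe here.
    b64 = base64.b64encode(content.encode()).decode()
    return [
        f"mkdir -p {shlex.quote(parent_dir)}",
        f"echo '{b64}' | base64 -d > {shlex.quote(path)}",
    ]


def parse_auth(command: str) -> tuple[str, str, str] | None:
    """Split 'auth <host> <user> <password>', keeping spaces in the password."""
    parts = command.strip().split(maxsplit=3)
    if len(parts) < 4:
        return None  # too few fields to authenticate
    _verb, host, user, password = parts
    return host, user, password


if __name__ == "__main__":
    # A hostile path is wrapped in single quotes instead of being executed.
    for cmd in build_deploy_commands("/tmp/x; rm -rf /", "data"):
        print(cmd)
    print(parse_auth("auth web admin correct horse battery"))
```

Note that `shlex.quote()` returns a string unchanged when it contains only shell-safe characters, so an ordinary path such as `/var/www/portal` appears in the command without quotes.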

src/open_range/training/rollout.py CHANGED

@@ -14,7 +14,7 @@ Usage with GRPOTrainer::
 from __future__ import annotations
-from typing import Any, Callable, Protocol
 class AgentCallable(Protocol):
@@ -23,7 +23,7 @@ class AgentCallable(Protocol):
     def __call__(self, observation: Any) -> Any: ...
-async def rollout_func(
     env: Any,
     agent: AgentCallable,
     num_steps: int = 100,
@@ -82,10 +82,8 @@ def rollout_func_sync(
     num_steps: int = 100,
     mode: str = "red",
 ) -> dict[str, Any]:
-    """Synchronous wrapper around the async rollout function.
-    For use with training loops that don't support async.
     """
-    import asyncio
-    return asyncio.run(rollout_func(env, agent, num_steps, mode))

 from __future__ import annotations
+from typing import Any, Protocol
 class AgentCallable(Protocol):
     def __call__(self, observation: Any) -> Any: ...
+def rollout_func(
     env: Any,
     agent: AgentCallable,
     num_steps: int = 100,
     num_steps: int = 100,
     mode: str = "red",
 ) -> dict[str, Any]:
+    """Synchronous wrapper — now just delegates to rollout_func directly.
+    Kept for backward compatibility with callers that import this name.
     """
+    return rollout_func(env, agent, num_steps, mode)
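
The hunks above show only the signatures, not the loop body. As a rough illustration of why a plain synchronous function is enough here, the sketch below runs a rollout against a toy environment. Everything apart from the `rollout_func` / `rollout_func_sync` names and their parameters is an assumption: `ToyEnv`, `ToyObservation`, the `reset()`/`step()` interface, and the keys of the returned dict are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


class AgentCallable(Protocol):
    def __call__(self, observation: Any) -> Any: ...


@dataclass
class ToyObservation:
    """Hypothetical observation with the two fields the loop reads."""
    reward: float = 0.0
    done: bool = False


class ToyEnv:
    """Hypothetical environment that ends the episode after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.steps = 0

    def reset(self) -> ToyObservation:
        self.steps = 0
        return ToyObservation()

    def step(self, action: Any) -> ToyObservation:
        self.steps += 1
        return ToyObservation(reward=1.0, done=self.steps >= self.horizon)


def rollout_func(
    env: Any, agent: AgentCallable, num_steps: int = 100, mode: str = "red"
) -> dict[str, Any]:
    """Run one episode synchronously; no event loop is involved."""
    obs = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(num_steps):
        obs = env.step(agent(obs))
        total_reward += obs.reward
        steps += 1
        if obs.done:
            break
    return {"mode": mode, "steps": steps, "total_reward": total_reward}


def rollout_func_sync(
    env: Any, agent: AgentCallable, num_steps: int = 100, mode: str = "red"
) -> dict[str, Any]:
    """Backward-compatible alias that simply delegates."""
    return rollout_func(env, agent, num_steps, mode)


result = rollout_func_sync(ToyEnv(), lambda obs: "noop", num_steps=10)
```

Because nothing in the loop awaits, wrapping it in `asyncio.run()` would add an event loop without any concurrency benefit, and `asyncio.run()` raises a `RuntimeError` when called from code that is already running inside a loop.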

tests/test_apply_snapshot.py ADDED

	@@ -0,0 +1,457 @@

+"""Tests for RangeEnvironment._apply_snapshot() with mocked Docker.
+Covers file deployment via docker exec (base64 encoding), SQL execution,
+container name resolution, error handling, and mixed files dicts.
+"""
+from __future__ import annotations
+import base64
+from unittest.mock import MagicMock, call, patch
+import pytest
+from open_range.protocols import (
+    FlagSpec,
+    SnapshotSpec,
+    TruthGraph,
+    Vulnerability,
+)
+from open_range.server.environment import RangeEnvironment
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def _make_env(docker_available: bool = True) -> RangeEnvironment:
+    """Create a RangeEnvironment with docker_available control."""
+    return RangeEnvironment(docker_available=docker_available)
+def _make_snapshot(files: dict[str, str] | None = None) -> SnapshotSpec:
+    """Create a minimal SnapshotSpec with the given files dict."""
+    return SnapshotSpec(
+        topology={"hosts": ["web", "db"], "zones": {"dmz": ["web"], "internal": ["db"]}},
+        truth_graph=TruthGraph(vulns=[]),
+        flags=[],
+        golden_path=[],
+        files=files or {},
+    )
+class _FakeExecResult:
+    """Mimics docker SDK exec_run return value."""
+    def __init__(self, stdout: bytes = b"", stderr: bytes = b""):
+        self.output = (stdout, stderr)
+class _FakeContainer:
+    """Minimal fake Docker container."""
+    def __init__(self, name: str, exec_side_effect=None):
+        self.name = name
+        self._exec_side_effect = exec_side_effect or (lambda *a, **kw: _FakeExecResult())
+    def exec_run(self, cmd, **kwargs):
+        return self._exec_side_effect(cmd, **kwargs)
+class _FakeDockerClient:
+    """Minimal fake Docker client."""
+    def __init__(self, containers: dict[str, _FakeContainer] | None = None):
+        self._containers = containers or {}
+    @property
+    def containers(self):
+        return self
+    def get(self, name: str):
+        if name in self._containers:
+            return self._containers[name]
+        raise Exception(f"Container {name} not found")
+    def list(self):
+        return list(self._containers.values())
+# ---------------------------------------------------------------------------
+# Tests: Docker unavailable
+# ---------------------------------------------------------------------------
+class TestApplySnapshotNoDocker:
+    """When Docker is not available, _apply_snapshot should be a no-op."""
+    def test_skips_when_docker_unavailable(self):
+        env = _make_env(docker_available=False)
+        snapshot = _make_snapshot({"web:/var/www/test.php": "<?php echo 1; ?>"})
+        # Should not raise
+        env._apply_snapshot(snapshot)
+    def test_skips_when_no_files(self):
+        env = _make_env(docker_available=False)
+        snapshot = _make_snapshot({})
+        env._apply_snapshot(snapshot)
+    def test_skips_when_files_is_empty_dict(self):
+        env = _make_env(docker_available=False)
+        snapshot = _make_snapshot()
+        snapshot.files = {}
+        env._apply_snapshot(snapshot)
+# ---------------------------------------------------------------------------
+# Tests: File deployment via base64
+# ---------------------------------------------------------------------------
+class TestFileDeployment:
+    """Verify files are deployed to containers via base64-encoded docker exec."""
+    def test_deploys_single_file(self):
+        env = _make_env(docker_available=True)
+        content = "<?php echo 'hello'; ?>"
+        snapshot = _make_snapshot({"web:/var/www/portal/test.php": content})
+        exec_calls = []
+        def fake_exec_run(cmd, **kw):
+            exec_calls.append(cmd)
+            return _FakeExecResult()
+        container = _FakeContainer("web", exec_side_effect=fake_exec_run)
+        client = _FakeDockerClient({"web": container})
+        env._docker_client = client
+        env._docker_available = True
+        env._apply_snapshot(snapshot)
+        # Should have 2 calls: mkdir -p, then echo base64 | base64 -d > path
+        assert len(exec_calls) == 2
+        # First call: mkdir -p for parent directory.
+        # shlex.quote() leaves a path made only of shell-safe characters unquoted.
+        mkdir_cmd = exec_calls[0]
+        assert mkdir_cmd == ["sh", "-c", "mkdir -p /var/www/portal"]
+        # Second call: base64 write
+        write_cmd = exec_calls[1]
+        assert isinstance(write_cmd, list)
+        write_str = write_cmd[2] if len(write_cmd) > 2 else ""
+        expected_b64 = base64.b64encode(content.encode()).decode()
+        assert expected_b64 in write_str
+        assert "/var/www/portal/test.php" in write_str
+    def test_deploys_multiple_files_to_different_containers(self):
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({
+            "web:/var/www/portal/index.php": "<?php echo 'web'; ?>",
+            "files:/srv/shares/general/notes.txt": "some notes",
+        })
+        web_calls = []
+        files_calls = []
+        web = _FakeContainer(
+            "web",
+            exec_side_effect=lambda cmd, **kw: (web_calls.append(cmd), _FakeExecResult())[1],
+        )
+        files_container = _FakeContainer(
+            "files",
+            exec_side_effect=lambda cmd, **kw: (files_calls.append(cmd), _FakeExecResult())[1],
+        )
+        client = _FakeDockerClient({"web": web, "files": files_container})
+        env._docker_client = client
+        env._docker_available = True
+        env._apply_snapshot(snapshot)
+        # web: 2 calls (mkdir + write)
+        assert len(web_calls) == 2
+        # files: 2 calls (mkdir + write)
+        assert len(files_calls) == 2
+    def test_file_at_root_path(self):
+        """File at / should still work (edge case for parent dir)."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({"web:/test.txt": "root file"})
+        calls = []
+        container = _FakeContainer(
+            "web",
+            exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
+        )
+        client = _FakeDockerClient({"web": container})
+        env._docker_client = client
+        env._docker_available = True
+        env._apply_snapshot(snapshot)
+        # mkdir -p for "/" then base64 write
+        assert len(calls) == 2
+# ---------------------------------------------------------------------------
+# Tests: SQL execution via docker exec
+# ---------------------------------------------------------------------------
+class TestSQLDeployment:
+    """Verify db:sql entries are deployed via mysql commands."""
+    def test_deploys_sql_to_db_container(self):
+        env = _make_env(docker_available=True)
+        sql = "INSERT INTO users VALUES (1, 'test');"
+        snapshot = _make_snapshot({"db:sql": sql})
+        calls = []
+        def fake_exec(cmd, **kw):
+            calls.append(cmd)
+            return _FakeExecResult()
+        db_container = _FakeContainer("db", exec_side_effect=fake_exec)
+        client = _FakeDockerClient({"db": db_container})
+        env._docker_client = client
+        env._docker_available = True
+        env._apply_snapshot(snapshot)
+        # 3 calls: write SQL file, execute mysql, cleanup
+        assert len(calls) == 3
+        # First: base64 decode to /tmp/_snapshot.sql
+        write_cmd_str = calls[0][2] if len(calls[0]) > 2 else ""
+        expected_b64 = base64.b64encode(sql.encode()).decode()
+        assert expected_b64 in write_cmd_str
+        assert "/tmp/_snapshot.sql" in write_cmd_str
+        # Second: mysql < /tmp/_snapshot.sql
+        mysql_cmd_str = calls[1][2] if len(calls[1]) > 2 else ""
+        assert "mysql" in mysql_cmd_str
+        assert "/tmp/_snapshot.sql" in mysql_cmd_str
+        # Third: rm -f /tmp/_snapshot.sql
+        rm_cmd_str = calls[2][2] if len(calls[2]) > 2 else ""
+        assert "rm" in rm_cmd_str
+        assert "/tmp/_snapshot.sql" in rm_cmd_str
+    def test_sql_error_logs_warning(self, caplog):
+        """When mysql returns an ERROR, it should log a warning but not raise."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({"db:sql": "INVALID SQL;"})
+        call_count = [0]
+        def fake_exec(cmd, **kw):
+            call_count[0] += 1
+            # Return ERROR on the mysql command (2nd call)
+            if call_count[0] == 2:
+                return _FakeExecResult(stderr=b"ERROR 1064: Syntax error")
+            return _FakeExecResult()
+        db_container = _FakeContainer("db", exec_side_effect=fake_exec)
+        client = _FakeDockerClient({"db": db_container})
+        env._docker_client = client
+        env._docker_available = True
+        import logging
+        with caplog.at_level(logging.WARNING):
+            env._apply_snapshot(snapshot)
+        assert any("SQL deployment error" in r.message for r in caplog.records)
+# ---------------------------------------------------------------------------
+# Tests: Container name resolution
+# ---------------------------------------------------------------------------
+class TestContainerNameResolution:
+    """Verify _container_name resolves hosts correctly."""
+    def test_resolves_via_compose_config(self):
+        env = _make_env(docker_available=False)
+        env._snapshot = SnapshotSpec(
+            topology={},
+            compose={
+                "services": {"web": {}, "db": {}},
+                "x-project-name": "openrange",
+            },
+        )
+        assert env._container_name("web") == "openrange-web-1"
+        assert env._container_name("db") == "openrange-db-1"
+    def test_resolves_via_docker_listing(self):
+        env = _make_env(docker_available=True)
+        env._snapshot = None  # No compose config
+        web_container = MagicMock()
+        web_container.name = "open-range-web-1"
+        db_container = MagicMock()
+        db_container.name = "open-range-db-1"
+        client = MagicMock()
+        client.containers.list.return_value = [web_container, db_container]
+        env._docker_client = client
+        assert env._container_name("web") == "open-range-web-1"
+        assert env._container_name("db") == "open-range-db-1"
+    def test_falls_back_to_bare_name(self):
+        env = _make_env(docker_available=False)
+        env._snapshot = None
+        assert env._container_name("web") == "web"
+# ---------------------------------------------------------------------------
+# Tests: Error handling for failed docker exec
+# ---------------------------------------------------------------------------
+class TestErrorHandling:
+    """Verify graceful handling of docker exec failures."""
+    def test_file_deployment_handles_exception(self, caplog):
+        """If docker exec raises, log warning but continue."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({
+            "web:/var/www/good.php": "good",
+            "broken:/var/www/fail.php": "bad",
+        })
+        def fake_exec(cmd, **kw):
+            return _FakeExecResult()
+        web = _FakeContainer("web", exec_side_effect=fake_exec)
+        # 'broken' container doesn't exist
+        client = _FakeDockerClient({"web": web})
+        env._docker_client = client
+        env._docker_available = True
+        import logging
+        with caplog.at_level(logging.WARNING):
+            env._apply_snapshot(snapshot)
+        # Should deploy the good file and warn about the broken one
+        assert any("Failed to deploy" in r.message or "broken" in r.message
+                    for r in caplog.records)
+    def test_bad_key_format_skipped(self, caplog):
+        """Keys without ':' separator should be skipped with a warning."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({
+            "no_colon_here": "this should be skipped",
+            "web:/var/www/valid.php": "valid content",
+        })
+        calls = []
+        web = _FakeContainer(
+            "web",
+            exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
+        )
+        client = _FakeDockerClient({"web": web})
+        env._docker_client = client
+        env._docker_available = True
+        import logging
+        with caplog.at_level(logging.WARNING):
+            env._apply_snapshot(snapshot)
+        assert any("bad key format" in r.message for r in caplog.records)
+        # Only valid file should be deployed (mkdir + write = 2 calls)
+        assert len(calls) == 2
+    def test_file_write_stderr_error_logged(self, caplog):
+        """If file write returns stderr with 'Error', log warning."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({"web:/var/www/fail.php": "content"})
+        call_count = [0]
+        def fake_exec(cmd, **kw):
+            call_count[0] += 1
+            # Return error on the write call (2nd call)
+            if call_count[0] == 2:
+                return _FakeExecResult(stderr=b"Error: permission denied")
+            return _FakeExecResult()
+        web = _FakeContainer("web", exec_side_effect=fake_exec)
+        client = _FakeDockerClient({"web": web})
+        env._docker_client = client
+        env._docker_available = True
+        import logging
+        with caplog.at_level(logging.WARNING):
+            env._apply_snapshot(snapshot)
+        assert any("File deployment error" in r.message for r in caplog.records)
+# ---------------------------------------------------------------------------
+# Tests: Mixed files dict (file paths + db:sql entries)
+# ---------------------------------------------------------------------------
+class TestMixedFilesDict:
+    """Test snapshot with both regular file deployments and db:sql entries."""
+    def test_mixed_deployment(self):
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({
+            "web:/var/www/portal/index.php": "<?php echo 'hello'; ?>",
+            "web:/etc/nginx/sites-available/default": "server { listen 80; }",
+            "db:sql": "INSERT INTO secrets VALUES ('flag', 'FLAG{test}');",
+            "files:/srv/shares/general/notes.txt": "meeting notes",
+        })
+        container_calls: dict[str, list] = {"web": [], "db": [], "files": []}
+        def make_exec(name):
+            def fake_exec(cmd, **kw):
+                container_calls[name].append(cmd)
+                return _FakeExecResult()
+            return fake_exec
+        containers = {
+            name: _FakeContainer(name, exec_side_effect=make_exec(name))
+            for name in ["web", "db", "files"]
+        }
+        client = _FakeDockerClient(containers)
+        env._docker_client = client
+        env._docker_available = True
+        env._apply_snapshot(snapshot)
+        # web: 2 files * 2 calls each = 4
+        assert len(container_calls["web"]) == 4
+        # db: 3 calls (write sql, execute, cleanup)
+        assert len(container_calls["db"]) == 3
+        # files: 1 file * 2 calls = 2
+        assert len(container_calls["files"]) == 2
+    def test_deployment_count_in_log(self, caplog):
+        """Verify the final log message reports correct deployment counts."""
+        env = _make_env(docker_available=True)
+        snapshot = _make_snapshot({
+            "web:/var/www/test.php": "test",
+            "db:sql": "SELECT 1;",
+        })
+        def fake_exec(cmd, **kw):
+            return _FakeExecResult()
+        containers = {
+            name: _FakeContainer(name, exec_side_effect=fake_exec)
+            for name in ["web", "db"]
+        }
+        client = _FakeDockerClient(containers)
+        env._docker_client = client
+        env._docker_available = True
+        import logging
+        with caplog.at_level(logging.INFO):
+            env._apply_snapshot(snapshot)
+        assert any("2/2 artifacts deployed" in r.message for r in caplog.records)
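
The tests above exercise three kinds of `files` keys: `container:/path` entries, the special `db:sql` entry, and malformed keys without a colon. One way to picture that dispatch is the sketch below. It is an assumption about the behaviour implied by the tests, not the code in `_apply_snapshot`, and `classify_file_key` is a hypothetical name.

```python
from __future__ import annotations


def classify_file_key(key: str) -> tuple[str, str, str] | None:
    """Split a files-dict key into (container, kind, target).

    Returns None for keys without a ':' separator, which the tests expect
    to be skipped with a "bad key format" warning.
    """
    if ":" not in key:
        return None
    # Split on the first colon only, so targets may contain further colons.
    container, target = key.split(":", 1)
    # Assumption: a target of exactly "sql" marks a SQL payload, not a path.
    kind = "sql" if target == "sql" else "file"
    return container, kind, target


for key in ("web:/var/www/test.php", "db:sql", "no_colon_here"):
    print(key, "->", classify_file_key(key))
```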

tests/test_console.py CHANGED

@@ -1,7 +1,11 @@
 """Tests for the operator debugging console (issue #28).
-Uses Starlette's TestClient against the standalone FastAPI app.
 No Docker dependency.
 """
 from __future__ import annotations
@@ -10,17 +14,27 @@ import pytest
 from starlette.testclient import TestClient
 from open_range.server.app import create_app
 @pytest.fixture()
-def client(monkeypatch):
-    """Create a TestClient against the standalone FastAPI app (not OpenEnv)."""
-    # Force standalone path so we test our own endpoints and console integration
-    monkeypatch.setattr("open_range.server.app._try_openenv_app", lambda: None)
     app = create_app()
     return TestClient(app)
 # ===================================================================
 # GET /console -- HTML page
 # ===================================================================
@@ -59,8 +73,8 @@ class TestSnapshotAPI:
         data = client.get("/console/api/snapshot").json()
         assert data["id"] is None
-    def test_snapshot_after_reset(self, client: TestClient):
-        client.post("/reset", json={"episode_id": "snap_test_1"})
         data = client.get("/console/api/snapshot").json()
         assert data["id"] == "snap_test_1"
         assert "hosts" in data
@@ -68,9 +82,9 @@ class TestSnapshotAPI:
         assert "vuln_count" in data
         assert "tier" in data
-    def test_snapshot_no_truth_graph_or_flags(self, client: TestClient):
         """Snapshot API must NOT leak truth_graph or flag values."""
-        client.post("/reset", json={})
         data = client.get("/console/api/snapshot").json()
         assert "truth_graph" not in data
         assert "flags" not in data
@@ -89,20 +103,22 @@ class TestEpisodeAPI:
         data = resp.json()
         assert isinstance(data, dict)
-    def test_episode_fields(self, client: TestClient):
-        client.post("/reset", json={})
         data = client.get("/console/api/episode").json()
         assert "step_count" in data
         assert "flags_found" in data
         assert "mode" in data
         assert "services_status" in data
-    def test_episode_step_count_updates(self, client: TestClient):
-        client.post("/reset", json={})
         data = client.get("/console/api/episode").json()
         assert data["step_count"] == 0
-        client.post("/step", json={"command": "nmap web", "mode": "red"})
         data = client.get("/console/api/episode").json()
         assert data["step_count"] == 1
@@ -120,15 +136,14 @@ class TestHistoryAPI:
         assert isinstance(data, list)
     def test_history_empty_initially(self, client: TestClient):
-        # Reset clears history
-        client.post("/reset", json={})
         data = client.get("/console/api/history").json()
         assert data == []
     def test_history_records_actions(self, client: TestClient):
-        client.post("/reset", json={})
-        client.post("/step", json={"command": "nmap -sV web", "mode": "red"})
-        client.post("/step", json={"command": "tail -f /var/log/syslog", "mode": "blue"})
         data = client.get("/console/api/history").json()
         assert len(data) == 2
         # Newest first
@@ -136,8 +151,9 @@ class TestHistoryAPI:
         assert data[1]["mode"] == "red"
     def test_history_has_timestamps(self, client: TestClient):
-        client.post("/reset", json={})
-        client.post("/step", json={"command": "nmap web", "mode": "red"})
         data = client.get("/console/api/history").json()
         assert len(data) == 1
         assert "time" in data[0]
@@ -145,11 +161,9 @@ class TestHistoryAPI:
     def test_history_max_20(self, client: TestClient):
         """History API should return at most 20 entries."""
-        client.post("/reset", json={})
         for i in range(25):
-            client.post(
-                "/step",
-                json={"command": f"cmd_{i}", "mode": "red"},
-            )
         data = client.get("/console/api/history").json()
         assert len(data) == 20

 """Tests for the operator debugging console (issue #28).
+Uses Starlette's TestClient against the OpenEnv app with console router.
 No Docker dependency.
+Note: OpenEnv HTTP endpoints are stateless (each creates a new env instance).
+Console API uses a fallback env stored on app.state.  History is recorded
+via the module-level record_action() / clear_history() helpers.
 """
 from __future__ import annotations
 from starlette.testclient import TestClient
 from open_range.server.app import create_app
+from open_range.server.console import clear_history, record_action
+from open_range.server.environment import RangeEnvironment
 @pytest.fixture()
+def client():
+    """Create a TestClient with a shared env on app.state for console API."""
     app = create_app()
+    # Store a shared env so console API endpoints can access state
+    env = RangeEnvironment(docker_available=False)
+    app.state.env = env
+    clear_history()
     return TestClient(app)
+@pytest.fixture()
+def env(client: TestClient) -> RangeEnvironment:
+    """Return the shared env stored on app.state."""
+    return client.app.state.env
 # ===================================================================
 # GET /console -- HTML page
 # ===================================================================
         data = client.get("/console/api/snapshot").json()
         assert data["id"] is None
+    def test_snapshot_after_reset(self, client: TestClient, env: RangeEnvironment):
+        env.reset(episode_id="snap_test_1")
         data = client.get("/console/api/snapshot").json()
         assert data["id"] == "snap_test_1"
         assert "hosts" in data
         assert "vuln_count" in data
         assert "tier" in data
+    def test_snapshot_no_truth_graph_or_flags(self, client: TestClient, env: RangeEnvironment):
         """Snapshot API must NOT leak truth_graph or flag values."""
+        env.reset()
         data = client.get("/console/api/snapshot").json()
         assert "truth_graph" not in data
         assert "flags" not in data
         data = resp.json()
         assert isinstance(data, dict)
+    def test_episode_fields(self, client: TestClient, env: RangeEnvironment):
+        env.reset()
         data = client.get("/console/api/episode").json()
         assert "step_count" in data
         assert "flags_found" in data
         assert "mode" in data
         assert "services_status" in data
+    def test_episode_step_count_updates(self, client: TestClient, env: RangeEnvironment):
+        from open_range.server.models import RangeAction
+        env.reset()
         data = client.get("/console/api/episode").json()
         assert data["step_count"] == 0
+        env.step(RangeAction(command="nmap web", mode="red"))
         data = client.get("/console/api/episode").json()
         assert data["step_count"] == 1
         assert isinstance(data, list)
     def test_history_empty_initially(self, client: TestClient):
         data = client.get("/console/api/history").json()
         assert data == []
     def test_history_records_actions(self, client: TestClient):
+        import time
+        record_action({"step": 1, "command": "nmap -sV web", "mode": "red", "time": time.time()})
+        record_action({"step": 2, "command": "tail -f /var/log/syslog", "mode": "blue", "time": time.time()})
         data = client.get("/console/api/history").json()
         assert len(data) == 2
         # Newest first
         assert data[1]["mode"] == "red"
     def test_history_has_timestamps(self, client: TestClient):
+        import time
+        record_action({"step": 1, "command": "nmap web", "mode": "red", "time": time.time()})
         data = client.get("/console/api/history").json()
         assert len(data) == 1
         assert "time" in data[0]
     def test_history_max_20(self, client: TestClient):
         """History API should return at most 20 entries."""
+        import time
         for i in range(25):
+            record_action({"step": i, "command": f"cmd_{i}", "mode": "red", "time": time.time()})
         data = client.get("/console/api/history").json()
         assert len(data) == 20
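
The history tests rely on two properties: entries come back newest first, and at most 20 are returned. A bounded `collections.deque` gives both for free. The sketch below is only one possible shape for the module-level helpers; the real `open_range.server.console` implementation is not shown in this diff, and `get_history` and `_HISTORY` are hypothetical names.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Any

MAX_HISTORY = 20  # assumed cap, matching test_history_max_20

# appendleft() on a bounded deque discards the oldest entry from the right.
_HISTORY: deque[dict[str, Any]] = deque(maxlen=MAX_HISTORY)


def record_action(entry: dict[str, Any]) -> None:
    """Store one action record at the front of the history."""
    _HISTORY.appendleft(entry)


def clear_history() -> None:
    """Drop all recorded actions, e.g. between tests."""
    _HISTORY.clear()


def get_history() -> list[dict[str, Any]]:
    """Return a copy of the history, newest entry first."""
    return list(_HISTORY)


clear_history()
for i in range(25):
    record_action({"step": i, "command": f"cmd_{i}", "mode": "red", "time": time.time()})
history = get_history()
```

Because the state lives at module level, a test fixture needs to call `clear_history()` before each test, as the `client` fixture above does.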

tests/test_parse_llm_response.py ADDED

	@@ -0,0 +1,1075 @@

+"""Tests for _parse_llm_response() — the critical LLM JSON -> SnapshotSpec mapper.
+Covers field name aliases, evidence spec formats, NPC persona parsing,
+files dict extraction, missing/minimal/malformed input, and a real LLM
+output fixture from snapshots/llm_tier1_test.json.
+"""
+import json
+from pathlib import Path
+import pytest
+from open_range.builder.builder import _parse_llm_response
+from open_range.protocols import (
+    EvidenceItem,
+    ExploitStep,
+    FlagSpec,
+    GoldenPathStep,
+    NPCPersona,
+    SnapshotSpec,
+    Vulnerability,
+)
+ROOT = Path(__file__).parent.parent
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def _minimal_json(**overrides) -> str:
+    """Return a minimal valid JSON string for _parse_llm_response.
+    All top-level keys present but with empty/default values unless overridden.
+    """
+    base: dict = {
+        "topology": {},
+        "truth_graph": {"vulns": [], "exploit_chain": []},
+        "golden_path": [],
+        "flags": [],
+        "evidence_spec": {},
+        "npc_personas": [],
+        "npc_traffic": {},
+        "task": {},
+    }
+    base.update(overrides)
+    return json.dumps(base)
+# ---------------------------------------------------------------------------
+# 1. Happy path with real LLM output
+# ---------------------------------------------------------------------------
+class TestRealLLMOutput:
+    """Parse the actual LLM-generated JSON from snapshots/llm_tier1_test.json."""
+    @pytest.fixture
+    def llm_json(self):
+        path = ROOT / "snapshots" / "llm_tier1_test.json"
+        return path.read_text()
+    def test_parses_to_snapshot_spec(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert isinstance(spec, SnapshotSpec)
+    def test_topology_hosts(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        hosts = spec.topology["hosts"]
+        assert "attacker" in hosts
+        assert "web" in hosts
+        assert "siem" in hosts
+        assert len(hosts) == 8
+    def test_topology_zones(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        zones = spec.topology["zones"]
+        assert "dmz" in zones
+        assert "web" in zones["dmz"]
+    def test_truth_graph_vulns(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert len(spec.truth_graph.vulns) == 2
+        vuln_ids = {v.id for v in spec.truth_graph.vulns}
+        assert "VULN-SQLI-LOOKUP" in vuln_ids
+        assert "VULN-MISSING-AUTHZ-REPORT" in vuln_ids
+    def test_truth_graph_exploit_chain(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert len(spec.truth_graph.exploit_chain) == 2
+        assert spec.truth_graph.exploit_chain[0].vuln_id == "VULN-SQLI-LOOKUP"
+    def test_flags(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert len(spec.flags) == 2
+        values = {f.value for f in spec.flags}
+        assert "FLAG{9f3a2b4c5d6e7f80}" in values
+        assert "FLAG{1a2b3c4d5e6f7788}" in values
+    def test_golden_path(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert len(spec.golden_path) == 8
+        # The real LLM output uses "cmd" field name
+        assert spec.golden_path[0].command == "nmap -p 80 10.0.1.10"
+        assert spec.golden_path[0].expect_in_stdout == "80/tcp open"
+    def test_task_briefings(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert "Meridian" in spec.task.red_briefing
+        assert spec.task.blue_briefing != ""
+    def test_npc_personas(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        assert len(spec.npc_personas) == 8
+        names = {p.name for p in spec.npc_personas}
+        assert "Derek Thompson" in names
+        assert "Karen Williams" in names
+    def test_npc_persona_security_awareness(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        by_name = {p.name: p for p in spec.npc_personas}
+        assert by_name["Derek Thompson"].security_awareness == 0.85
+        assert by_name["Karen Williams"].security_awareness == 0.25
+    def test_files_dict(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        # Real LLM output has explicit files + vulnerable_code dicts
+        assert len(spec.files) > 0
+        assert "web:/var/www/portal/lookup.php" in spec.files
+        assert "web:/var/www/portal/admin/compliance_report.php" in spec.files
+    def test_vulnerable_code_as_dict_extracted_to_files(self, llm_json):
+        spec = _parse_llm_response(llm_json)
+        # VULN-SQLI-LOOKUP has vulnerable_code as a dict keyed by
+        # /var/www/portal/lookup.php, which would be extracted to files
+        # as "web:/var/www/portal/lookup.php".
+        # The explicit files dict already contains that key, so the
+        # explicit entry takes precedence (the container_key-not-in-files check).
+        assert "web:/var/www/portal/lookup.php" in spec.files
+# ---------------------------------------------------------------------------
+# 2. Field name mappings (ExploitStep aliases)
+# ---------------------------------------------------------------------------
+class TestExploitStepFieldMappings:
+    """LLM uses vuln/action/yields; Pydantic expects vuln_id/command/description."""
+    def test_vuln_maps_to_vuln_id(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [],
+                "exploit_chain": [
+                    {"vuln": "V1", "action": "run exploit", "yields": "root shell"}
+                ],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"
+    def test_action_maps_to_command(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [],
+                "exploit_chain": [
+                    {"vuln": "V1", "action": "sqlmap -u http://...", "yields": "db dump"}
+                ],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.truth_graph.exploit_chain[0].command == "sqlmap -u http://..."
+    def test_yields_maps_to_description(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [],
+                "exploit_chain": [
+                    {"vuln": "V1", "action": "cmd", "yields": "got credentials"}
+                ],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.truth_graph.exploit_chain[0].description == "got credentials"
+    def test_canonical_names_also_work(self):
+        """vuln_id/command/description should pass through without aliasing."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [],
+                "exploit_chain": [
+                    {
+                        "vuln_id": "V2",
+                        "command": "nmap -sV ...",
+                        "description": "port scan",
+                    }
+                ],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        ec = spec.truth_graph.exploit_chain[0]
+        assert ec.vuln_id == "V2"
+        assert ec.command == "nmap -sV ..."
+        assert ec.description == "port scan"
+    def test_canonical_names_take_precedence(self):
+        """When both canonical and alias keys are present, the canonical key wins (it is looked up first)."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [],
+                "exploit_chain": [
+                    {
+                        "vuln_id": "canonical",
+                        "vuln": "alias",
+                        "command": "canonical_cmd",
+                        "action": "alias_cmd",
+                        "description": "canonical_desc",
+                        "yields": "alias_desc",
+                    }
+                ],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        ec = spec.truth_graph.exploit_chain[0]
+        assert ec.vuln_id == "canonical"
+        assert ec.command == "canonical_cmd"
+        assert ec.description == "canonical_desc"
+# ---------------------------------------------------------------------------
+# 3. GoldenPathStep field mappings
+# ---------------------------------------------------------------------------
+class TestGoldenPathFieldMappings:
+    """LLM uses cmd/expect_stdout; Pydantic expects command/expect_in_stdout."""
+    def test_cmd_maps_to_command(self):
+        raw = _minimal_json(
+            golden_path=[
+                {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "open"}
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"
+    def test_expect_stdout_maps_to_expect_in_stdout(self):
+        raw = _minimal_json(
+            golden_path=[
+                {"step": 1, "cmd": "whoami", "expect_stdout": "root"}
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].expect_in_stdout == "root"
+    def test_canonical_command_field(self):
+        raw = _minimal_json(
+            golden_path=[
+                {"step": 1, "command": "ls -la", "expect_in_stdout": "total"}
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].command == "ls -la"
+        assert spec.golden_path[0].expect_in_stdout == "total"
+    def test_mixed_field_names_across_steps(self):
+        """Some steps use cmd, others use command — both should parse."""
+        raw = _minimal_json(
+            golden_path=[
+                {"step": 1, "cmd": "nmap scan", "expect_stdout": "80/tcp"},
+                {"step": 2, "command": "curl http://web", "expect_in_stdout": "Welcome"},
+                {"step": 3, "cmd": "sqlmap", "expect_in_stdout": "FLAG"},
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.golden_path) == 3
+        assert spec.golden_path[0].command == "nmap scan"
+        assert spec.golden_path[0].expect_in_stdout == "80/tcp"
+        assert spec.golden_path[1].command == "curl http://web"
+        assert spec.golden_path[1].expect_in_stdout == "Welcome"
+        assert spec.golden_path[2].command == "sqlmap"
+        assert spec.golden_path[2].expect_in_stdout == "FLAG"
+    def test_step_number_preserved(self):
+        raw = _minimal_json(
+            golden_path=[
+                {"step": 5, "cmd": "echo hi", "expect_stdout": "hi"}
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].step == 5
+    def test_description_field_preserved(self):
+        raw = _minimal_json(
+            golden_path=[
+                {
+                    "step": 1,
+                    "cmd": "nmap",
+                    "expect_stdout": "open",
+                    "description": "Port scan the DMZ",
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].description == "Port scan the DMZ"
+    def test_cmd_takes_precedence_over_command(self):
+        """When both cmd and command are present, cmd wins (it's checked first)."""
+        raw = _minimal_json(
+            golden_path=[
+                {
+                    "step": 1,
+                    "cmd": "cmd_value",
+                    "command": "command_value",
+                    "expect_stdout": "x",
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.golden_path[0].command == "cmd_value"
+# ---------------------------------------------------------------------------
+# 4. Evidence spec parsing
+# ---------------------------------------------------------------------------
+class TestEvidenceSpecParsing:
+    """LLM returns dict, protocol expects list[EvidenceItem]."""
+    def test_dict_with_string_values(self):
+        raw = _minimal_json(
+            evidence_spec={
+                "web_access_log": "SQL injection pattern",
+                "siem_alerts": "Unauthorized access",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.evidence_spec) == 2
+        locations = {e.location for e in spec.evidence_spec}
+        assert "web_access_log" in locations
+        assert "siem_alerts" in locations
+        # String values become log_entry type
+        for e in spec.evidence_spec:
+            if e.location == "web_access_log":
+                assert e.type == "log_entry"
+                assert e.pattern == "SQL injection pattern"
+    def test_dict_with_list_values(self):
+        raw = _minimal_json(
+            evidence_spec={
+                "siem_alerts": ["UNION SELECT detected", "admin endpoint accessed"],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.evidence_spec) == 2
+        # List values become alert type
+        for e in spec.evidence_spec:
+            assert e.type == "alert"
+            assert e.location == "siem_alerts"
+        patterns = {e.pattern for e in spec.evidence_spec}
+        assert "UNION SELECT detected" in patterns
+        assert "admin endpoint accessed" in patterns
+    def test_dict_with_mixed_values(self):
+        raw = _minimal_json(
+            evidence_spec={
+                "web_log": "GET /search?q=",
+                "alerts": ["sqli_detected", "auth_bypass"],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.evidence_spec) == 3  # 1 string + 2 list items
+    def test_list_format_passthrough(self):
+        """When evidence_spec is already a list of dicts, parse directly."""
+        raw = _minimal_json(
+            evidence_spec=[
+                {"type": "alert", "location": "siem", "pattern": "SQLi"},
+                {"type": "log_entry", "location": "web_log", "pattern": "GET /admin"},
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.evidence_spec) == 2
+        assert spec.evidence_spec[0].type == "alert"
+        assert spec.evidence_spec[1].location == "web_log"
+    def test_empty_dict(self):
+        raw = _minimal_json(evidence_spec={})
+        spec = _parse_llm_response(raw)
+        assert spec.evidence_spec == []
+    def test_empty_list(self):
+        raw = _minimal_json(evidence_spec=[])
+        spec = _parse_llm_response(raw)
+        assert spec.evidence_spec == []
+# ---------------------------------------------------------------------------
+# 5. NPC persona parsing
+# ---------------------------------------------------------------------------
+class TestNPCPersonaParsing:
+    def test_basic_persona(self):
+        raw = _minimal_json(
+            npc_personas=[
+                {
+                    "name": "Alice",
+                    "role": "Admin",
+                    "department": "IT",
+                    "security_awareness": 0.9,
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.npc_personas) == 1
+        p = spec.npc_personas[0]
+        assert p.name == "Alice"
+        assert p.role == "Admin"
+        assert p.department == "IT"
+        assert p.security_awareness == 0.9
+    def test_accounts_with_string_values(self):
+        raw = _minimal_json(
+            npc_personas=[
+                {
+                    "name": "Bob",
+                    "accounts": {
+                        "email": "bob@corp.local",
+                        "ldap_dn": "cn=bob,dc=corp,dc=local",
+                    },
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.npc_personas[0].accounts["email"] == "bob@corp.local"
+    def test_default_security_awareness(self):
+        """Missing security_awareness defaults to 0.5."""
+        raw = _minimal_json(npc_personas=[{"name": "Charlie"}])
+        spec = _parse_llm_response(raw)
+        assert spec.npc_personas[0].security_awareness == 0.5
+    def test_susceptibility_dict(self):
+        raw = _minimal_json(
+            npc_personas=[
+                {
+                    "name": "Diana",
+                    "susceptibility": {"phishing": 0.8, "pretexting": 0.6},
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.npc_personas[0].susceptibility["phishing"] == 0.8
+    def test_routine_dict(self):
+        raw = _minimal_json(
+            npc_personas=[
+                {
+                    "name": "Eve",
+                    "routine": {
+                        "morning": "check email",
+                        "afternoon": "process reports",
+                    },
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.npc_personas[0].routine["morning"] == "check email"
+    def test_multiple_personas(self):
+        raw = _minimal_json(
+            npc_personas=[
+                {"name": "P1", "security_awareness": 0.1},
+                {"name": "P2", "security_awareness": 0.5},
+                {"name": "P3", "security_awareness": 0.9},
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.npc_personas) == 3
+        names = [p.name for p in spec.npc_personas]
+        assert names == ["P1", "P2", "P3"]
+    def test_missing_optional_fields_default(self):
+        """All optional fields should default gracefully."""
+        raw = _minimal_json(npc_personas=[{"name": "Minimal"}])
+        spec = _parse_llm_response(raw)
+        p = spec.npc_personas[0]
+        assert p.name == "Minimal"
+        assert p.role == ""
+        assert p.department == ""
+        assert p.reports_to == ""
+        assert p.communication_style == ""
+        assert p.susceptibility == {}
+        assert p.routine == {}
+        assert p.accounts == {}
+# ---------------------------------------------------------------------------
+# 6. Files dict extraction
+# ---------------------------------------------------------------------------
+class TestFilesDictExtraction:
+    def test_explicit_files_field(self):
+        raw = _minimal_json(
+            files={
+                "web:/var/www/index.php": "<?php echo 'hello'; ?>",
+                "db:/opt/init.sql": "CREATE TABLE t(id INT);",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.files) == 2
+        assert spec.files["web:/var/www/index.php"] == "<?php echo 'hello'; ?>"
+    def test_vulnerable_code_dict_extracted(self):
+        """vulnerable_code as {file_path: code} should be extracted to files."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "v1",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "php",
+                        "injection_point": "/search",
+                        "vulnerable_code": {
+                            "/var/www/search.php": "<?php $q=$_GET['q']; ?>"
+                        },
+                    }
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert "web:/var/www/search.php" in spec.files
+        assert spec.files["web:/var/www/search.php"] == "<?php $q=$_GET['q']; ?>"
+    def test_vulnerable_code_string_on_web_host(self):
+        """String vulnerable_code on a web host with a /-prefixed injection_point maps to web:/var/www/portal{injection_point}."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "v1",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "php",
+                        "injection_point": "/search.php",
+                        "vulnerable_code": "<?php echo 'vuln'; ?>",
+                    }
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert "web:/var/www/portal/search.php" in spec.files
+    def test_vulnerable_code_string_non_web_host_skipped(self):
+        """Empty vulnerable_code on a non-web host, with an injection_point lacking a leading /, adds no files entry."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "v1",
+                        "type": "weak_creds",
+                        "host": "db",
+                        "service": "mysql",
+                        "injection_point": "mysql -u root -proot",
+                        "vulnerable_code": "",
+                    }
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.files) == 0
+    def test_explicit_files_not_overwritten_by_vulnerable_code(self):
+        """If explicit files has a key, vulnerable_code should not overwrite it."""
+        raw = _minimal_json(
+            files={"web:/var/www/search.php": "explicit content"},
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "v1",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "php",
+                        "injection_point": "/search",
+                        "vulnerable_code": {
+                            "/var/www/search.php": "vulnerable content"
+                        },
+                    }
+                ],
+                "exploit_chain": [],
+            },
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.files["web:/var/www/search.php"] == "explicit content"
+    def test_no_files_field_produces_empty_dict(self):
+        raw = _minimal_json()
+        spec = _parse_llm_response(raw)
+        assert spec.files == {}
+    def test_files_field_non_string_values_skipped(self):
+        """Non-string values in files dict are silently skipped."""
+        raw = _minimal_json(
+            files={
+                "web:/good.php": "<?php ?>",
+                "web:/bad.php": 12345,
+                "web:/also_bad.php": ["not", "a", "string"],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.files) == 1
+        assert "web:/good.php" in spec.files
+# ---------------------------------------------------------------------------
+# 7. Missing optional fields
+# ---------------------------------------------------------------------------
+class TestMissingOptionalFields:
+    def test_missing_evidence_spec(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "golden_path": [],
+            "flags": [],
+            "npc_personas": [],
+            "npc_traffic": {},
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.evidence_spec == []
+    def test_missing_npc_personas(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "golden_path": [],
+            "flags": [],
+            "evidence_spec": {},
+            "npc_traffic": {},
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.npc_personas == []
+    def test_missing_npc_traffic(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "golden_path": [],
+            "flags": [],
+            "evidence_spec": {},
+            "npc_personas": [],
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        # npc_traffic gets default NPCTrafficSpec values
+        assert spec.npc_traffic.level == 0
+    def test_missing_task(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "golden_path": [],
+            "flags": [],
+            "evidence_spec": {},
+            "npc_personas": [],
+            "npc_traffic": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.task.red_briefing == ""
+        assert spec.task.blue_briefing == ""
+    def test_missing_truth_graph(self):
+        data = {
+            "topology": {"hosts": ["web"]},
+            "golden_path": [],
+            "flags": [],
+            "evidence_spec": {},
+            "npc_personas": [],
+            "npc_traffic": {},
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.truth_graph.vulns == []
+        assert spec.truth_graph.exploit_chain == []
+    def test_missing_golden_path(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "flags": [],
+            "evidence_spec": {},
+            "npc_personas": [],
+            "npc_traffic": {},
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.golden_path == []
+    def test_missing_flags(self):
+        data = {
+            "topology": {},
+            "truth_graph": {"vulns": [], "exploit_chain": []},
+            "golden_path": [],
+            "evidence_spec": {},
+            "npc_personas": [],
+            "npc_traffic": {},
+            "task": {},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        assert spec.flags == []
+    def test_vuln_with_minimal_fields(self):
+        """A vulnerability with only id, type, host should parse fine."""
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [{"id": "v1", "type": "sqli", "host": "web"}],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        v = spec.truth_graph.vulns[0]
+        assert v.id == "v1"
+        assert v.service == ""
+        assert v.injection_point == ""
+        assert v.vulnerable_code == ""
+        assert v.root_cause == ""
+# ---------------------------------------------------------------------------
+# 8. Empty/minimal input
+# ---------------------------------------------------------------------------
+class TestMinimalInput:
+    def test_completely_empty_json_object(self):
+        """An empty JSON object should produce a valid SnapshotSpec with defaults."""
+        spec = _parse_llm_response("{}")
+        assert isinstance(spec, SnapshotSpec)
+        assert spec.topology == {}
+        assert spec.truth_graph.vulns == []
+        assert spec.golden_path == []
+        assert spec.flags == []
+        assert spec.evidence_spec == []
+        assert spec.npc_personas == []
+    def test_minimal_valid_json(self):
+        raw = _minimal_json()
+        spec = _parse_llm_response(raw)
+        assert isinstance(spec, SnapshotSpec)
+    def test_topology_only(self):
+        raw = json.dumps({"topology": {"hosts": ["web", "db"]}})
+        spec = _parse_llm_response(raw)
+        assert spec.topology["hosts"] == ["web", "db"]
+        assert spec.golden_path == []
+# ---------------------------------------------------------------------------
+# 9. Malformed input
+# ---------------------------------------------------------------------------
+class TestMalformedInput:
+    def test_invalid_json_raises(self):
+        with pytest.raises(json.JSONDecodeError):
+            _parse_llm_response("not valid json {{{")
+    def test_json_array_not_object_raises(self):
+        """Top-level must be an object, not an array."""
+        with pytest.raises((TypeError, AttributeError)):
+            _parse_llm_response("[1, 2, 3]")
+    def test_json_string_not_object_raises(self):
+        with pytest.raises((TypeError, AttributeError)):
+            _parse_llm_response('"just a string"')
+    def test_truth_graph_not_dict_handled(self):
+        """A non-dict truth_graph is not coerced: the parser's .get() call raises AttributeError."""
+        # truth_graph as a string
+        raw = json.dumps({"truth_graph": "not a dict"})
+        # The parser calls .get() on the string, which raises AttributeError
+        with pytest.raises(AttributeError):
+            _parse_llm_response(raw)
+    def test_golden_path_not_list_handled(self):
+        """If golden_path is a string rather than a list, calling .get() on its items raises AttributeError."""
+        raw = json.dumps({"golden_path": "not a list"})
+        with pytest.raises(AttributeError):
+            _parse_llm_response(raw)
+    def test_empty_string_raises(self):
+        with pytest.raises(json.JSONDecodeError):
+            _parse_llm_response("")
+    def test_json_with_trailing_comma_raises(self):
+        with pytest.raises(json.JSONDecodeError):
+            _parse_llm_response('{"key": "value",}')
+# ---------------------------------------------------------------------------
+# 10. Vulnerability parsing details
+# ---------------------------------------------------------------------------
+class TestVulnerabilityParsing:
+    def test_all_vuln_fields_parsed(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "VULN-001",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "nginx+php",
+                        "injection_point": "/search?q=",
+                        "vulnerable_code": "<?php $q=$_GET['q']; ?>",
+                        "root_cause": "No input sanitization",
+                        "blast_radius": "Full DB read",
+                        "remediation": "Use prepared statements",
+                    }
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        v = spec.truth_graph.vulns[0]
+        assert v.id == "VULN-001"
+        assert v.type == "sqli"
+        assert v.host == "web"
+        assert v.service == "nginx+php"
+        assert v.injection_point == "/search?q="
+        assert v.vulnerable_code == "<?php $q=$_GET['q']; ?>"
+        assert v.root_cause == "No input sanitization"
+        assert v.blast_radius == "Full DB read"
+        assert v.remediation == "Use prepared statements"
+    def test_vulnerable_code_as_dict(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {
+                        "id": "V1",
+                        "type": "sqli",
+                        "host": "web",
+                        "vulnerable_code": {
+                            "/var/www/search.php": "<?php vuln code; ?>"
+                        },
+                    }
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        v = spec.truth_graph.vulns[0]
+        assert isinstance(v.vulnerable_code, dict)
+        assert v.vulnerable_code["/var/www/search.php"] == "<?php vuln code; ?>"
+    def test_multiple_vulns(self):
+        raw = _minimal_json(
+            truth_graph={
+                "vulns": [
+                    {"id": "V1", "type": "sqli", "host": "web"},
+                    {"id": "V2", "type": "xss", "host": "web"},
+                    {"id": "V3", "type": "idor", "host": "web"},
+                ],
+                "exploit_chain": [],
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.truth_graph.vulns) == 3
+        types = {v.type for v in spec.truth_graph.vulns}
+        assert types == {"sqli", "xss", "idor"}
+# ---------------------------------------------------------------------------
+# 11. Flag parsing
+# ---------------------------------------------------------------------------
+class TestFlagParsing:
+    def test_single_flag(self):
+        raw = _minimal_json(
+            flags=[
+                {
+                    "id": "flag1",
+                    "value": "FLAG{abc123}",
+                    "path": "/var/flags/flag1.txt",
+                    "host": "db",
+                }
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.flags) == 1
+        f = spec.flags[0]
+        assert f.id == "flag1"
+        assert f.value == "FLAG{abc123}"
+        assert f.path == "/var/flags/flag1.txt"
+        assert f.host == "db"
+    def test_multiple_flags(self):
+        raw = _minimal_json(
+            flags=[
+                {"id": "f1", "value": "FLAG{a}", "path": "/f1.txt", "host": "web"},
+                {"id": "f2", "value": "FLAG{b}", "path": "/f2.txt", "host": "db"},
+            ]
+        )
+        spec = _parse_llm_response(raw)
+        assert len(spec.flags) == 2
+    def test_missing_flag_fields_default_to_empty(self):
+        raw = _minimal_json(flags=[{}])
+        spec = _parse_llm_response(raw)
+        f = spec.flags[0]
+        assert f.id == ""
+        assert f.value == ""
+        assert f.path == ""
+        assert f.host == ""
+# ---------------------------------------------------------------------------
+# 12. NPC traffic parsing
+# ---------------------------------------------------------------------------
+class TestNPCTrafficParsing:
+    def test_http_rate_maps_to_rate_lambda(self):
+        raw = _minimal_json(npc_traffic={"http_rate": 25})
+        spec = _parse_llm_response(raw)
+        assert spec.npc_traffic.rate_lambda == 25
+    def test_default_scripts(self):
+        raw = _minimal_json(npc_traffic={})
+        spec = _parse_llm_response(raw)
+        assert "http_traffic.sh" in spec.npc_traffic.scripts
+    def test_level_always_zero(self):
+        """Current parser hardcodes level=0."""
+        raw = _minimal_json(npc_traffic={"http_rate": 50})
+        spec = _parse_llm_response(raw)
+        assert spec.npc_traffic.level == 0
+    def test_missing_http_rate_defaults_to_10(self):
+        raw = _minimal_json(npc_traffic={})
+        spec = _parse_llm_response(raw)
+        assert spec.npc_traffic.rate_lambda == 10
+# ---------------------------------------------------------------------------
+# 13. Task parsing
+# ---------------------------------------------------------------------------
+class TestTaskParsing:
+    def test_both_briefings(self):
+        raw = _minimal_json(
+            task={
+                "red_briefing": "Attack the network.",
+                "blue_briefing": "Defend the network.",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == "Attack the network."
+        assert spec.task.blue_briefing == "Defend the network."
+    def test_missing_briefings_default_empty(self):
+        raw = _minimal_json(task={})
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == ""
+        assert spec.task.blue_briefing == ""
+    def test_extra_task_fields_ignored(self):
+        """Extra fields in task should be silently ignored."""
+        raw = _minimal_json(
+            task={
+                "red_briefing": "Go",
+                "blue_briefing": "Watch",
+                "unknown_field": "whatever",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == "Go"
+# ---------------------------------------------------------------------------
+# 14. Roundtrip / integration
+# ---------------------------------------------------------------------------
+class TestRoundtrip:
+    def test_complex_snapshot_parses_completely(self):
+        """A complex snapshot with all sections populated should parse."""
+        data = {
+            "topology": {
+                "hosts": ["attacker", "web", "db", "siem"],
+                "zones": {"dmz": ["web"], "internal": ["db"], "mgmt": ["siem"]},
+                "users": [{"username": "admin", "password": "pass", "groups": ["admins"], "hosts": ["web"]}],
+            },
+            "truth_graph": {
+                "vulns": [
+                    {
+                        "id": "V1",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "php",
+                        "injection_point": "/search?q=",
+                        "vulnerable_code": {"search.php": "vuln code"},
+                        "root_cause": "no sanitization",
+                        "blast_radius": "db read",
+                        "remediation": "prepared stmts",
+                    }
+                ],
+                "exploit_chain": [
+                    {"vuln": "V1", "action": "sqlmap", "yields": "db dump"}
+                ],
+            },
+            "golden_path": [
+                {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "80/tcp"},
+                {"step": 2, "command": "curl http://web/search?q=test", "expect_in_stdout": "results"},
+            ],
+            "flags": [
+                {"id": "f1", "value": "FLAG{complex}", "path": "/flag.txt", "host": "db"}
+            ],
+            "evidence_spec": {
+                "web_log": "sqli pattern",
+                "alerts": ["sql_injection_detected"],
+            },
+            "npc_personas": [
+                {
+                    "name": "Alice",
+                    "role": "SysAdmin",
+                    "department": "IT",
+                    "reports_to": "CTO",
+                    "communication_style": "technical",
+                    "security_awareness": 0.9,
+                    "susceptibility": {"phishing": 0.1},
+                    "routine": {"morning": "check logs"},
+                    "accounts": {"email": "alice@corp.local"},
+                }
+            ],
+            "npc_traffic": {"http_rate": 20},
+            "task": {
+                "red_briefing": "Hack the network.",
+                "blue_briefing": "Monitor and defend.",
+            },
+            "files": {"web:/var/www/index.php": "<?php echo 'hi'; ?>"},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+        # Verify all sections
+        assert spec.topology["hosts"] == ["attacker", "web", "db", "siem"]
+        assert len(spec.truth_graph.vulns) == 1
+        assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"
+        assert spec.truth_graph.exploit_chain[0].command == "sqlmap"
+        assert len(spec.golden_path) == 2
+        assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"
+        assert spec.golden_path[1].expect_in_stdout == "results"
+        assert spec.flags[0].value == "FLAG{complex}"
+        assert len(spec.evidence_spec) == 2  # 1 string + 1 list item
+        assert len(spec.npc_personas) == 1
+        assert spec.npc_traffic.rate_lambda == 20
+        assert spec.task.red_briefing == "Hack the network."
+        # files: explicit + vulnerable_code dict
+        assert "web:/var/www/index.php" in spec.files
+        assert "web:search.php" in spec.files  # from vulnerable_code dict
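
Note on the roundtrip test above: it passes one golden-path step with `cmd`/`expect_stdout` and another with `command`/`expect_in_stdout`, and expects both to land in the same canonical fields. The parser body is not shown in this part of the diff, so the following is only a minimal sketch of how such alias handling could work. The helper `normalize_step` and the `STEP_ALIASES` table are hypothetical names for illustration, not taken from `builder.py`.

```python
from typing import Any

# Hypothetical alias table: canonical field name -> accepted spellings,
# in priority order. Only the pairs exercised by the tests are listed.
STEP_ALIASES: dict[str, tuple[str, ...]] = {
    "command": ("command", "cmd"),
    "expect_in_stdout": ("expect_in_stdout", "expect_stdout"),
}


def normalize_step(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a golden-path step with canonical key names."""
    alias_keys = {alias for names in STEP_ALIASES.values() for alias in names}
    # Keys outside every alias group are carried over unchanged.
    step = {k: v for k, v in raw.items() if k not in alias_keys}
    for canonical, names in STEP_ALIASES.items():
        # The first spelling present wins; a missing field defaults to "".
        step[canonical] = next((raw[n] for n in names if n in raw), "")
    return step


steps = [
    {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "80/tcp"},
    {"step": 2, "command": "curl http://web/search?q=test", "expect_in_stdout": "results"},
]
normalized = [normalize_step(s) for s in steps]
```

Keeping the aliases in one table means a new LLM spelling only needs a new tuple entry rather than another branch in the parser.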

tests/test_renderer_integration.py ADDED

	@@ -0,0 +1,373 @@

+"""Integration tests for the full renderer pipeline.
+Loads real LLM output from snapshots/llm_tier1_test.json, parses it
+through _parse_llm_response(), renders through SnapshotRenderer.render(),
+and verifies all output files contain expected content.
+"""
+from __future__ import annotations
+import json
+import tempfile
+from pathlib import Path
+import pytest
+from open_range.builder.builder import _parse_llm_response
+from open_range.builder.renderer import SnapshotRenderer
+ROOT = Path(__file__).parent.parent
+SNAPSHOT_PATH = ROOT / "snapshots" / "llm_tier1_test.json"
+@pytest.fixture
+def llm_output() -> dict:
+    """Load the real LLM output JSON."""
+    return json.loads(SNAPSHOT_PATH.read_text())
+@pytest.fixture
+def parsed_spec(llm_output):
+    """Parse real LLM output through _parse_llm_response."""
+    return _parse_llm_response(json.dumps(llm_output))
+@pytest.fixture
+def rendered_dir(parsed_spec):
+    """Render the parsed spec and yield the output directory."""
+    renderer = SnapshotRenderer()
+    with tempfile.TemporaryDirectory() as tmpdir:
+        out = Path(tmpdir) / "integration_out"
+        renderer.render(parsed_spec, out)
+        yield out
+# ---------------------------------------------------------------------------
+# Pipeline: parse -> render round-trip
+# ---------------------------------------------------------------------------
+class TestParseLLMOutput:
+    """Verify _parse_llm_response correctly handles real LLM output."""
+    def test_parse_produces_snapshot_spec(self, parsed_spec):
+        from open_range.protocols import SnapshotSpec
+        assert isinstance(parsed_spec, SnapshotSpec)
+    def test_parse_has_topology(self, parsed_spec):
+        assert "hosts" in parsed_spec.topology
+        assert len(parsed_spec.topology["hosts"]) == 8
+    def test_parse_has_vulns(self, parsed_spec):
+        assert len(parsed_spec.truth_graph.vulns) >= 1
+        vuln_types = {v.type for v in parsed_spec.truth_graph.vulns}
+        assert "sqli" in vuln_types
+    def test_parse_has_flags(self, parsed_spec):
+        assert len(parsed_spec.flags) >= 2
+    def test_parse_has_golden_path(self, parsed_spec):
+        assert len(parsed_spec.golden_path) >= 1
+        # Golden path steps should have commands
+        for step in parsed_spec.golden_path:
+            assert step.command, f"Step {step.step} has empty command"
+    def test_parse_has_task_briefings(self, parsed_spec):
+        assert parsed_spec.task.red_briefing
+        assert parsed_spec.task.blue_briefing
+    def test_parse_has_files(self, parsed_spec):
+        assert len(parsed_spec.files) > 0
+        # Should include at least one web file (db:sql is checked in TestFilesPreserved)
+        web_files = [k for k in parsed_spec.files if k.startswith("web:")]
+        assert len(web_files) > 0
+    def test_parse_has_npc_personas(self, parsed_spec):
+        assert len(parsed_spec.npc_personas) >= 1
+    def test_golden_path_uses_command_field(self, parsed_spec):
+        """LLM output uses 'cmd', parser should map to 'command'."""
+        for step in parsed_spec.golden_path:
+            assert step.command  # Should be populated from 'cmd' key
+    def test_golden_path_uses_expect_in_stdout(self, parsed_spec):
+        """LLM output uses 'expect_stdout', parser maps to 'expect_in_stdout'."""
+        for step in parsed_spec.golden_path:
+            assert step.expect_in_stdout
+# ---------------------------------------------------------------------------
+# All output files exist
+# ---------------------------------------------------------------------------
+class TestRenderedFilesExist:
+    """Verify all 6 template outputs are created."""
+    EXPECTED_FILES = [
+        "docker-compose.yml",
+        "Dockerfile.web",
+        "Dockerfile.db",
+        "nginx.conf",
+        "init.sql",
+        "iptables.rules",
+    ]
+    def test_all_output_files_exist(self, rendered_dir):
+        for fname in self.EXPECTED_FILES:
+            path = rendered_dir / fname
+            assert path.exists(), f"Missing output file: {fname}"
+    def test_all_output_files_non_empty(self, rendered_dir):
+        for fname in self.EXPECTED_FILES:
+            content = (rendered_dir / fname).read_text()
+            assert len(content) > 0, f"Empty output file: {fname}"
+# ---------------------------------------------------------------------------
+# nginx.conf content verification
+# ---------------------------------------------------------------------------
+class TestNginxConf:
+    """Verify rendered nginx.conf has correct content."""
+    def test_references_php_fpm_socket(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "php8.1-fpm.sock" in nginx
+    def test_has_server_block(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "server {" in nginx
+        assert "listen 80" in nginx
+    def test_has_php_location(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "location ~ \\.php$" in nginx
+    def test_has_fastcgi_pass(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "fastcgi_pass unix:/run/php/php8.1-fpm.sock" in nginx
+# ---------------------------------------------------------------------------
+# docker-compose.yml content verification
+# ---------------------------------------------------------------------------
+class TestDockerCompose:
+    """Verify rendered docker-compose.yml has correct static IPs and structure."""
+    def test_has_services_section(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "services:" in compose
+    def test_has_all_core_services(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        for service in ["attacker:", "firewall:", "web:", "mail:", "db:", "siem:", "ldap:", "files:"]:
+            assert service in compose, f"Missing service: {service}"
+    def test_has_network_definitions(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "networks:" in compose
+        assert "external:" in compose
+        assert "dmz:" in compose
+        assert "internal:" in compose
+        assert "management:" in compose
+    def test_has_static_ips(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        # Key static IPs from the template
+        assert "10.0.0.10" in compose  # attacker
+        assert "10.0.0.2" in compose   # firewall external
+        assert "10.0.1.10" in compose  # web dmz
+        assert "10.0.2.20" in compose  # db internal
+        assert "10.0.3.20" in compose  # ldap management
+        assert "10.0.3.21" in compose  # siem management
+    def test_web_depends_on_db(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        # Weak check: only confirms that some service declares depends_on,
+        # not that it is specifically web depending on db
+        assert "depends_on:" in compose
+    def test_has_subnet_definitions(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "10.0.0.0/24" in compose  # external
+        assert "10.0.1.0/24" in compose  # dmz
+        assert "10.0.2.0/24" in compose  # internal
+        assert "10.0.3.0/24" in compose  # management
+    def test_has_healthchecks(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "healthcheck:" in compose
+    def test_attacker_has_net_admin(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "NET_ADMIN" in compose
+    def test_db_has_mysql_env_vars(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "MYSQL_ROOT_PASSWORD" in compose
+        assert "MYSQL_DATABASE=referral_db" in compose
+        assert "MYSQL_USER=app_user" in compose
+# ---------------------------------------------------------------------------
+# init.sql content verification
+# ---------------------------------------------------------------------------
+class TestInitSQL:
+    """Verify rendered init.sql has referral_db and app_user."""
+    def test_creates_referral_db(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "referral_db" in sql
+    def test_creates_flags_db(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "flags" in sql
+    def test_creates_core_tables(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "CREATE TABLE" in sql
+        assert "users" in sql
+        assert "patients" in sql
+        assert "secrets" in sql
+    def test_creates_healthcare_tables(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "patient_referrals" in sql
+        assert "billing" in sql
+    def test_grants_app_user(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "app_user" in sql
+        assert "GRANT" in sql
+    def test_has_flush_privileges(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "FLUSH PRIVILEGES" in sql
+# ---------------------------------------------------------------------------
+# Dockerfile.web content verification
+# ---------------------------------------------------------------------------
+class TestDockerfileWeb:
+    """Verify rendered Dockerfile.web creates users from topology."""
+    def test_creates_users_from_topology(self, rendered_dir, parsed_spec):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        # Only checks that a useradd line exists, not which usernames it creates
+        users = parsed_spec.topology.get("users", [])
+        assert len(users) > 0, "Parsed spec should have users"
+        if any(user.get("username") for user in users):
+            assert "useradd" in dockerfile
+    def test_has_php_fpm(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "php8.1-fpm" in dockerfile
+    def test_has_nginx(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "nginx" in dockerfile
+    def test_copies_nginx_conf(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "COPY nginx.conf" in dockerfile
+    def test_exposes_ports(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "EXPOSE" in dockerfile
+        assert "80" in dockerfile
+    def test_plants_file_flags(self, rendered_dir, parsed_spec):
+        """Flags with file paths on web host should appear in Dockerfile."""
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        for flag in parsed_spec.flags:
+            if flag.host == "web" and "/" in flag.path:
+                assert flag.value in dockerfile, (
+                    f"Flag {flag.id} ({flag.value}) not in Dockerfile.web"
+                )
+    def test_db_flags_not_in_dockerfile(self, rendered_dir, parsed_spec):
+        """Flags with mysql: or db: paths should NOT appear in Dockerfile.web."""
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        for flag in parsed_spec.flags:
+            if flag.path.startswith("mysql:") or flag.path.startswith("db:"):
+                assert flag.value not in dockerfile, (
+                    f"DB flag {flag.id} ({flag.value}) should not be in Dockerfile.web"
+                )
+# ---------------------------------------------------------------------------
+# iptables.rules content verification
+# ---------------------------------------------------------------------------
+class TestIptablesRules:
+    """Verify rendered iptables.rules has correct structure."""
+    def test_has_filter_table(self, rendered_dir):
+        rules = (rendered_dir / "iptables.rules").read_text()
+        assert "*filter" in rules
+        assert "COMMIT" in rules
+    def test_has_forward_chain(self, rendered_dir):
+        rules = (rendered_dir / "iptables.rules").read_text()
+        assert "FORWARD" in rules
+# ---------------------------------------------------------------------------
+# Full round-trip: files dict is preserved through parse
+# ---------------------------------------------------------------------------
+class TestFilesPreserved:
+    """Verify that files from LLM output survive the parse pipeline."""
+    def test_files_dict_has_web_files(self, parsed_spec):
+        web_files = {k: v for k, v in parsed_spec.files.items() if k.startswith("web:")}
+        assert len(web_files) > 0
+    def test_files_dict_has_sql(self, parsed_spec):
+        assert "db:sql" in parsed_spec.files
+    def test_index_php_content(self, parsed_spec):
+        key = "web:/var/www/portal/index.php"
+        assert key in parsed_spec.files
+        assert "Meridian Referral Portal" in parsed_spec.files[key]
+    def test_lookup_php_has_sqli(self, parsed_spec):
+        key = "web:/var/www/portal/lookup.php"
+        assert key in parsed_spec.files
+        content = parsed_spec.files[key]
+        # Should contain the vulnerable SQL query
+        assert "last_name LIKE" in content or "$last" in content
+    def test_compliance_report_has_flag(self, parsed_spec):
+        key = "web:/var/www/portal/reports/hipaa_audit.txt"
+        assert key in parsed_spec.files
+        assert "FLAG{1a2b3c4d5e6f7788}" in parsed_spec.files[key]
+    def test_sql_has_user_inserts(self, parsed_spec):
+        sql = parsed_spec.files.get("db:sql", "")
+        assert "dthompson" in sql
+        assert "kwilliams" in sql
+    def test_sql_has_flag_insert(self, parsed_spec):
+        sql = parsed_spec.files.get("db:sql", "")
+        assert "FLAG{9f3a2b4c5d6e7f80}" in sql
+    def test_files_samba_shares(self, parsed_spec):
+        files_entries = {k: v for k, v in parsed_spec.files.items() if k.startswith("files:")}
+        assert len(files_entries) > 0
+    def test_db_backup_script(self, parsed_spec):
+        key = "db:/opt/scripts/db_backup.sh"
+        assert key in parsed_spec.files
+        assert "mysqldump" in parsed_spec.files[key]
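
Note on the static-IP assertions in `TestDockerCompose` above: a plain substring check such as `"10.0.0.2" in compose` also succeeds when the file only contains `10.0.0.20`, so it cannot tell the firewall address apart from a longer one. A minimal sketch of a stricter check follows. The helper `has_exact_ip` and the sample compose fragment are hypothetical and not part of the test suite.

```python
import re


def has_exact_ip(text: str, ip: str) -> bool:
    """True if `ip` occurs in `text` as a whole address, not inside a longer one."""
    # Reject matches preceded by a digit or dot, or followed by a digit.
    pattern = r"(?<![\d.])" + re.escape(ip) + r"(?!\d)"
    return re.search(pattern, text) is not None


# Hypothetical compose fragment that contains 10.0.0.20 but not 10.0.0.2.
sample = """
services:
  firewall:
    networks:
      external:
        ipv4_address: 10.0.0.20
"""

substring_hit = "10.0.0.2" in sample          # naive check
exact_hit = has_exact_ip(sample, "10.0.0.2")  # stricter check
```

With this fragment the naive check reports a match while the stricter one does not, which is the false positive the lookarounds are meant to exclude. A CIDR suffix such as `10.0.1.0/24` still matches, because `/` is neither a digit nor a dot.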

uv.lock CHANGED

@@ -1862,52 +1862,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
 ]
-[[package]]
-name = "open-range"
-version = "0.1.0"
-source = { editable = "." }
-dependencies = [
-    { name = "docker" },
-    { name = "fastapi" },
-    { name = "jinja2" },
-    { name = "openenv-core", extra = ["core"] },
-    { name = "pydantic" },
-    { name = "pyyaml" },
-    { name = "uvicorn" },
-]
-[package.optional-dependencies]
-builder = [
-    { name = "litellm" },
-]
-dev = [
-    { name = "httpx" },
-    { name = "pytest" },
-    { name = "pytest-asyncio" },
-]
-training = [
-    { name = "trl" },
-    { name = "unsloth" },
-]
-[package.metadata]
-requires-dist = [
-    { name = "docker", specifier = ">=7.0" },
-    { name = "fastapi", specifier = ">=0.115" },
-    { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
-    { name = "jinja2", specifier = ">=3.1" },
-    { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
-    { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
-    { name = "pydantic", specifier = ">=2.0" },
-    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
-    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
-    { name = "pyyaml", specifier = ">=6.0" },
-    { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
-    { name = "unsloth", marker = "extra == 'training'" },
-    { name = "uvicorn", specifier = ">=0.27" },
-]
-provides-extras = ["dev", "training", "builder"]
 [[package]]
 name = "openai"
 version = "2.26.0"
@@ -1972,6 +1926,54 @@ core = [
     { name = "websockets" },
 ]
+[[package]]
+name = "openenv-open-range"
+version = "0.1.0"
+source = { editable = "." }
+dependencies = [
+    { name = "click" },
+    { name = "docker" },
+    { name = "fastapi" },
+    { name = "jinja2" },
+    { name = "openenv-core", extra = ["core"] },
+    { name = "pydantic" },
+    { name = "pyyaml" },
+    { name = "uvicorn" },
+]
+[package.optional-dependencies]
+builder = [
+    { name = "litellm" },
+]
+dev = [
+    { name = "httpx" },
+    { name = "pytest" },
+    { name = "pytest-asyncio" },
+]
+training = [
+    { name = "trl" },
+    { name = "unsloth" },
+]
+[package.metadata]
+requires-dist = [
+    { name = "click", specifier = ">=8.1" },
+    { name = "docker", specifier = ">=7.0" },
+    { name = "fastapi", specifier = ">=0.115.0" },
+    { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
+    { name = "jinja2", specifier = ">=3.1" },
+    { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
+    { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
+    { name = "pydantic", specifier = ">=2.0.0" },
+    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
+    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
+    { name = "pyyaml", specifier = ">=6.0" },
+    { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
+    { name = "unsloth", marker = "extra == 'training'" },
+    { name = "uvicorn", specifier = ">=0.24.0" },
+]
+provides-extras = ["dev", "training", "builder"]
 [[package]]
 name = "opentelemetry-api"
 version = "1.40.0"