Aaron Brown committed on
Commit
3ea4118
·
1 Parent(s): f549fda

Cleanup: fix bugs, remove dead code, add missing packages

- Add openenv fallback stubs in client.py (matches models.py pattern)
- Fix auth command parsing with maxsplit=3 (passwords with spaces)
- Fix reward exception silencing: log at ERROR with traceback
- Add snapshot validation: default flags/topology/task if None
- Fix shell injection in file deploy with shlex.quote()
- Remove async anti-pattern in rollout.py
- Remove duplicate src/open_range/server/Dockerfile
- Remove unused requests dependency
- Remove redundant uv install check from Dockerfile
- Add missing packages: open_range.agents, open_range.validator

AGENTS.md ADDED
@@ -0,0 +1,722 @@
+ # AGENTS.md
+
+ Guidance for Codex when working on OpenRange.
+
+ ## What Is OpenRange
+
+ OpenRange is a **multi-agent cybersecurity gymnasium** built on OpenEnv 0.2.1. It is the first cybersecurity environment in the OpenEnv ecosystem.
+
+ Three LLM roles operate on real Docker infrastructure:
+
+ | Role | Entry Point | What It Does |
+ |------|-------------|--------------|
+ | **Builder** (`pi_build`) | YAML manifest | Generates Dockerfiles, docker-compose, configs with planted vulns. Runs NPC traffic. Evolves range via curriculum. |
+ | **Red** (`pi_red`) | External (no access) | Attacks live containers. Rewards: flag capture, efficiency, stealth, evidence quality, anti-hallucination. |
+ | **Blue** (`pi_blue`) | Internal (monitor host) | Defends via log analysis, patching, firewalling. Rewards: detection rate, patch validity, availability, FP penalty. |
+
+ Red and Blue train **in tandem**: both agents are active on the same range simultaneously.
+ Red's stealth reward is coupled to Blue's detection, creating adversarial co-evolution.
+
+ A **golden path** (the answer key) validates every generated range before training begins.
+ The golden path is generated by the Builder LLM and reviewed by the Validator LLM.
+
+ ## Architecture (5 Layers)
+
+ ```
+ Layer 1: YAML Manifest (human-authored topology, vulns, golden path, escalation rules)
+     |
+ Layer 2: Builder Agent (YAML -> Dockerfiles, compose, configs, NPC scripts -> docker compose up)
+     |
+ Layer 3: Validator (10-check admission pipeline: 8 mechanical + 2 LLM advisory)
+     |
+ Layer 4: OpenEnv Server (FastAPI on HF Spaces: /reset, /step, /state) + Red/Blue Operators
+     |
+ Layer 5: Training (TRL GRPOTrainer + Unsloth QLoRA) + Curriculum (escalate -> mutated YAML' -> back to Layer 1)
+ ```
+
+ ## Reset = Mutation (Critical Design)
+
+ **`reset()` does NOT restart the same environment.** It selects a different pre-validated
+ snapshot with different vulnerabilities. Example: a web app had XSS on episode N; after reset,
+ episode N+1 uses a snapshot with IDOR instead. The topology stays the same but the planted
+ vulnerabilities, flags, and golden path change.
+
+ This means the agent **cannot memorize** a fixed exploit chain. It must learn to **generalize**
+ across vulnerability classes.
+
+ ### Snapshot Generation (Async, Between Episodes)
+
+ ```
+ Builder LLM called asynchronously (background queue, NOT in reset() hot path)
+     |
+     v
+ Builder LLM generates new snapshot as STRUCTURED JSON (not prose; SWE-RL lesson):
+     - Same SnapshotBuilder protocol, different BuildContext (episode history, solve rates, weak areas)
+     - Outputs formal spec: {topology, truth_graph, vulns, golden_path, evidence_spec,
+       npc_personas, task briefings}
+     - Thin template layer renders JSON spec → actual config files (PHP, nginx.conf, etc.)
+     - This separates LLM reasoning (creative) from file formatting (mechanical)
+     |
+     v
+ Partial container restart (hot-swap modified files, restart affected services)
+     |
+     v
+ 10-Check Validator Admission Pipeline (per R2E-Gym + SWE-RL lessons):
+     Mechanical checks (deterministic, no LLM):
+       1. Build + boot: docker compose up + healthchecks (all containers, all ports)
+       2. Exploitability: golden path end-to-end (each step produces expect_stdout)
+       3. Patchability: inverse mutation test; revert each vuln, its golden path step MUST fail
+       4. Evidence sufficiency: logs + SIEM alerts exist for Blue investigation
+       5. Reward grounding: rubrics produce valid scores against known scenarios
+       6. Isolation + leakage: zones enforced, no flag values in briefings
+       7. Task feasibility: tasks reference real reachable hosts, services, logs
+       8. Difficulty calibration: golden path steps within ±20% of tier target
+     LLM checks (configurable, removable):
+       9. NPC consistency: personas respond per security_awareness (LLM tests NPCs)
+       10. Realism review: scenario plausibility + briefing leakage (LLM advisory only)
+     |
+     v
+ PASS -> store in Snapshot Store (frozen, immutable, ready for reset())
+ FAIL -> Builder LLM receives error context, retries (max 3)
+ ```
+
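The admission flow above can be sketched as a small pipeline runner. This is a minimal sketch under stated assumptions, not the project's `validator.py`: `CheckResult`, `admit`, and `build_with_retries` are hypothetical names, checks are modeled as plain callables over a dict, and "max 3" is read as three retries after the first attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    advisory: bool = False  # advisory checks are reported but never block admission
    detail: str = ""


# Hypothetical check signature: takes a snapshot spec (a dict here), returns a result.
Check = Callable[[dict], CheckResult]


def admit(snapshot: dict, checks: List[Check]) -> Tuple[bool, List[CheckResult]]:
    """Run checks in order and stop at the first blocking (non-advisory) failure."""
    results: List[CheckResult] = []
    for check in checks:
        result = check(snapshot)
        results.append(result)
        if not result.passed and not result.advisory:
            return False, results
    return True, results


def build_with_retries(build: Callable[[List[CheckResult]], dict],
                       checks: List[Check],
                       max_retries: int = 3) -> Optional[dict]:
    """Call the builder once, then retry with the failed checks as error context."""
    feedback: List[CheckResult] = []
    for _ in range(1 + max_retries):
        snapshot = build(feedback)
        ok, results = admit(snapshot, checks)
        if ok:
            return snapshot  # ready to be frozen into the snapshot store
        feedback = [r for r in results if not r.passed]
    return None  # give up; nothing is admitted
```

Stopping at the first blocking failure keeps expensive later checks (such as running the golden path) from executing against a range that did not even boot.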
+ ### Reset Flow (Fast: Draws From Pool)
+
+ ```
+ reset() called by training orchestration
+     |
+     v
+ Select pre-validated snapshot from Snapshot Store
+ (strategy: latest, random, or curriculum_weighted)
+     |
+     v
+ Boot or restore snapshot containers from frozen Docker artifacts
+     |
+     v
+ Return initial RangeObservation with challenge briefing
+ (Red briefing: tiered by difficulty. Blue briefing: always minimal.)
+ ```
+
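The three selection strategies named in the reset flow could look roughly like this. It is a sketch, assuming the store is a list ordered oldest to newest and that each snapshot dict carries an `id`; the `select_snapshot` name and the weighting rule (favor snapshots solved about half the time) are illustrative assumptions, not the project's implementation.

```python
import random
from typing import Dict, List, Optional


def select_snapshot(store: List[dict], strategy: str = "latest",
                    solve_rates: Optional[Dict[str, float]] = None,
                    rng: Optional[random.Random] = None) -> dict:
    """Pick one pre-validated snapshot from the pool."""
    if not store:
        raise ValueError("snapshot store is empty")
    rng = rng or random.Random()
    if strategy == "latest":
        return store[-1]
    if strategy == "random":
        return rng.choice(store)
    if strategy == "curriculum_weighted":
        rates = solve_rates or {}
        # Assumed weighting: weight is 1.0 at a 50% solve rate and 0.5 at 0% or 100%.
        # Snapshots with no recorded solve rate are treated as 50%.
        weights = [1.0 - abs(rates.get(s["id"], 0.5) - 0.5) for s in store]
        return rng.choices(store, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing a seeded `random.Random` makes the draw reproducible, which helps when replaying a training run.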
+ ### Why LLM-Based (Not Templates)
+
+ Templates produce predictable, shallow vulnerabilities. An LLM Builder can:
+ - **Compose novel vuln chains**: SSRF to access internal DB, then SQLi on internal endpoint
+ - **Vary attack surfaces creatively**: Different URL structures, parameter names, auth flows each episode
+ - **Generate realistic code**: Vulnerable PHP/Python/Node apps that look like real software, not CTF toy examples
+ - **Adapt to agent behavior**: If Red consistently solves SQLi easily, Builder can plant harder variants or combine with WAF rules
+
+ The Validator LLM closes the loop: it reviews the Builder's output to ensure the challenge is
+ actually solvable, properly difficult, and that the description doesn't leak the answer.
+
+ ### Verifiable Rewards
+
+ Despite LLM-based generation, **all rewards are grounded in verifiable container state**:
+ - Flag: `docker exec <container> cat <path>`; binary match, not LLM-judged
+ - Patch validity: re-run the golden path exploit command; if it fails, the patch worked
+ - Service availability: HTTP healthcheck; binary up/down
+ - Hallucination: compare the submitted flag against manifest-defined flags; exact string match
+
+ The LLM generates the challenge. The rewards are verified by code execution against real infrastructure.
+
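The flag and hallucination checks above reduce to a set lookup. The sketch below assumes the manifest flags are available as a set of strings; `score_flag_submission` is a hypothetical helper, and the choice to give no credit for re-submitting an already captured flag is an assumption. The -0.3 penalty is the value this document gives for fake flags.

```python
from typing import Set

HALLUCINATION_PENALTY = -0.3  # per fake flag, as stated in this document


def score_flag_submission(submitted: str, manifest_flags: Set[str],
                          captured: Set[str]) -> float:
    """Score one submission by exact string match; no LLM judgment involved."""
    if submitted not in manifest_flags:
        return HALLUCINATION_PENALTY  # flag does not exist in the manifest
    if submitted in captured:
        return 0.0  # assumed: a repeated flag earns nothing
    captured.add(submitted)
    return 1.0
```

Because the comparison is exact, a flag with stray whitespace or different casing counts as a hallucination under this sketch.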
+ ### Challenge Diversity (Black-Box Agents)
+
+ Agents operate **black-box**: they see briefings and environment outputs, never the truth graph.
+
+ **Red briefing** is tiered by difficulty:
+ - Tier 1: topology + vague hint ("web application with database backend, find vulnerabilities")
+ - Tier 2: topology only ("corporate network, 10 hosts, find and exploit")
+ - Tier 3+: minimal ("enterprise network, go"), which forces pure recon
+
+ **Blue briefing** is always minimal: "monitor SIEM for suspicious activity, investigate, respond."
+ Blue never knows what vulnerability class was planted.
+
+ **Episode diversity** prevents memorization:
+ - Must NOT repeat same vuln class within last 3 episodes
+ - Must NOT reuse same injection point within last 5 episodes
+ - Vary approach even within same vuln class (SQLi in search vs login vs API)
+ - Language/framing of briefings varies each episode
+
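The two windowed rules in the episode-diversity list can be checked mechanically before a candidate snapshot is scheduled. This is a minimal sketch: the `violates_diversity` name and the `vuln_class` / `injection_point` keys are assumptions for illustration, and `history` is assumed to be ordered oldest to newest.

```python
from typing import Dict, List


def violates_diversity(candidate: Dict[str, str],
                       history: List[Dict[str, str]],
                       class_window: int = 3,
                       point_window: int = 5) -> bool:
    """Return True if the candidate repeats a recent vuln class or injection point."""
    recent_classes = {ep["vuln_class"] for ep in history[-class_window:]}
    recent_points = {ep["injection_point"] for ep in history[-point_window:]}
    return (candidate["vuln_class"] in recent_classes
            or candidate["injection_point"] in recent_points)
```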
+ **Progression** builds naturally via curriculum:
+ - Early: single-vuln, direct exploit (SQLi → flag)
+ - Mid: multi-vuln chains (IDOR → cred leak → DB access)
+ - Late: multi-host pivots (web → internal → management → flag)
+ - Driven by solve rates, not hardcoded episode numbers
+
+ ### Red + Blue Tandem RL (Core Design)
+
+ **Both offensive and defensive agents train in tandem, not sequentially.**
+
+ ```
+ Episode N:
+   Builder LLM generates mutated range (new vulns, new golden path)
+   Validator LLM + scripted checks confirm range is valid
+     |
+   Red acts: nmap -> discover services -> exploit vuln -> capture flag
+     | (Red's actions appear in container logs in real time)
+     |
+   Blue observes: log stream = NPC noise + Red's real attack actions
+   Blue acts: analyze logs -> identify attack -> patch/block -> submit findings
+     |
+   Rewards computed:
+     Red: flag + efficiency + stealth(did Blue detect?) + anti-hallucination
+     Blue: detection(did Blue catch Red?) + patch(did patch block exploit?) + availability + FP penalty
+     |
+   Both rewards feed back to their respective GRPO trainers
+ ```
+
+ **Key coupling**: Red's stealth reward depends on Blue's detection. Blue's detection reward
+ depends on Red's actions. This creates an adversarial co-evolution:
+ - Red learns to be stealthier -> Blue must learn better detection
+ - Blue learns to detect faster -> Red must learn new evasion techniques
+
+ This is NOT self-play (single model playing both roles). It's **two separate policies** trained
+ against shared infrastructure with coupled reward signals.
+
+ ### Vulnerability Classes (Examples)
+
+ | OWASP | Class | Example | Scope |
+ |-------|-------|---------|-------|
+ | A01 | IDOR | Sequential user IDs without authz | web API |
+ | A01 | Path Traversal | `file=` param without sanitization | web |
+ | A01 | LFI | `include($_GET['page'])` → server files | web |
+ | A01 | RFI | Remote file include → code execution | web (Tier 2+) |
+ | A01 | Missing Authz | Unprotected admin endpoint | web |
+ | A03 | SQLi | Unsanitized query parameter | web → db |
+ | A03 | XSS | Comment form → admin session hijack | web |
+ | A03 | Command Injection | User input to `os.system()` | web → shell |
+ | A03 | LDAP Injection | Unsanitized LDAP bind/search | web → ldap |
+ | A03 | SSTI | Template injection → RCE | web |
+ | A04 | File Upload | Unrestricted upload → webshell | web |
+ | A05 | XXE | XML external entity → file read / SSRF | web |
+ | A05 | Service Misconfig | Debug endpoints, default configs | any host |
+ | A07 | Weak Creds | Default passwords | SSH, DB, LDAP, SMB |
+ | A07 | Broken Auth | JWT `alg:none`, session fixation | web |
+ | A07 | Credential Reuse | Same password → lateral movement | cross-service |
+ | A07 | Kerberoasting | Kerberos ticket attacks | ldap (Tier 3+) |
+ | A08 | RCE | `eval()`, pickle, code injection | web → shell |
+ | A08 | Deserialization | Insecure deserialization | web |
+ | A10 | SSRF | URL fetch hitting internal services | web → internal |
+ | Infra | SMB Misconfig | Guest access, null sessions | files |
+ | Infra | Mail Misconfig | Open relay, missing SPF/DKIM | mail |
+ | Infra | Firewall Bypass | Zone traversal, rule gaps | firewall |
+ | Infra | SSH Key Exposure | Private keys readable | any host (Tier 2+) |
+ | Ops | Config Drift | Stale config diverged from intended | any host |
+ | Ops | Orphaned Access | Departed staff accounts | ldap |
+ | Ops | Data Exposure | Creds in backups, logs, configs | any host |
+ | T3+ | CI/CD Poisoning | Pipeline injection | ci_cd (Tier 3+) |
+ | T3+ | Supply Chain | Dependency confusion | ci_cd (Tier 3+) |
+ | Chain | Multi-host | SSRF → internal SQLi → flag in DB | cross-zone |
+ | Chain | Lateral | Credential reuse → SSH pivot → LDAP dump | cross-service |
+
+ ### Implications for Training
+
+ - **Reset latency**: Producing a fresh snapshot costs LLM generation (~10-20s) + container update (~10-20s) + LLM validation (~10-15s) + scripted validation (~5-10s) = ~35-65s. This runs in the background queue; `reset()` itself only draws a pre-validated snapshot from the pool.
+ - **GRPO batching**: All `num_generations` in a batch share the SAME mutated range (reset once per batch, not per generation)
+ - **Episode diversity**: LLM generates genuinely novel challenges each reset rather than cycling through fixed templates
+ - **Container cleanup**: After each episode, dirty state is cleaned by restarting affected service containers
+ - **Tandem training**: Red and Blue GRPO trainers can run on same or different GPUs, sharing the environment
+ - **Curriculum**: As both agents improve, Builder LLM generates harder challenges (more hosts, chained vulns, stealthier golden paths)
+
+ ## Lessons from Research (R2E-Gym, Self-Play SWE-RL)
+
+ These papers directly inform OpenRange's design. Violating these lessons risks repeating known failures.
+
+ ### From R2E-Gym (Procedural Environments + Hybrid Verifiers)
+
+ 1. **Hybrid verification is non-negotiable.** Execution-based verification alone plateaus at ~43%. LLM-based verification alone plateaus at ~43%. Combined: 51%. OpenRange's Validator MUST use both LLM review AND scripted golden-path execution.
+
+ 2. **Synthetic task generation matches human quality.** LLM-generated task descriptions perform nearly identically to human-written ones (27.8% vs 28.0%). Builder LLM generating cyber challenges from vulnerability catalogs is a validated approach.
+
+ 3. **Toxic tests are real.** Up to 10% of generated validations incorrectly favor wrong solutions. Track Validator false-positive rate (accepting broken ranges) and false-negative rate (rejecting valid ranges).
+
+ 4. **Include reasoning traces in training data.** SFT with agent thought processes improves downstream performance by +3.8%. Red and Blue training trajectories MUST include structured reasoning (recon plan → vuln hypothesis → exploit attempt → verification), not just raw commands.
+
+ 5. **Build environment creation is the hardest part.** Docker dependency resolution, service connectivity, and reproducibility dominate engineering effort. Pre-build base images extensively.
+
+ ### From Self-Play SWE-RL (Adversarial Self-Improvement)
+
+ 6. **Formal specifications beat natural language.** Their biggest failed experiment: generating NL issue descriptions. A 32B model produced incoherent, repetitive text. They succeeded with formal test specs. **Builder LLM should output structured JSON specs** (vuln_type, injection_point, golden_path_commands, flag_location), NOT prose. The challenge description for the AGENT can be NL, but the Builder's internal output must be formal.
+
+ 7. **Builder reward: `r_inject = 1 - (1+α)·s`** where s = solve rate, α = 0.8. The linear term penalizes too-easy challenges (as s→1 the reward falls to -α = -0.8) and is positive only while s < 1/(1+α) ≈ 0.56. On its own it is largest as s→0, so too-hard or impossible challenges (s = 0) need a separate penalty case. Together these reward challenges at the frontier of the agent's current ability, which naturally creates a curriculum without manual difficulty design.
+
+ 8. **7-check consistency validation with inverse mutation testing.** Every generated range must pass:
+    - Services exist and respond
+    - Flags are accessible at expected locations
+    - Vulnerability is actually exploitable (golden path succeeds)
+    - Network isolation holds
+    - Difficulty matches target
+    - Challenge description doesn't leak the answer
+    - **Inverse mutation test**: for each planted vuln, removing ONLY that vuln must cause the golden path to fail at the corresponding step. This verifies each vuln actually contributes to the challenge.
+
+ 9. **Higher-order challenges from failed attempts.** When Blue fails to patch a vuln, the resulting state (partial patch + remaining vuln) becomes a harder challenge for the next episode. When Red fails to exploit, the failed attempt reveals what didn't work, informing the Builder to create challenges that specifically test that weakness.
+
+ 10. **Collapse risks in adversarial training.** A sufficiently capable Red agent can learn dominant strategies (e.g., obfuscation, always-same-attack) that stall Blue learning. Mitigations: ground in real-world data (real CVE patterns), limit divergence from realistic attack patterns, don't let Red game the reward through unrealistic strategies.
+
+ 11. **SFT before RL is critical.** Both papers use SFT on expert trajectories first, then RL. Never start GRPO from a cold model: always warm-start with supervised fine-tuning on successful attack/defense traces.
+
+ 12. **Binary reward for solver, nuanced reward for generator.** Red/Blue can use binary rewards (flag found or not, attack detected or not). The Builder needs the frontier-calibrating `r_inject` reward to learn optimal difficulty.
+
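The Builder reward from lesson 7 is easy to get subtly wrong, so a small sketch may help. Only the linear term `1 - (1+α)·s` and α = 0.8 come from this document; the `builder_reward` name, the -1.0 value for ranges that fail validation, and the -α value at s = 0 and s = 1 are assumptions added so that both extremes are penalized as the lesson describes.

```python
ALPHA = 0.8  # value given in this document


def builder_reward(solve_rate: float, valid: bool = True,
                   alpha: float = ALPHA) -> float:
    """Frontier-seeking reward for the challenge generator (hedged sketch)."""
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be in [0, 1]")
    if not valid:
        return -1.0  # assumption: range failed admission, strongest penalty
    if solve_rate in (0.0, 1.0):
        return -alpha  # assumption: never solved or always solved teaches nothing
    return 1.0 - (1.0 + alpha) * solve_rate  # linear term from this document
```

Under these assumptions the reward is highest for challenges that are solved rarely but not never, and it turns negative once the solve rate exceeds 1/(1+α), roughly 56% for α = 0.8.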
+ ## Key Invariants
+
+ - **Golden path gates training**: No episode runs on unvalidated infrastructure. Validator must PASS all 10 admission checks (8 mechanical + 2 LLM).
+ - **Rewards are grounded**: Every reward signal verified against golden-path-validated container state (flags via `docker exec`, patches via re-running exploit chain).
+ - **Anti-hallucination**: Flag submissions checked against manifest-defined flags. Fake flags penalized at -0.3.
+ - **Agents cannot reset**: Only training orchestration controls episode lifecycle (inherited from OpenEnv).
+ - **Horizontal growth, not vertical**: Difficulty increases by adding hosts/networks/services, not just harder passwords.
+ - **NPC noise is mandatory for Blue**: Without background traffic, detection is trivial and stealth is meaningless. NPCs evolve from shell-script noise (Level 0) to LLM-driven personas with susceptibility profiles (Level 1+), creating a social engineering attack surface.
+ - **Client-server separation**: Follows OpenEnv pattern; clients never import from `server/`.
+
+ ## Directory Structure
+
+ ```
+ open-range/
+ ├── AGENTS.md                  # This file
+ ├── IMPLEMENTATION_PLAN.md     # Build plan, testing, open questions
+ ├── manifests/                 # YAML range manifests (human-authored)
+ │   ├── schema.yaml            # JSON Schema for manifest validation
+ │   ├── tier1_basic.yaml       # 8-host enterprise, ~8 golden path steps
+ │   ├── tier2_corporate.yaml   # 10-12 host, ~15 golden path steps
+ │   └── tier3_enterprise.yaml  # 14-18 host, ~25 golden path steps
+ ├── protocols.py               # Agent protocols (SnapshotBuilder, NPCBehavior, ValidatorCheck)
+ ├── resolve.py                 # Dynamic component resolution (importlib + Protocol check)
+ ├── builder/                   # Builder agent (Layer 2)
+ │   ├── builder.py             # LLMSnapshotBuilder + TemplateOnlyBuilder + FileBuilder
+ │   ├── mutator.py             # Vuln mutation logic (swap vulns between resets)
+ │   ├── templates/             # Jinja2 templates for Dockerfiles, configs
+ │   └── npc/                   # NPC system (Level 0: shell scripts, Level 1: LLM personas)
+ │       ├── npc_manager.py     # Orchestrator: starts scripts + LLM agents per snapshot
+ │       ├── persona.py         # Pydantic NPC persona model (security_awareness, susceptibility)
+ │       ├── npc_agent.py       # Async LLM NPC agent loop (email check, decide, act)
+ │       ├── http_traffic.sh    # Level 0: curl loops
+ │       ├── smtp_traffic.sh    # Level 0: email noise
+ │       └── *.sh               # Level 0: other service traffic scripts
+ ├── validator/                 # Golden path validator (Layer 3): 10-check admission pipeline
+ │   ├── validator.py           # Validator pipeline (runs list of ValidatorCheck protocols)
+ │   ├── build_boot.py          # Check 1: docker compose up + healthchecks (mechanical)
+ │   ├── exploitability.py      # Check 2: golden path end-to-end (mechanical)
+ │   ├── patchability.py        # Check 3: inverse mutation test (mechanical)
+ │   ├── evidence.py            # Check 4: logs + alerts exist (mechanical)
+ │   ├── reward_grounding.py    # Check 5: rubrics produce valid scores (mechanical)
+ │   ├── isolation.py           # Check 6: zones enforced, no leaks (mechanical)
+ │   ├── task_feasibility.py    # Check 7: tasks reference real reachable hosts/services/logs (mechanical)
+ │   ├── difficulty.py          # Check 8: golden path steps within ±20% of tier target (mechanical)
+ │   ├── npc_consistency.py     # Check 9: NPC personas respond per security_awareness (LLM)
+ │   └── realism_review.py      # Check 10: scenario plausibility + briefing leakage (LLM, advisory)
+ ├── server/                    # OpenEnv server (Layer 4)
+ │   ├── app.py                 # FastAPI application (create_app)
+ │   ├── environment.py         # CyberRange Environment subclass
+ │   ├── models.py              # RangeAction, RangeObservation, RangeState
+ │   ├── rewards.py             # Reward components (flag, stealth, detect, etc.)
+ │   ├── Dockerfile             # Container for HF Spaces deployment
+ │   └── requirements.txt
+ ├── client/                    # OpenEnv client (typed)
+ │   ├── __init__.py
+ │   └── client.py              # OpenRangeEnv(EnvClient) or MCPToolClient
+ ├── training/                  # Training scripts (DEFERRED: environment-first)
+ │   ├── rollout.py             # rollout_func for GRPOTrainer (OpenEnv integration point)
+ │   └── curriculum.py          # Phi: escalation logic, YAML mutation
+ ├── scripts/                   # Utility scripts
+ │   ├── deploy_hf.sh           # Deploy to HF Spaces
+ │   └── run_local.sh           # Local development runner
+ ├── tests/                     # Test suite
+ │   ├── test_manifest.py       # Schema validation tests
+ │   ├── test_validator.py      # Golden path validation tests
+ │   ├── test_environment.py    # OpenEnv server tests
+ │   ├── test_rewards.py        # Reward component tests
+ │   └── test_integration.py    # End-to-end integration tests
+ ├── pyproject.toml
+ └── README.md
+ ```
+
+ ## OpenEnv Compatibility (EXACT API Contract)
+
+ OpenRange follows the OpenEnv 0.2.x environment pattern. Reference implementations:
+ `envs/coding_env/` (command execution) and `envs/echo_env/` (MCP tools).
+
+ ### Base Classes (from `openenv.core.env_server.types`)
+
+ ```python
+ from typing import Any, Dict, Optional
+
+ from pydantic import BaseModel
+
+ # Action base: extra="forbid" (rejects unknown fields)
+ class Action(BaseModel):
+     metadata: Dict[str, Any] = {}
+
+ # Observation base: extra="forbid", already has done + reward
+ class Observation(BaseModel):
+     done: bool = False
+     reward: bool | int | float | None = None
+     metadata: Dict[str, Any] = {}
+
+ # State base: extra="allow" (allows additional fields)
+ class State(BaseModel):
+     episode_id: Optional[str] = None
+     step_count: int = 0
+ ```
+
+ ### OpenRange Models (`server/models.py`)
+
+ ```python
+ from typing import Literal
+
+ from openenv.core.env_server.types import Action, Observation, State
+
+ class RangeAction(Action):
+     command: str                   # Shell command or tool invocation
+     mode: Literal["red", "blue"]   # Which operator is acting
+
+ class RangeObservation(Observation):
+     # NOTE: done and reward are INHERITED from Observation base; do NOT redeclare
+     stdout: str = ""               # Command output
+     stderr: str = ""               # Error output
+     flags_captured: list[str] = []
+     alerts: list[str] = []         # Blue: IDS/log alerts
+
+ class RangeState(State):
+     # NOTE: episode_id and step_count are INHERITED from State base
+     mode: str = ""                 # Current active mode (red/blue)
+     flags_found: list[str] = []
+     services_status: dict = {}
+     tier: int = 1
+ ```
+
+ ### Environment (`server/environment.py`)
+
+ ```python
+ from typing import Optional
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+ from server.models import RangeAction, RangeObservation, RangeState
+
+ class RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState]):
+     SUPPORTS_CONCURRENT_SESSIONS = False  # One episode per range instance
+
+     def __init__(self):
+         super().__init__()  # Can pass transform= and rubric= here
+         self._state = RangeState()
+
+     def reset(self, seed: Optional[int] = None,
+               episode_id: Optional[str] = None, **kwargs) -> RangeObservation:
+         # Select a pre-validated snapshot (Builder LLM mutation + Validator
+         # run asynchronously, outside this hot path)
+         # Clear episode state
+         self._state = RangeState(episode_id=episode_id or str(uuid4()))
+         return RangeObservation(stdout="Range ready. Begin reconnaissance.")
+
+     def step(self, action: RangeAction,
+              timeout_s: Optional[float] = None, **kwargs) -> RangeObservation:
+         # Route action.command to container via docker exec
+         # (_exec_in_container is a hypothetical helper, not an OpenEnv API)
+         result, err = self._exec_in_container(action, timeout_s)
+         # Compute reward via rubric
+         self._state.step_count += 1
+         obs = RangeObservation(stdout=result, stderr=err)
+         obs.reward = self._apply_rubric(action, obs)  # Uses Rubric if set
+         return obs
+
+     @property
+     def state(self) -> RangeState:
+         return self._state
+ ```
+
+ ### App Factory (`server/app.py`)
+
+ ```python
+ from openenv.core.env_server import create_app
+ from server.models import RangeAction, RangeObservation
+ from server.environment import RangeEnvironment
+
+ # MUST pass CLASS (not instance); this enables WebSocket session isolation
+ app = create_app(RangeEnvironment, RangeAction, RangeObservation,
+                  env_name="open_range")
+ ```
+
+ ### Client (`client/client.py`)
+
+ ```python
+ from openenv.core.env_client import EnvClient
+ from openenv.core.client_types import StepResult
+
+ class OpenRangeEnv(EnvClient[RangeAction, RangeObservation, RangeState]):
+     def _step_payload(self, action: RangeAction) -> dict:
+         return {"command": action.command, "mode": action.mode}
+
+     def _parse_result(self, payload: dict) -> StepResult[RangeObservation]:
+         obs = RangeObservation(**payload["observation"])
+         return StepResult(
+             observation=obs,
+             reward=payload.get("reward"),
+             done=bool(payload.get("done", False)),
+         )
+
+     def _parse_state(self, payload: dict) -> RangeState:
+         return RangeState(**payload)
+ ```
+
+ ### Endpoints (auto-provided by `create_app`)
+
+ ```
+ GET  /health    → {"status": "healthy"}
+ GET  /metadata  → environment name, version, description
+ GET  /schema    → JSON schemas for action, observation, state
+ POST /reset     → initial RangeObservation
+ POST /step      → RangeObservation after executing action
+ GET  /state     → current RangeState
+ WS   /ws        → persistent WebSocket session
+ ```
+
+ ### MCP Alternative
+
+ For richer tool discovery, OpenRange can also use `MCPEnvironment` with FastMCP tools:
+ ```python
+ class RangeEnvironment(MCPEnvironment):
+     def __init__(self):
+         mcp = FastMCP("open_range")
+
+         @mcp.tool
+         def nmap_scan(target: str, flags: str = "-sV") -> str:
+             """Run nmap port scan against target"""
+             return docker_exec(f"nmap {flags} {target}")
+
+         @mcp.tool
+         def submit_flag(flag: str) -> str:
+             """Submit a captured flag for verification"""
+             return verify_flag(flag)
+
+         super().__init__(mcp)
+ ```
+ This exposes tools via `ListToolsAction`/`CallToolAction` + MCP JSON-RPC at `/mcp`.
+
+ ## Reward Components
+
+ ### Red Operator
+ | Signal | Weight | Source |
+ |--------|--------|--------|
+ | `r_flag` | Binary | `docker exec <container> cat <path>` at golden-path location |
+ | `r_efficiency` | `gamma^t` | Step count discount |
+ | `r_stealth` | Negative | IDS log entry count |
+ | `r_evidence` | 0-1 | Documentation quality of findings |
+ | `r_halluc` | -0.3 | Per fake flag (compared against manifest) |
+ | `r_social` | Binary | NPC fell for phish/pretext (Level 1+ only) |
+ | `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |
+
+ ### Blue Operator
+ | Signal | Weight | Source |
+ |--------|--------|--------|
+ | `r_detect` | 0-1 | TP rate (Red golden-path steps in logs) |
+ | `r_patch` | Binary | Validator re-runs Red exploit -> fails = valid patch |
+ | `r_avail` | 0-1 | Services still responding (healthchecks) |
+ | `r_FP` | -0.2 | Per false alarm (NPC traffic flagged as attack) |
+ | `r_phish_detect` | 0-1 | Correctly identified social engineering in logs (Level 1+ only) |
+ | `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |
+
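The two tables list reward signals but not how they combine, so the following is only one plausible composition. The per-signal values (-0.3 per fake flag, -0.2 per false alarm, `gamma^t`, the tier multiplier) come from the tables; the additive combination, `gamma = 0.99`, the per-alert stealth cost, and the names `RedEpisode`, `red_reward`, and `blue_reward` are assumptions. The Level 1+ social signals are left out.

```python
from dataclasses import dataclass
from typing import Set

HALLUC_PENALTY = -0.3  # per fake flag (Red table)
FP_PENALTY = -0.2      # per false alarm (Blue table)


@dataclass
class RedEpisode:
    flag_captured: bool
    steps: int
    ids_alerts: int = 0
    fake_flags: int = 0
    evidence_score: float = 0.0  # 0-1


def red_reward(ep: RedEpisode, tier_multiplier: float = 1.0,
               gamma: float = 0.99, alert_cost: float = 0.05) -> float:
    r_flag = 1.0 if ep.flag_captured else 0.0
    r_efficiency = gamma ** ep.steps         # step-count discount
    r_stealth = -alert_cost * ep.ids_alerts  # negative, grows with IDS entries
    r_halluc = HALLUC_PENALTY * ep.fake_flags
    # Assumed composition: the flag is discounted by efficiency, the rest is added.
    base = r_flag * r_efficiency + r_stealth + ep.evidence_score + r_halluc
    return tier_multiplier * base


def blue_reward(red_steps: Set[str], flagged: Set[str], patch_valid: bool,
                services_up: int, services_total: int,
                tier_multiplier: float = 1.0) -> float:
    # True positives are flagged events that really were Red's steps.
    r_detect = len(flagged & red_steps) / len(red_steps) if red_steps else 0.0
    r_patch = 1.0 if patch_valid else 0.0
    r_avail = services_up / services_total if services_total else 0.0
    r_fp = FP_PENALTY * len(flagged - red_steps)  # NPC traffic flagged as attack
    return tier_multiplier * (r_detect + r_patch + r_avail + r_fp)
```

Note the coupling: the same Red actions that raise Blue's `r_detect` are what generate the IDS entries behind Red's `r_stealth`.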
+ ## Tier System (Horizontal Growth)
+
+ Each tier is a **fully integrated network**: services connect to each other, web apps talk to
+ databases, auth systems protect resources, logs flow to monitoring. Not isolated containers.
+
+ | Tier | Hosts | Networks | Integrated Services | Identity/Auth | Golden Steps |
+ |------|-------|----------|---------------------|---------------|--------------|
+ | 1 | attacker, firewall, web, mail, db, files, ldap, siem (8) | external, dmz, internal, mgmt | nginx+PHP web app → MySQL, postfix/dovecot, samba, OpenLDAP, rsyslog SIEM, iptables firewall | DB + LDAP user auth, session cookies | ~8 |
+ | 2 | + jumpbox, vpn (10-12) | + guest, vpn | + SSH bastion, OpenVPN, cron jobs | + SSH key auth, VPN cert auth, email-based password reset | ~15 |
+ | 3 | + CI/CD, dev-tools (14-18) | + partner, dev | + Jenkins/GitLab runner, dev endpoints | + AD/LDAP auth, Kerberos tickets, service accounts | ~25 |
+ | 4 | + OT/SCADA, cloud-proxy (20-25) | + OT, cloud | + Modbus/OPC-UA simulators, cloud gateway | + jump host required for OT, credential rotation, MFA | ~35 |
+ | 5 | + honeypots, WAF (30+) | + trap net | + decoy services, WAF, IDS, threat intel | + honeypot tokens, rate limiting, cert-based auth | ~50 |
+
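The "Golden Steps" column feeds the difficulty-calibration check (golden path steps within ±20% of the tier target). A minimal sketch of that arithmetic, treating the approximate step counts in the table as exact targets; `TIER_TARGET_STEPS` and `within_tier_target` are hypothetical names.

```python
# Approximate targets from the tier table, treated as exact here (assumption).
TIER_TARGET_STEPS = {1: 8, 2: 15, 3: 25, 4: 35, 5: 50}


def within_tier_target(golden_steps: int, tier: int,
                       tolerance: float = 0.20) -> bool:
    """True if the golden path length is within ±tolerance of the tier target."""
    target = TIER_TARGET_STEPS[tier]
    return abs(golden_steps - target) <= tolerance * target
```

For Tier 1 the allowed deviation is 1.6 steps, so golden paths of 7 to 9 steps pass and a 10-step path is rejected.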
+ ### How Services Integrate (Tier 1: 8 Containers)
+
+ ```
+ [attacker] (external zone)
+   |
+   | port 80, 443, 25 only via firewall
+   v
+ [firewall] (perimeter): iptables, NAT, zone enforcement, logs to siem
+   |
+   v
+ [web.corp.local] (DMZ 10.0.1.0/24) nginx + PHP web app
+   |   - Login form -> authenticates against ldap (LDAP bind)
+   |   - Product search -> SQL query to db (vuln injection point)
+   |   - File upload -> stored on disk (vuln injection point)
+   |   - All access logged to /var/log/nginx/access.log -> siem
+   |
+   ├──> [mail.corp.local] (DMZ) postfix + dovecot
+   |      - User lookup against ldap
+   |      - NPC email traffic + social engineering surface
+   |      - Logs to siem
+   |
+   | port 3306 (internal only)
+   v
+ [db.corp.local] (internal 10.0.2.0/24) MySQL
+   |   - users, products, flags tables
+   |   - Query logs -> siem
+   |
+ [files.corp.local] (internal) samba
+   |   - SMB shares, access via ldap auth
+   |   - Logs to siem
+   |
+ [ldap.corp.local] (mgmt 10.0.3.0/24) OpenLDAP + Kerberos
+   |   - Central auth for all services
+   |   - Audit replication to siem
+   |
+ [siem.corp.local] (mgmt) rsyslog + log aggregation
+       - Blue's entry point: reads ALL logs here
+       - NPC traffic mixed with real attack traffic
+       - Blue reads logs, never touches web/db/files directly
+ ```
+
+ ## Agent Tool Philosophy: Container-as-Constraint
+
+ **No artificial allowlists.** Agents can run ANY command available in their container.
+ The Docker image defines what's possible — not code-level filtering.
+
+ ### How Commands Execute
+
+ ```
+ Agent sends: RangeAction(command="nmap -sV 10.0.1.0/24", mode="red")
+
+ environment.step() routes by mode:
+   Red  → docker exec open-range-attacker-1 sh -c "nmap -sV 10.0.1.0/24"
+   Blue → docker exec open-range-siem-1 sh -c "..."
+
+ Raw stdout/stderr returned as RangeObservation
+ ```
+
+ No validation, sanitization, or allowlisting. The command string goes straight to `sh -c`.
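
The routing described above can be sketched as a small pure function. This is a minimal illustration, not the project's `environment.step()`: the helper name `build_exec_argv` and the `CONTAINER_BY_MODE` table are hypothetical, and the container names are simply the two shown in the example.

```python
import shlex

# Container names as they appear in the routing example above.
CONTAINER_BY_MODE = {
    "red": "open-range-attacker-1",
    "blue": "open-range-siem-1",
}


def build_exec_argv(command: str, mode: str) -> list[str]:
    """Return the `docker exec` argv for an agent command (hypothetical helper)."""
    try:
        container = CONTAINER_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    # The command string is handed to `sh -c` as one argv element, unmodified:
    # no validation, sanitization, or allowlisting happens here.
    return ["docker", "exec", container, "sh", "-c", command]


if __name__ == "__main__":
    # Print the equivalent shell line for the Red example.
    print(shlex.join(build_exec_argv("nmap -sV 10.0.1.0/24", "red")))
```

Passing the command as a single argv element means the only interpreter that ever parses the agent's string is the `sh` inside the container, which is the container-as-constraint idea in code form.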
+
+ ### What's Installed (Tier 1)
+
+ **Red (Kali)**: nmap, sqlmap, hydra, nikto, smbclient, curl, wget, netcat, ssh,
+ dnsutils, tcpdump, python3+pip. Plus all standard Kali/Debian tools. Agents can
+ `pip install` or `apt install` additional tools at runtime.
+
+ **Blue (SIEM)**: rsyslog, grep/awk/sed, jq, curl, ssh. All logs aggregated at
+ `/var/log/siem/consolidated/all.log`. Agents can write custom scripts, parse JSON,
+ correlate events — whatever Unix tools allow.
+
+ ### Meta-Commands (Handled by Environment, Not Containers)
+
+ These are intercepted before docker exec:
+
+ | Command | Role | Effect |
+ |---------|------|--------|
+ | `submit_flag <value>` | Red | Validates against snapshot flags; -0.3 penalty per hallucinated flag |
+ | `submit_evidence <json>` | Red | Logs findings for evidence reward scoring |
+ | `submit_finding <desc>` | Blue | Logs attack detection for accuracy scoring |
+ | `auth <host> <user> <pass>` | Both | Validates creds against snapshot topology |
+ | `logout <host>` | Both | Terminates active session |
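
A rough sketch of how such interception might look, under stated assumptions: the function names `split_meta_command` and `flag_penalty` are hypothetical, and only the command names, the `maxsplit=3` parsing of `auth` (mentioned in the commit message), and the -0.3 penalty come from the source. The reward for a correct flag is not specified here, so the sketch models the penalty side only.

```python
from __future__ import annotations

META_COMMANDS = {"submit_flag", "submit_evidence", "submit_finding", "auth", "logout"}
HALLUCINATED_FLAG_PENALTY = -0.3  # value taken from the table above


def split_meta_command(command: str) -> tuple[str, list[str]] | None:
    """Return (name, args) for a meta-command, or None for an ordinary shell command."""
    stripped = command.strip()
    parts = stripped.split(maxsplit=1)
    if not parts or parts[0] not in META_COMMANDS:
        return None  # falls through to docker exec
    name = parts[0]
    if name == "auth":
        # maxsplit=3 keeps a password that contains spaces in one piece.
        return name, stripped.split(maxsplit=3)[1:]
    # Other meta-commands take a single free-form argument (value, JSON, text, host).
    return name, parts[1:]


def flag_penalty(submitted: str, snapshot_flags: set[str]) -> float:
    """Return the penalty for a submitted flag: 0.0 if it is real, -0.3 otherwise."""
    return 0.0 if submitted in snapshot_flags else HALLUCINATED_FLAG_PENALTY
```

For example, `split_meta_command("auth web admin my secret pass")` yields the three arguments `web`, `admin`, and `my secret pass`, whereas a plain `split()` would break the password apart.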
+
+ ### What Agents Should NOT Be Told
+
+ Agent prompts should NOT enumerate allowed tools. Instead:
+ - Red: "You have a Kali workstation. Run any command."
+ - Blue: "You have the SIEM console. Use any tool to investigate."
+
+ The agent discovers what's available through reconnaissance (e.g., `which sqlmap`,
+ `ls /usr/bin/`, `pip list`). This mirrors real pentesting and SOC work.
+
+ ### Docker Network Topology (Tier 1)
+
+ ```
+ attacker (10.0.0.2) → firewall (10.0.0.3/10.0.1.2) → web (10.0.1.4)
+                       NAT + iptables               → mail (10.0.1.3)
+                                                    → db (10.0.2.x)
+                                                    → files (10.0.2.x)
+                                                    → ldap (10.0.3.x)
+                                                    → siem (10.0.3.x)
+ ```
+
+ Attacker routes to DMZ/internal/mgmt via firewall. Only ports 80, 443, 25 pass
+ from external→DMZ. The firewall enforces zone segmentation per manifest rules.
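
The external→DMZ rule can be illustrated with the standard `ipaddress` module. This is a sketch only: the real enforcement lives in iptables rules generated from the manifest, the helper names are hypothetical, and the external `/24` is an assumption inferred from the attacker and firewall addresses shown above. It models just the one stated rule and says nothing about other zone pairs.

```python
from __future__ import annotations

import ipaddress

# DMZ/internal/mgmt subnets come from the Tier 1 diagrams; the external /24 is
# an assumption inferred from attacker (10.0.0.2) and firewall (10.0.0.3).
ZONES = {
    "external": ipaddress.ip_network("10.0.0.0/24"),
    "dmz": ipaddress.ip_network("10.0.1.0/24"),
    "internal": ipaddress.ip_network("10.0.2.0/24"),
    "mgmt": ipaddress.ip_network("10.0.3.0/24"),
}
EXTERNAL_TO_DMZ_PORTS = {80, 443, 25}


def zone_of(ip: str) -> str | None:
    """Return the zone name containing `ip`, or None if it is outside the range."""
    addr = ipaddress.ip_address(ip)
    for name, network in ZONES.items():
        if addr in network:
            return name
    return None


def passes_external_dmz_rule(dst_ip: str, port: int) -> bool:
    """True if external traffic to dst_ip:port matches the stated external→DMZ rule."""
    return zone_of(dst_ip) == "dmz" and port in EXTERNAL_TO_DMZ_PORTS
```

A check like this can be handy in tests that compare a generated manifest against the intended segmentation before any container is started.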
+
+ ## Builder LLM Schema Alignment (IMPORTANT)
+
+ The Builder prompt schema and the Pydantic models MUST match field names.
+ Mismatches cause `ValidationError` at parse time. Known mappings handled
+ by `_parse_llm_response()` in `builder/builder.py`:
+
+ | Prompt Schema | Pydantic Model | Parser Handles |
+ |---------------|----------------|----------------|
+ | `exploit_chain[].vuln` | `ExploitStep.vuln_id` | Yes |
+ | `exploit_chain[].action` | `ExploitStep.command` | Yes |
+ | `exploit_chain[].yields` | `ExploitStep.description` | Yes |
+ | `golden_path[].cmd` | `GoldenPathStep.command` | Yes |
+ | `golden_path[].expect_stdout` | `GoldenPathStep.expect_in_stdout` | Yes |
+ | `accounts.smb_shares` (list) | `NPCPersona.accounts` (dict[str, Any]) | Yes |
+ | `evidence_spec` (dict) | `list[EvidenceItem]` | Yes |
+
+ **Rule**: When adding new fields to SnapshotSpec or its children, update BOTH
+ the builder prompt schema AND the `_parse_llm_response()` mapper. If the LLM
+ returns a different field name, add a fallback in the parser like
+ `ec.get("vuln_id", ec.get("vuln", ""))`.
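
The fallback pattern in the rule above can be generalized into a small alias table. The sketch below is illustrative rather than the actual `_parse_llm_response()` code: `EXPLOIT_STEP_ALIASES` and `normalize_exploit_step` are hypothetical names, and the three alias pairs are the `exploit_chain` rows of the mapping table.

```python
from typing import Any

# (canonical Pydantic field, prompt-schema alias) pairs from the mapping table.
EXPLOIT_STEP_ALIASES = [
    ("vuln_id", "vuln"),
    ("command", "action"),
    ("description", "yields"),
]


def normalize_exploit_step(ec: dict[str, Any]) -> dict[str, Any]:
    """Map one raw LLM exploit-chain entry onto the canonical field names.

    The canonical key wins when both spellings are present; a field missing
    under both names falls back to an empty string.
    """
    return {
        canonical: ec.get(canonical, ec.get(alias, ""))
        for canonical, alias in EXPLOIT_STEP_ALIASES
    }
```

Keeping the aliases in one table means a new mismatch needs one added pair rather than another hand-written `get` chain, which makes the "update BOTH" rule easier to follow.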
+
+ ## Azure OpenAI Configuration
+
+ For LLM builder/validator, set these env vars:
+
+ ```bash
+ export AZURE_API_KEY="..."
+ export AZURE_API_BASE="https://<endpoint>.cognitiveservices.azure.com"
+ export AZURE_API_VERSION="2025-04-01-preview"
+ export OPENRANGE_BUILDER_MODEL="azure/gpt-5.2" # or any azure/<deployment>
+ ```
+
+ LiteLLM reads these automatically. Model format: `azure/<deployment_name>`.
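
A pre-flight check can catch a missing variable or a malformed model string before any LLM call is made. The helpers below are a hypothetical sketch that uses only the standard library; they do not call LiteLLM, and the names `missing_azure_vars` and `azure_deployment` are not part of the project.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# The three Azure variables listed in the configuration block above.
REQUIRED_AZURE_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_azure_vars(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required Azure variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


def azure_deployment(model: str) -> str:
    """Extract the deployment name from an `azure/<deployment_name>` model string."""
    prefix, sep, deployment = model.partition("/")
    if prefix != "azure" or not sep or not deployment:
        raise ValueError(f"expected 'azure/<deployment_name>', got {model!r}")
    return deployment
```

Calling `missing_azure_vars()` with no argument inspects the real process environment; passing a dict keeps the check easy to unit test.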
+
+ ## Build & Development Commands
+
+ ```bash
+ # Install dependencies
+ uv sync --all-extras
+
+ # Run tests (549 tests)
+ uv run pytest tests/ -v --tb=short
+
+ # Run OpenEnv server locally (mock mode, no Docker needed)
+ uv run uvicorn open_range.server.app:app --host 0.0.0.0 --port 8000
+
+ # Run demo episode (no Docker, no LLM)
+ uv run python examples/demo.py
+
+ # Build and start full Docker range stack (9 containers)
+ docker compose build && docker compose up -d
+
+ # Test LLM builder with Azure creds
+ uv run python scripts/test_tier1_llm.py
+
+ # Deploy to HF Spaces
+ bash scripts/deploy_hf.sh
+ ```
+
+ ### Docker Gotchas (Apple Silicon / ARM64)
+
+ - MySQL 5.7 has NO ARM64 images. Use `mysql:8.0` in docker-compose.yml.
+ - PHP-FPM socket: Ubuntu 22.04 installs as `php8.1-fpm`, socket at
+ `/run/php/php8.1-fpm.sock` (not generic `/run/php/php-fpm.sock`).
+ - Attacker container needs `cap_add: [NET_ADMIN]` + `iproute2` to add
+ routes to DMZ/internal/mgmt subnets via the firewall gateway.
+ - Container names follow Docker Compose convention: `open-range-<service>-1`.
+ The environment resolves these via `_container_name()` discovery.
+
+ ## Key References
+
+ - **OpenEnv**: `../References/OpenEnv/` (full reference repo)
+ - **OpenEnv coding_env**: Pattern to follow for server/client structure
+ - **OpenEnv RFC 001**: Agent vs Environment boundary (MCP + HTTP duality)
+ - **OpenEnv RFC 004**: Rubric system for composable rewards
+ - **R2E-Gym**: `../References/R2E-Gym/` (full codebase) + `../2504.07164v1.pdf` (paper). Procedural env generation via backtranslation, hybrid verifiers (execution + LLM), 8.1K executable tasks. Key lesson: hybrid verification breaks through single-method plateaus.
+ - **Self-Play SWE-RL**: `../2512.18552v1.pdf`. Bug-injector + bug-solver self-play with shared weights. Key lessons: formal specs > NL, 7-check consistency validation, inverse mutation testing, frontier-calibrating Builder reward `r_inject = 1-(1+α)s`, higher-order challenges from failed attempts.
+ - **CyBench** (ICLR'25): CTF benchmark (saturating, static)
+ - **CVE-Bench** (ICML'25): Reward hacking lesson (agents gamed shortcuts)
+ - **CybORG CAGE 4**: Red/Blue/Green agent model
+
+ ## Hackathon Scope & Priority
+
+ ### CORE (must ship — the OpenEnv environment)
+ 1. **Manifest schema** + example YAML manifests with golden paths
+ 2. **Builder LLM** — generates/mutates range infrastructure from manifest (structured JSON → templates → Docker)
+ 3. **Validator** — hybrid LLM review + 7-check scripted execution (including inverse mutation test)
+ 4. **OpenEnv server** — `RangeEnvironment(Environment)` with `reset()`, `step()`, `state`, deployed on HF Spaces
+ 5. **Rewards** — `Rubric` subclasses for Red and Blue, all verifiable against container state
+ 6. **Client** — `OpenRangeEnv(EnvClient)` with typed parsing
+ 7. **NPC traffic** — background noise for Blue
+
+ ### DEFERRED (training is downstream of the environment)
+ Training scripts (GRPO, SFT, curriculum) are **out of scope for hackathon core**. The environment
+ must work first — anyone can plug in TRL/Unsloth/SkyRL later via `rollout_func`. We demonstrate
+ the environment with scripted or manual agents, not trained ones.
+
+ ### Constraints
+ - **OpenEnv 0.2.x** on HF Spaces (FastAPI server with typed Pydantic models)
+ - **Infra**: HF Spaces (OpenEnv server) + Docker host (range containers)
+ - **Demo**: 1-min YouTube showing YAML → Builder generates range → Validator confirms → Red agent exploits → Blue agent defends → Builder mutates → new challenge
+ - **License**: Apache 2.0
README.md CHANGED
@@ -1,7 +1,15 @@
  ---
- title: OpenRange
+ title: OpenRange Environment Server
+ emoji: 🎯
+ colorFrom: red
+ colorTo: blue
  sdk: docker
+ pinned: false
  app_port: 8000
+ base_path: /web
+ tags:
+ - openenv
+ - rl-environment
  ---

  # OpenRange
openenv.yaml CHANGED
@@ -4,5 +4,3 @@ type: space
  runtime: fastapi
  app: server.app:app
  port: 8000
- version: 0.1.0
- description: "Multi-agent cybersecurity gymnasium built on OpenEnv"
pyproject.toml CHANGED
@@ -1,17 +1,22 @@
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
  [project]
- name = "open-range"
+ name = "openenv-open-range"
  version = "0.1.0"
  description = "Multi-agent cybersecurity gymnasium built on OpenEnv"
  requires-python = ">=3.11"
  license = "Apache-2.0"
  dependencies = [
  "openenv-core[core]>=0.2.1",
- "fastapi>=0.115",
- "pydantic>=2.0",
+ "click>=8.1",
+ "fastapi>=0.115.0",
+ "pydantic>=2.0.0",
  "pyyaml>=6.0",
  "docker>=7.0",
  "jinja2>=3.1",
- "uvicorn>=0.27",
+ "uvicorn>=0.24.0",
  ]

  [project.optional-dependencies]
@@ -19,15 +24,24 @@ dev = ["pytest>=8.0", "pytest-asyncio>=0.23", "httpx>=0.27"]
  training = ["trl>=0.8", "unsloth"]
  builder = ["litellm>=1.30"]

- [build-system]
- requires = ["hatchling"]
- build-backend = "hatchling.build"
-
- [tool.hatch.build.targets.wheel]
- packages = ["src/open_range"]
-
  [project.scripts]
+ openrange = "open_range.cli:cli"
  server = "open_range.server.app:main"

+ [tool.setuptools]
+ include-package-data = true
+ packages = [
+ "open_range",
+ "open_range.agents",
+ "open_range.builder",
+ "open_range.builder.npc",
+ "open_range.client",
+ "open_range.server",
+ "open_range.training",
+ "open_range.validator",
+ ]
+ package-dir = { "" = "src" }
+ package-data = { "open_range" = ["**/*.yaml", "**/*.yml"] }
+
  [tool.pytest.ini_options]
  asyncio_mode = "auto"
server/Dockerfile CHANGED
@@ -1,23 +1,42 @@
- FROM python:3.11-slim
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder

  WORKDIR /app

- RUN apt-get update && \
- apt-get install -y --no-install-recommends \
- docker.io \
- curl \
+ COPY . /app/env
+ WORKDIR /app/env
+
+ # Install git for git+ dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends git \
  && rm -rf /var/lib/apt/lists/*

- COPY pyproject.toml .
- COPY openenv.yaml .
- COPY server/ server/
- COPY src/ src/
+ # Two-pass install for better layer caching
+ RUN --mount=type=cache,target=/root/.cache/uv \
+ if [ -f uv.lock ]; then \
+ uv sync --frozen --no-install-project --no-editable; \
+ else \
+ uv sync --no-install-project --no-editable; \
+ fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+ if [ -f uv.lock ]; then \
+ uv sync --frozen --no-editable; \
+ else \
+ uv sync --no-editable; \
+ fi
+
+ # Runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app

- RUN pip install --no-cache-dir -e .
+ COPY --from=builder /app/env/.venv /app/.venv
+ COPY --from=builder /app/env /app/env

- EXPOSE 8000
+ ENV PATH="/app/.venv/bin:$PATH"
+ ENV PYTHONPATH="/app/env/src:/app/env:$PYTHONPATH"

- HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
- CMD curl -f http://localhost:8000/health || exit 1
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+ CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

- CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
server/__init__.py CHANGED
@@ -1,6 +1,6 @@
  """Repository-level OpenEnv server entrypoints."""

- from .app import app, create_app
+ from .app import app, main
  from .environment import RangeEnvironment

- __all__ = ["RangeEnvironment", "app", "create_app"]
+ __all__ = ["RangeEnvironment", "app", "main"]
server/app.py CHANGED
@@ -1,10 +1,16 @@
- """OpenEnv app entrypoint expected by ``openenv.yaml``."""
+ """OpenEnv app entrypoint expected by ``openenv.yaml``.
+
+ Thin wrapper that delegates to the real app factory in
+ ``open_range.server.app``. This file lives at the repo root
+ so the Dockerfile CMD ``cd /app/env && uvicorn server.app:app``
+ resolves correctly inside HF Spaces.
+ """

  from __future__ import annotations

- from open_range.server.app import app, create_app
+ from open_range.server.app import create_app as _create_app

- __all__ = ["app", "create_app"]
+ app = _create_app()


  def main() -> None:
src/open_range/builder/builder.py CHANGED
@@ -3,6 +3,9 @@
3
  - LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
4
  - TemplateOnlyBuilder: testing -- deterministic, no LLM calls
5
  - FileBuilder: demos -- loads a pre-built snapshot from a JSON file
 
 
 
6
  """
7
 
8
  from __future__ import annotations
@@ -12,7 +15,9 @@ import logging
12
  import os
13
  import random
14
  from pathlib import Path
15
- from typing import Any
 
 
16
 
17
  try:
18
  import litellm
@@ -38,6 +43,106 @@ from open_range.builder.prompts import BUILDER_SYSTEM_PROMPT
38
  logger = logging.getLogger(__name__)
39
 
40

41
  # ---------------------------------------------------------------------------
42
  # LLM-based builder (production)
43
  # ---------------------------------------------------------------------------
@@ -57,7 +162,18 @@ class LLMSnapshotBuilder:
57
  temperature: float = 0.7,
58
  max_retries: int = 3,
59
  max_tokens: int = 32768,
 
60
  ) -> None:
61
  self.model = model or os.environ.get(
62
  "OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
63
  )
@@ -65,13 +181,18 @@ class LLMSnapshotBuilder:
65
  self.temperature = temperature
66
  self.max_retries = max_retries
67
  self.max_tokens = max_tokens
 
68
 
69
  async def build(
70
  self,
71
  manifest: dict,
72
  context: BuildContext,
73
  ) -> SnapshotSpec:
74
- """Call LLM to generate a candidate snapshot spec."""
 
 
 
 
75
  if litellm is None:
76
  raise RuntimeError(
77
  "LLMSnapshotBuilder requires the optional builder extra. "
@@ -89,23 +210,29 @@ class LLMSnapshotBuilder:
89
  )
90
  )
91
 
 
 
 
 
 
 
92
  last_error: Exception | None = None
 
93
  for attempt in range(1, self.max_retries + 1):
94
  try:
95
  messages: list[dict[str, str]] = [
96
  {"role": "system", "content": self.prompt_template},
97
  {"role": "user", "content": user_payload},
98
  ]
99
- # If retrying after a validation error, append error context
100
- error = getattr(context, "error", None)
101
- if error and attempt > 1:
102
  messages.append(
103
  {
104
  "role": "user",
105
  "content": (
106
- "Previous attempt failed validation. "
107
- f"Error: {json.dumps(error)}\n"
108
- "Please fix and regenerate."
109
  ),
110
  }
111
  )
@@ -114,6 +241,7 @@ class LLMSnapshotBuilder:
114
  "model": self.model,
115
  "messages": messages,
116
  "max_tokens": self.max_tokens,
 
117
  }
118
  # Codex models don't support temperature
119
  if self.temperature is not None:
@@ -121,24 +249,56 @@ class LLMSnapshotBuilder:
121
  # Request JSON output; some models need the word "json"
122
  # in messages to use json_object format
123
  kwargs["response_format"] = {"type": "json_object"}
 
 
 
 
 
 
 
124
  response = await litellm.acompletion(**kwargs)
125
 
126
  raw = response.choices[0].message.content
 
 
 
 
127
  spec = _parse_llm_response(raw)
128
  logger.info(
129
- "LLMSnapshotBuilder: generated snapshot %s (attempt %d)",
130
- spec.topology.get("hosts", [])[:3],
131
  attempt,
 
 
 
132
  )
133
  return spec
134
 
135
- except Exception as exc:
136
  last_error = exc
 
137
  logger.warning(
138
  "LLMSnapshotBuilder attempt %d/%d failed: %s",
139
  attempt,
140
  self.max_retries,
141
- exc,
142
  )
143
 
144
  raise RuntimeError(
@@ -147,76 +307,182 @@ class LLMSnapshotBuilder:
147
  )
148
 
149

150
  def _parse_llm_response(raw_json: str) -> SnapshotSpec:
151
  """Parse raw JSON from LLM into a validated SnapshotSpec.
152
 
153
- Handles the fact that the LLM output schema (from docs/builder-validator.md)
154
- differs slightly from the SnapshotSpec Pydantic model in protocols.py.
 
155
  """
156
- data = json.loads(raw_json)
157
 
158
  # Map truth_graph vulns
159
  vulns = []
160
- for v in data.get("truth_graph", {}).get("vulns", []):
161
- vulns.append(
162
- Vulnerability(
163
- id=v.get("id", ""),
164
- type=v.get("type", ""),
165
- host=v.get("host", ""),
166
- service=v.get("service", ""),
167
- injection_point=v.get("injection_point", ""),
168
- vulnerable_code=v.get("vulnerable_code", ""),
169
- root_cause=v.get("root_cause", ""),
170
- blast_radius=v.get("blast_radius", ""),
171
- remediation=v.get("remediation", ""),
 
 
172
  )
173
- )
 
 
 
 
 
 
 
174
 
175
  # Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
176
  exploit_chain = []
177
- for ec in data.get("truth_graph", {}).get("exploit_chain", []):
178
- exploit_chain.append(
179
- ExploitStep(
180
- vuln_id=ec.get("vuln_id", ec.get("vuln", "")),
181
- command=ec.get("command", ec.get("action", "")),
182
- description=ec.get("description", ec.get("yields", "")),
183
  )
184
- )
185
 
186
  truth_graph = TruthGraph(
187
  vulns=vulns,
188
  exploit_chain=exploit_chain,
189
  )
190
 
191
- # Map golden_path -- LLM uses "expect_stdout", protocol uses "expect_in_stdout"
192
  golden_path = []
193
- for step in data.get("golden_path", []):
194
  golden_path.append(
195
  GoldenPathStep(
196
- step=step.get("step", 0),
197
- command=step.get("cmd", step.get("command", "")),
198
- expect_in_stdout=step.get(
199
- "expect_stdout", step.get("expect_in_stdout", "")
200
- ),
201
- description=step.get("description", ""),
202
  )
203
  )
204
 
205
  # Map flags
206
- flags = [
207
- FlagSpec(
208
- id=f.get("id", ""),
209
- value=f.get("value", ""),
210
- path=f.get("path", ""),
211
- host=f.get("host", ""),
212
- )
213
- for f in data.get("flags", [])
214
- ]
215
-
216
- # Map evidence_spec -- LLM returns dict, protocol expects list[EvidenceItem]
217
- evidence_raw = data.get("evidence_spec", {})
218
  evidence_spec: list[EvidenceItem] = []
 
219
  if isinstance(evidence_raw, dict):
 
220
  for key, val in evidence_raw.items():
221
  if isinstance(val, list):
222
  for item in val:
@@ -234,23 +500,31 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
234
 
235
  # Map NPC personas
236
  npc_personas = []
237
- for p in data.get("npc_personas", []):
238
- npc_personas.append(
239
- NPCPersona(
240
- name=p.get("name", ""),
241
- role=p.get("role", ""),
242
- department=p.get("department", ""),
243
- reports_to=p.get("reports_to", ""),
244
- communication_style=p.get("communication_style", ""),
245
- security_awareness=p.get("security_awareness", 0.5),
246
- susceptibility=p.get("susceptibility", {}),
247
- routine=p.get("routine", {}),
248
- accounts=p.get("accounts", {}),
249
  )
250
- )
251
 
252
  # Map NPC traffic
253
- npc_raw = data.get("npc_traffic", {})
254
  npc_traffic = NPCTrafficSpec(
255
  level=0,
256
  rate_lambda=npc_raw.get("http_rate", 10),
@@ -258,19 +532,17 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
258
  )
259
 
260
  # Map task
261
- task_raw = data.get("task", {})
262
  task = TaskSpec(
263
- red_briefing=task_raw.get("red_briefing", ""),
264
- blue_briefing=task_raw.get("blue_briefing", ""),
265
  )
266
 
267
  # Map files -- explicit files from LLM + extract from vulnerable_code
268
  files: dict[str, str] = {}
269
 
270
  # 1. Explicit files field from LLM output
271
- files_raw = data.get("files", {})
272
- if isinstance(files_raw, dict):
273
- for key, content in files_raw.items():
274
  if isinstance(content, str):
275
  files[key] = content
276
 
@@ -289,8 +561,16 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
289
  if container_key not in files:
290
  files[container_key] = vc
291
 
 
 
 
 
 
 
 
 
292
  return SnapshotSpec(
293
- topology=data.get("topology", {}),
294
  truth_graph=truth_graph,
295
  golden_path=golden_path,
296
  flags=flags,
@@ -629,6 +909,7 @@ class TemplateOnlyBuilder:
629
  """
630
 
631
  def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
 
632
  self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
633
 
634
  async def build(
@@ -765,6 +1046,12 @@ class TemplateOnlyBuilder:
765
  scripts=["http_traffic.sh", "db_traffic.sh"],
766
  )
767
 
 
 
 
 
 
 
768
  return SnapshotSpec(
769
  topology=topology,
770
  truth_graph=truth_graph,
@@ -790,6 +1077,7 @@ class FileBuilder:
790
  """
791
 
792
  def __init__(self, snapshot_dir: str = "snapshots") -> None:
 
793
  self.snapshot_dir = Path(snapshot_dir)
794
 
795
  async def build(
@@ -797,7 +1085,7 @@ class FileBuilder:
797
  manifest: dict,
798
  context: BuildContext,
799
  ) -> SnapshotSpec:
800
- """Load the snapshot JSON, optionally picking by seed."""
801
  if not self.snapshot_dir.exists():
802
  raise FileNotFoundError(
803
  f"Snapshot directory not found: {self.snapshot_dir}"
@@ -817,5 +1105,6 @@ class FileBuilder:
817
  else:
818
  chosen = files[0]
819
 
 
820
  raw = json.loads(chosen.read_text())
821
  return _parse_llm_response(json.dumps(raw))
 
3
  - LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
4
  - TemplateOnlyBuilder: testing -- deterministic, no LLM calls
5
  - FileBuilder: demos -- loads a pre-built snapshot from a JSON file
6
+
7
+ Each builder implements the SnapshotBuilder protocol and returns a validated
8
+ SnapshotSpec that can be rendered into Docker artifacts by the SnapshotRenderer.
9
  """
10
 
11
  from __future__ import annotations
 
15
  import os
16
  import random
17
  from pathlib import Path
18
+ from typing import Any, Optional
19
+
20
+ from pydantic import BaseModel, Field
21
 
22
  try:
23
  import litellm
 
43
  logger = logging.getLogger(__name__)
44
 
45
 
46
+ # ---------------------------------------------------------------------------
47
+ # LLM raw output model -- matches the LLM's JSON schema exactly
48
+ # ---------------------------------------------------------------------------
49
+
50
+
51
+ class _LLMVulnerability(BaseModel):
52
+ """Raw vulnerability as returned by the LLM."""
53
+
54
+ id: str = ""
55
+ type: str = ""
56
+ host: str = ""
57
+ service: str = ""
58
+ injection_point: str = ""
59
+ vulnerable_code: str | dict[str, str] = ""
60
+ root_cause: str = ""
61
+ blast_radius: str = ""
62
+ remediation: str = ""
63
+
64
+
65
+ class _LLMExploitStep(BaseModel):
66
+ """Raw exploit step -- LLM uses 'vuln'/'action'/'yields' field names."""
67
+
68
+ vuln: str = ""
69
+ vuln_id: str = ""
70
+ action: str = ""
71
+ command: str = ""
72
+ yields: str = ""
73
+ description: str = ""
74
+
75
+
76
+ class _LLMGoldenPathStep(BaseModel):
77
+ """Raw golden path step -- LLM uses 'cmd' and 'expect_stdout'."""
78
+
79
+ step: int = 0
80
+ cmd: str = ""
81
+ command: str = ""
82
+ expect_stdout: str = ""
83
+ expect_in_stdout: str = ""
84
+ description: str = ""
85
+ host: str = "attacker"
86
+
87
+
88
+ class _LLMFlag(BaseModel):
89
+ """Raw flag definition from LLM output."""
90
+
91
+ id: str = ""
92
+ value: str = ""
93
+ path: str = ""
94
+ host: str = ""
95
+
96
+
97
+ class _LLMNPCPersona(BaseModel):
98
+ """Raw NPC persona from LLM output."""
99
+
100
+ name: str = ""
101
+ role: str = ""
102
+ department: str = ""
103
+ reports_to: str = ""
104
+ communication_style: str = ""
105
+ security_awareness: float = 0.5
106
+ susceptibility: dict[str, Any] = Field(default_factory=dict)
107
+ routine: dict[str, Any] = Field(default_factory=dict)
108
+ accounts: dict[str, Any] = Field(default_factory=dict)
109
+
110
+
111
+ class _LLMTruthGraph(BaseModel):
112
+ """Raw truth graph from LLM output."""
113
+
114
+ vulns: list[_LLMVulnerability] = Field(default_factory=list)
115
+ exploit_chain: list[_LLMExploitStep] = Field(default_factory=list)
116
+
117
+
118
+ class _LLMTask(BaseModel):
119
+ """Raw task specification from LLM output."""
120
+
121
+ red_briefing: str = ""
122
+ blue_briefing: str = ""
123
+
124
+
125
+ class LLMSnapshotOutput(BaseModel):
126
+ """Intermediate model matching the LLM's raw JSON schema.
127
+
128
+ This captures the exact field names the LLM produces, including
129
+ known mismatches like 'vuln' vs 'vuln_id', 'cmd' vs 'command',
130
+ and 'expect_stdout' vs 'expect_in_stdout'. Parsing into this model
131
+ first makes schema mismatches explicit and testable before mapping
132
+ to the canonical SnapshotSpec.
133
+ """
134
+
135
+ topology: dict[str, Any] = Field(default_factory=dict)
136
+ truth_graph: _LLMTruthGraph = Field(default_factory=_LLMTruthGraph)
137
+ golden_path: list[_LLMGoldenPathStep] = Field(default_factory=list)
138
+ flags: list[_LLMFlag] = Field(default_factory=list)
139
+ evidence_spec: dict[str, Any] | list[dict[str, Any]] = Field(default_factory=dict)
140
+ npc_personas: list[_LLMNPCPersona] = Field(default_factory=list)
141
+ npc_traffic: dict[str, Any] = Field(default_factory=dict)
142
+ task: _LLMTask = Field(default_factory=_LLMTask)
143
+ files: dict[str, str] = Field(default_factory=dict)
144
+
145
+
146
  # ---------------------------------------------------------------------------
147
  # LLM-based builder (production)
148
  # ---------------------------------------------------------------------------
 
162
  temperature: float = 0.7,
163
  max_retries: int = 3,
164
  max_tokens: int = 32768,
165
+ timeout: float = 120.0,
166
  ) -> None:
167
+ """Initialize the LLM-based snapshot builder.
168
+
169
+ Args:
170
+ model: LiteLLM model identifier (e.g. 'azure/gpt-5.2').
171
+ prompt_template: System prompt override.
172
+ temperature: Sampling temperature for LLM calls.
173
+ max_retries: Maximum number of LLM call + parse attempts.
174
+ max_tokens: Maximum tokens in LLM response.
175
+ timeout: Timeout in seconds for each LLM call.
176
+ """
177
  self.model = model or os.environ.get(
178
  "OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
179
  )
 
181
  self.temperature = temperature
182
  self.max_retries = max_retries
183
  self.max_tokens = max_tokens
184
+ self.timeout = timeout
185
 
186
  async def build(
187
  self,
188
  manifest: dict,
189
  context: BuildContext,
190
  ) -> SnapshotSpec:
191
+ """Call LLM to generate a candidate snapshot spec.
192
+
193
+ Retries on LLM or parse failures, appending error context to each
194
+ subsequent attempt so the LLM can self-correct.
195
+ """
196
  if litellm is None:
197
  raise RuntimeError(
198
  "LLMSnapshotBuilder requires the optional builder extra. "
 
210
  )
211
  )
212
 
213
+ logger.info(
214
+ "LLMSnapshotBuilder: starting build (model=%s, tier=%d)",
215
+ self.model,
216
+ context.tier,
217
+ )
218
+
219
  last_error: Exception | None = None
220
+ last_error_msg: str = ""
221
  for attempt in range(1, self.max_retries + 1):
222
  try:
223
  messages: list[dict[str, str]] = [
224
  {"role": "system", "content": self.prompt_template},
225
  {"role": "user", "content": user_payload},
226
  ]
227
+ # If retrying after a failure, append error context so LLM can fix
228
+ if attempt > 1 and last_error_msg:
 
229
  messages.append(
230
  {
231
  "role": "user",
232
  "content": (
233
+ "Previous attempt failed. "
234
+ f"Error: {last_error_msg}\n"
235
+ "Please fix and regenerate the complete JSON."
236
  ),
237
  }
238
  )
 
241
  "model": self.model,
242
  "messages": messages,
243
  "max_tokens": self.max_tokens,
244
+ "timeout": self.timeout,
245
  }
246
  # Codex models don't support temperature
247
  if self.temperature is not None:
 
249
  # Request JSON output; some models need the word "json"
250
  # in messages to use json_object format
251
  kwargs["response_format"] = {"type": "json_object"}
252
+
253
+ logger.debug(
254
+ "LLMSnapshotBuilder: sending request (attempt %d/%d, timeout=%.0fs)",
255
+ attempt,
256
+ self.max_retries,
257
+ self.timeout,
258
+ )
259
  response = await litellm.acompletion(**kwargs)
260
 
261
  raw = response.choices[0].message.content
262
+ logger.debug(
263
+ "LLMSnapshotBuilder: received response (%d chars)",
264
+ len(raw) if raw else 0,
265
+ )
266
  spec = _parse_llm_response(raw)
267
  logger.info(
268
+ "LLMSnapshotBuilder: build completed (attempt %d/%d, %d vulns, %d golden path steps)",
 
269
  attempt,
270
+ self.max_retries,
271
+ len(spec.truth_graph.vulns),
272
+ len(spec.golden_path),
273
  )
274
  return spec
275
 
276
+ except json.JSONDecodeError as exc:
277
+ last_error = exc
278
+ last_error_msg = f"JSON parse error at position {exc.pos}: {exc.msg}"
279
+ logger.warning(
280
+ "LLMSnapshotBuilder attempt %d/%d: JSON parse failed: %s",
281
+ attempt,
282
+ self.max_retries,
283
+ last_error_msg,
284
+ )
285
+ except SnapshotParseError as exc:
286
  last_error = exc
287
+ last_error_msg = str(exc)
288
  logger.warning(
289
+ "LLMSnapshotBuilder attempt %d/%d: snapshot parse failed: %s",
290
+ attempt,
291
+ self.max_retries,
292
+ last_error_msg,
293
+ )
294
+ except Exception as exc:
295
+ last_error = exc
296
+ last_error_msg = f"{type(exc).__name__}: {exc}"
297
+ logger.error(
298
  "LLMSnapshotBuilder attempt %d/%d failed: %s",
299
  attempt,
300
  self.max_retries,
301
+ last_error_msg,
302
  )
303
 
304
  raise RuntimeError(
 
307
  )
308
 
309
 
310
+ # ---------------------------------------------------------------------------
311
+ # Parse error with context
312
+ # ---------------------------------------------------------------------------
313
+
314
+
315
+ class SnapshotParseError(Exception):
316
+ """Raised when LLM output cannot be parsed into a valid SnapshotSpec.
317
+
318
+ Includes the field that failed, received value, expected format,
319
+ and a truncated snippet of the raw JSON for debugging.
320
+ """
321
+
322
+ def __init__(
323
+ self,
324
+ message: str,
325
+ field: str = "",
326
+ received: Any = None,
327
+ expected: str = "",
328
+ raw_json_snippet: str = "",
329
+ ) -> None:
330
+ self.field = field
331
+ self.received = received
332
+ self.expected = expected
333
+ self.raw_json_snippet = raw_json_snippet
334
+ parts = [message]
335
+ if field:
336
+ parts.append(f"field={field!r}")
337
+ if received is not None:
338
+ recv_str = repr(received)
339
+ if len(recv_str) > 200:
340
+ recv_str = recv_str[:200] + "..."
341
+ parts.append(f"received={recv_str}")
342
+ if expected:
343
+ parts.append(f"expected={expected}")
344
+ if raw_json_snippet:
345
+ parts.append(f"raw_json_start={raw_json_snippet!r}")
346
+ super().__init__(" | ".join(parts))
347
+
348
+
349
+ # ---------------------------------------------------------------------------
350
+ # LLM response parser
351
+ # ---------------------------------------------------------------------------
352
+
353
+
354
  def _parse_llm_response(raw_json: str) -> SnapshotSpec:
355
  """Parse raw JSON from LLM into a validated SnapshotSpec.
356
 
357
+ First parses into LLMSnapshotOutput (which matches the LLM's field names),
358
+ then maps to the canonical SnapshotSpec models. Handles known field-name
359
+ mismatches between the LLM prompt schema and Pydantic models.
360
  """
361
+ raw_snippet = raw_json[:500] if raw_json else ""
362
+
363
+ try:
364
+ data = json.loads(raw_json)
365
+ except json.JSONDecodeError:
366
+ raise
367
+
368
+ logger.debug("_parse_llm_response: parsing %d-char JSON response", len(raw_json))
369
+
370
+ # Parse into intermediate model first for early validation
371
+ try:
372
+ llm_output = LLMSnapshotOutput.model_validate(data)
373
+ except Exception as exc:
374
+ raise SnapshotParseError(
375
+ "Failed to parse LLM output into LLMSnapshotOutput",
376
+ field="root",
377
+ received=type(exc).__name__,
378
+ expected="valid LLMSnapshotOutput JSON",
379
+ raw_json_snippet=raw_snippet,
380
+ ) from exc
381
 
382
  # Map truth_graph vulns
383
  vulns = []
384
+ for i, v in enumerate(llm_output.truth_graph.vulns):
385
+ try:
386
+ vulns.append(
387
+ Vulnerability(
388
+ id=v.id,
389
+ type=v.type,
390
+ host=v.host,
391
+ service=v.service,
392
+ injection_point=v.injection_point,
393
+ vulnerable_code=v.vulnerable_code,
394
+ root_cause=v.root_cause,
395
+ blast_radius=v.blast_radius,
396
+ remediation=v.remediation,
397
+ )
398
  )
399
+ except Exception as exc:
400
+ raise SnapshotParseError(
401
+ f"Failed to map vulnerability at index {i}",
402
+ field=f"truth_graph.vulns[{i}]",
403
+ received=v.model_dump(),
404
+ expected="valid Vulnerability fields",
405
+ raw_json_snippet=raw_snippet,
406
+ ) from exc
407
 
408
  # Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
409
  exploit_chain = []
410
+ for i, ec in enumerate(llm_output.truth_graph.exploit_chain):
411
+ vuln_id = ec.vuln_id or ec.vuln
412
+ command = ec.command or ec.action
413
+ description = ec.description or ec.yields
414
+ if vuln_id or command:
415
+ used_fallback = (not ec.vuln_id and ec.vuln) or (not ec.command and ec.action)
416
+ if used_fallback:
417
+ logger.warning(
418
+ "exploit_chain[%d]: used fallback field names (vuln=%r -> vuln_id, action=%r -> command)",
419
+ i,
420
+ ec.vuln,
421
+ ec.action,
422
+ )
423
+ exploit_chain.append(
424
+ ExploitStep(
425
+ vuln_id=vuln_id,
426
+ command=command,
427
+ description=description,
428
+ )
429
  )
 
430
 
431
  truth_graph = TruthGraph(
432
  vulns=vulns,
433
  exploit_chain=exploit_chain,
434
  )
435
 
436
+ # Map golden_path -- LLM uses "cmd"/"expect_stdout", protocol uses "command"/"expect_in_stdout"
437
  golden_path = []
438
+ for i, step in enumerate(llm_output.golden_path):
439
+ command = step.command or step.cmd
440
+ expect = step.expect_in_stdout or step.expect_stdout
441
+ if not step.command and step.cmd:
442
+ logger.warning(
443
+ "golden_path[%d]: used 'cmd' fallback for 'command'",
444
+ i,
445
+ )
446
+ if not step.expect_in_stdout and step.expect_stdout:
447
+ logger.warning(
448
+ "golden_path[%d]: used 'expect_stdout' fallback for 'expect_in_stdout'",
449
+ i,
450
+ )
451
  golden_path.append(
452
  GoldenPathStep(
453
+ step=step.step,
454
+ command=command,
455
+ expect_in_stdout=expect,
456
+ description=step.description,
 
 
457
  )
458
  )
459
 
460
  # Map flags
461
+ flags = []
462
+ for i, f in enumerate(llm_output.flags):
463
+ try:
464
+ flags.append(
465
+ FlagSpec(
466
+ id=f.id,
467
+ value=f.value,
468
+ path=f.path,
469
+ host=f.host,
470
+ )
471
+ )
472
+ except Exception as exc:
473
+ raise SnapshotParseError(
474
+ f"Failed to map flag at index {i}",
475
+ field=f"flags[{i}]",
476
+ received=f.model_dump(),
477
+ expected="valid FlagSpec (id, value, path, host)",
478
+ raw_json_snippet=raw_snippet,
479
+ ) from exc
480
+
481
+ # Map evidence_spec -- LLM returns dict or list, protocol expects list[EvidenceItem]
482
  evidence_spec: list[EvidenceItem] = []
483
+ evidence_raw = llm_output.evidence_spec
484
  if isinstance(evidence_raw, dict):
485
+ logger.debug("evidence_spec: converting dict format to list[EvidenceItem]")
486
  for key, val in evidence_raw.items():
487
  if isinstance(val, list):
488
  for item in val:
 
500
 
501
  # Map NPC personas
502
  npc_personas = []
503
+ for i, p in enumerate(llm_output.npc_personas):
504
+ try:
505
+ npc_personas.append(
506
+ NPCPersona(
507
+ name=p.name,
508
+ role=p.role,
509
+ department=p.department,
510
+ reports_to=p.reports_to,
511
+ communication_style=p.communication_style,
512
+ security_awareness=p.security_awareness,
513
+ susceptibility=p.susceptibility,
514
+ routine=p.routine,
515
+ accounts=p.accounts,
516
+ )
517
+ )
518
+ except Exception as exc:
519
+ logger.warning(
520
+ "npc_personas[%d]: failed to map persona %r: %s",
521
+ i,
522
+ p.name,
523
+ exc,
524
  )
 
525
 
526
  # Map NPC traffic
527
+ npc_raw = llm_output.npc_traffic
528
  npc_traffic = NPCTrafficSpec(
529
  level=0,
530
  rate_lambda=npc_raw.get("http_rate", 10),
 
532
  )
533
 
534
  # Map task
 
535
  task = TaskSpec(
536
+ red_briefing=llm_output.task.red_briefing,
537
+ blue_briefing=llm_output.task.blue_briefing,
538
  )
539
 
540
  # Map files -- explicit files from LLM + extract from vulnerable_code
541
  files: dict[str, str] = {}
542
 
543
  # 1. Explicit files field from LLM output
544
+ if isinstance(llm_output.files, dict):
545
+ for key, content in llm_output.files.items():
 
546
  if isinstance(content, str):
547
  files[key] = content
548
 
 
561
  if container_key not in files:
562
  files[container_key] = vc
563
 
564
+ logger.debug(
565
+ "_parse_llm_response: mapped %d vulns, %d golden path steps, %d flags, %d files",
566
+ len(vulns),
567
+ len(golden_path),
568
+ len(flags),
569
+ len(files),
570
+ )
571
+
572
  return SnapshotSpec(
573
+ topology=llm_output.topology,
574
  truth_graph=truth_graph,
575
  golden_path=golden_path,
576
  flags=flags,
 
909
  """
910
 
911
  def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
912
+ """Initialize with an optional custom vulnerability pool."""
913
  self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
914
 
915
  async def build(
 
1046
  scripts=["http_traffic.sh", "db_traffic.sh"],
1047
  )
1048
 
1049
+ logger.info(
1050
+ "TemplateOnlyBuilder: built snapshot with %d vulns (seed=%s)",
1051
+ len(vulns),
1052
+ context.seed,
1053
+ )
1054
+
1055
  return SnapshotSpec(
1056
  topology=topology,
1057
  truth_graph=truth_graph,
 
1077
  """
1078
 
1079
  def __init__(self, snapshot_dir: str = "snapshots") -> None:
1080
+ """Initialize with the directory containing snapshot JSON files."""
1081
  self.snapshot_dir = Path(snapshot_dir)
1082
 
1083
  async def build(
 
1085
  manifest: dict,
1086
  context: BuildContext,
1087
  ) -> SnapshotSpec:
1088
+ """Load a snapshot JSON file, optionally picking by seed."""
1089
  if not self.snapshot_dir.exists():
1090
  raise FileNotFoundError(
1091
  f"Snapshot directory not found: {self.snapshot_dir}"
 
1105
  else:
1106
  chosen = files[0]
1107
 
1108
+ logger.info("FileBuilder: loading snapshot from %s", chosen)
1109
  raw = json.loads(chosen.read_text())
1110
  return _parse_llm_response(json.dumps(raw))
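The field-name fallback applied while mapping `golden_path` (and, similarly, `exploit_chain`) can be sketched in isolation. This is a minimal illustration, not the builder's actual code: `RawStep`, `pick`, and `map_step` are hypothetical names introduced here, and the real mapping targets Pydantic models rather than a plain dict. The point it shows is that the "used fallback" decision has to be made against the primary field itself, not against the already-merged value.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger("snapshot_mapping_sketch")


@dataclass
class RawStep:
    """Hypothetical stand-in for one LLM-emitted golden-path step."""

    step: int
    command: str = ""           # canonical field name
    cmd: str = ""               # name the LLM sometimes emits instead
    expect_in_stdout: str = ""  # canonical field name
    expect_stdout: str = ""     # name the LLM sometimes emits instead


def pick(primary: str, fallback: str) -> tuple[str, bool]:
    """Return (value, used_fallback); the fallback is used only if primary is empty."""
    if primary:
        return primary, False
    return fallback, bool(fallback)


def map_step(raw: RawStep) -> dict[str, object]:
    """Map a raw step to canonical field names, warning when a fallback was used."""
    command, cmd_fallback = pick(raw.command, raw.cmd)
    expect, expect_fallback = pick(raw.expect_in_stdout, raw.expect_stdout)
    if cmd_fallback:
        logger.warning("golden_path[%d]: used 'cmd' fallback for 'command'", raw.step)
    if expect_fallback:
        logger.warning(
            "golden_path[%d]: used 'expect_stdout' fallback for 'expect_in_stdout'",
            raw.step,
        )
    return {"step": raw.step, "command": command, "expect_in_stdout": expect}


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(map_step(RawStep(step=1, cmd="nmap -sV web", expect_stdout="80/tcp")))
```

Returning the flag alongside the value keeps the warning condition next to the selection logic, so the two cannot drift apart.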
src/open_range/cli.py ADDED
@@ -0,0 +1,438 @@
1
+ """OpenRange CLI -- production command-line interface for the cybersecurity gymnasium.
2
+
3
+ Usage::
4
+
5
+ openrange build -m manifests/tier1_basic.yaml
6
+ openrange render -s snapshots/spec.json -o output/
7
+ openrange validate -s snapshots/spec.json
8
+ openrange deploy -s snapshots/spec.json
9
+ openrange server --port 8000
10
+ """
11
+
12
+ from __future__ import annotations
13
+
14
+ import asyncio
15
+ import json
16
+ import logging
17
+ import os
18
+ import sys
19
+ import time
20
+ from pathlib import Path
21
+ from typing import Any
22
+
23
+ import click
24
+ import yaml
25
+
26
+ # ---------------------------------------------------------------------------
27
+ # Logging setup
28
+ # ---------------------------------------------------------------------------
29
+
30
+ LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
31
+ LOG_DATE_FORMAT = "%H:%M:%S"
32
+
33
+
34
+ def _configure_logging(verbose: bool) -> None:
35
+ level = logging.DEBUG if verbose else logging.INFO
36
+ logging.basicConfig(
37
+ level=level,
38
+ format=LOG_FORMAT,
39
+ datefmt=LOG_DATE_FORMAT,
40
+ stream=sys.stderr,
41
+ )
42
+ # Quiet noisy third-party loggers unless in verbose mode
43
+ if not verbose:
44
+ for name in ("httpx", "httpcore", "litellm", "urllib3", "docker"):
45
+ logging.getLogger(name).setLevel(logging.WARNING)
46
+
47
+
48
+ # ---------------------------------------------------------------------------
49
+ # Helpers
50
+ # ---------------------------------------------------------------------------
51
+
52
+
53
+ def _run_async(coro: Any) -> Any:
54
+ """Run an async coroutine from synchronous Click context."""
55
+ try:
56
+ loop = asyncio.get_running_loop()
57
+ except RuntimeError:
58
+ loop = None
59
+
60
+ if loop and loop.is_running():
61
+ # Shouldn't happen in a CLI, but be safe.
62
+ import concurrent.futures
63
+
64
+ with concurrent.futures.ThreadPoolExecutor() as pool:
65
+ return pool.submit(asyncio.run, coro).result()
66
+ return asyncio.run(coro)
67
+
68
+
69
+ def _load_manifest(path: str) -> dict[str, Any]:
70
+ """Load and return a YAML manifest as a dict."""
71
+ p = Path(path)
72
+ if not p.exists():
73
+ click.echo(f"Error: manifest not found: {p}", err=True)
74
+ sys.exit(1)
75
+ with open(p) as f:
76
+ data = yaml.safe_load(f)
77
+ if not isinstance(data, dict):
78
+ click.echo(f"Error: manifest must be a YAML mapping, got {type(data).__name__}", err=True)
79
+ sys.exit(1)
80
+ return data
81
+
82
+
83
+ def _load_snapshot(path: str) -> "SnapshotSpec":
84
+ """Load a snapshot JSON file into a SnapshotSpec."""
85
+ from open_range.protocols import SnapshotSpec
86
+
87
+ p = Path(path)
88
+ if not p.exists():
89
+ click.echo(f"Error: snapshot not found: {p}", err=True)
90
+ sys.exit(1)
91
+ with open(p) as f:
92
+ data = json.load(f)
93
+ try:
94
+ return SnapshotSpec.model_validate(data)
95
+ except Exception as exc:
96
+ click.echo(f"Error: invalid snapshot JSON: {exc}", err=True)
97
+ sys.exit(1)
98
+
99
+
100
+ def _write_snapshot(spec: "SnapshotSpec", output_dir: Path) -> Path:
101
+ """Write a SnapshotSpec to spec.json inside output_dir. Returns the file path."""
102
+ output_dir.mkdir(parents=True, exist_ok=True)
103
+ dest = output_dir / "spec.json"
104
+ dest.write_text(json.dumps(spec.model_dump(), indent=2, default=str))
105
+ return dest
106
+
107
+
108
+ # ---------------------------------------------------------------------------
109
+ # CLI group
110
+ # ---------------------------------------------------------------------------
111
+
112
+
113
+ @click.group()
114
+ @click.option("-v", "--verbose", is_flag=True, default=False, help="Enable debug logging.")
115
+ @click.version_option(package_name="openenv-open-range", prog_name="openrange")
116
+ def cli(verbose: bool) -> None:
117
+ """OpenRange -- multi-agent cybersecurity gymnasium.
118
+
119
+ Generate, validate, deploy, and serve Docker-based cyber ranges
120
+ for adversarial Red/Blue agent training.
121
+ """
122
+ _configure_logging(verbose)
123
+
124
+
125
+ # ---------------------------------------------------------------------------
126
+ # build
127
+ # ---------------------------------------------------------------------------
128
+
129
+
130
+ @cli.command()
131
+ @click.option("-m", "--manifest", required=True, type=click.Path(exists=True), help="Path to manifest YAML.")
132
+ @click.option("-o", "--output", default="./snapshots", type=click.Path(), help="Output directory for snapshot.")
133
+ @click.option("--model", default=None, help="LLM model (default: $OPENRANGE_BUILDER_MODEL or azure/gpt-5.2).")
134
+ @click.option("--tier", default=1, type=click.IntRange(1, 5), help="Tier level 1-5.")
135
+ @click.option("--seed", default=None, type=int, help="Random seed for reproducibility.")
136
+ @click.option("--template-only", is_flag=True, default=False, help="Skip LLM, use deterministic template builder.")
137
+ @click.option("--max-tokens", default=16384, type=int, help="Max tokens for LLM generation.")
138
+ def build(
139
+ manifest: str,
140
+ output: str,
141
+ model: str | None,
142
+ tier: int,
143
+ seed: int | None,
144
+ template_only: bool,
145
+ max_tokens: int,
146
+ ) -> None:
147
+ """Generate a snapshot from a manifest YAML.
148
+
149
+ Uses the LLM builder by default. Pass --template-only for a deterministic
150
+ snapshot without any LLM calls (useful for testing).
151
+ """
152
+ from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
153
+ from open_range.protocols import BuildContext
154
+
155
+ manifest_data = _load_manifest(manifest)
156
+ context = BuildContext(seed=seed, tier=tier)
157
+
158
+ if template_only:
159
+ builder = TemplateOnlyBuilder()
160
+ click.echo(f"Building snapshot (template-only, tier {tier}) ...")
161
+ else:
162
+ resolved_model = model or os.environ.get("OPENRANGE_BUILDER_MODEL", "azure/gpt-5.2")
163
+ builder = LLMSnapshotBuilder(model=resolved_model, max_tokens=max_tokens)
164
+ click.echo(f"Building snapshot (model={resolved_model}, tier {tier}) ...")
165
+
166
+ t0 = time.monotonic()
167
+ try:
168
+ spec = _run_async(builder.build(manifest_data, context))
169
+ except Exception as exc:
170
+ click.echo(f"Error: build failed: {exc}", err=True)
171
+ sys.exit(1)
172
+ elapsed = time.monotonic() - t0
173
+
174
+ output_path = Path(output)
175
+ dest = _write_snapshot(spec, output_path)
176
+
177
+ n_vulns = len(spec.truth_graph.vulns)
178
+ n_steps = len(spec.golden_path)
179
+ n_flags = len(spec.flags)
180
+
181
+ click.echo(f"Snapshot written to {dest}")
182
+ click.echo(f" Vulnerabilities: {n_vulns}")
183
+ click.echo(f" Golden path steps: {n_steps}")
184
+ click.echo(f" Flags: {n_flags}")
185
+ click.echo(f" Elapsed: {elapsed:.1f}s")
186
+
187
+
188
+ # ---------------------------------------------------------------------------
189
+ # render
190
+ # ---------------------------------------------------------------------------
191
+
192
+
193
+ @cli.command()
194
+ @click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
195
+ @click.option("-o", "--output", required=True, type=click.Path(), help="Output directory for Docker artifacts.")
196
+ def render(snapshot: str, output: str) -> None:
197
+ """Render a snapshot JSON into Docker artifacts (Dockerfiles, compose, configs)."""
198
+ from open_range.builder.renderer import SnapshotRenderer
199
+
200
+ spec = _load_snapshot(snapshot)
201
+ renderer = SnapshotRenderer()
202
+ output_path = Path(output)
203
+
204
+ click.echo(f"Rendering snapshot to {output_path} ...")
205
+ try:
206
+ renderer.render(spec, output_path)
207
+ except Exception as exc:
208
+ click.echo(f"Error: render failed: {exc}", err=True)
209
+ sys.exit(1)
210
+
211
+ # List produced files
212
+ if output_path.exists():
213
+ artifacts = sorted(p.name for p in output_path.iterdir() if p.is_file())
214
+ click.echo(f"Produced {len(artifacts)} artifacts:")
215
+ for name in artifacts:
216
+ click.echo(f" {name}")
217
+
218
+
219
+ # ---------------------------------------------------------------------------
220
+ # validate
221
+ # ---------------------------------------------------------------------------
222
+
223
+ # Canonical name -> check class. The order matches the 10-check pipeline.
224
+ _CHECK_REGISTRY: dict[str, str] = {
225
+ "build_boot": "open_range.validator.build_boot.BuildBootCheck",
226
+ "exploitability": "open_range.validator.exploitability.ExploitabilityCheck",
227
+ "patchability": "open_range.validator.patchability.PatchabilityCheck",
228
+ "evidence": "open_range.validator.evidence.EvidenceCheck",
229
+ "reward_grounding": "open_range.validator.reward_grounding.RewardGroundingCheck",
230
+ "isolation": "open_range.validator.isolation.IsolationCheck",
231
+ "task_feasibility": "open_range.validator.task_feasibility.TaskFeasibilityCheck",
232
+ "difficulty": "open_range.validator.difficulty.DifficultyCheck",
233
+ "npc_consistency": "open_range.validator.npc_consistency.NPCConsistencyCheck",
234
+ "realism_review": "open_range.validator.realism_review.RealismReviewCheck",
235
+ }
236
+
237
+ # Checks that require running Docker containers.
238
+ _DOCKER_CHECKS = {"build_boot", "exploitability", "patchability", "evidence"}
239
+
240
+
241
+ def _import_check(dotted: str) -> Any:
242
+ """Import a check class by dotted path."""
243
+ module_path, class_name = dotted.rsplit(".", 1)
244
+ import importlib
245
+
246
+ mod = importlib.import_module(module_path)
247
+ return getattr(mod, class_name)
248
+
249
+
250
+ @cli.command()
251
+ @click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
252
+ @click.option("--checks", default=None, help="Comma-separated check names (default: all applicable).")
253
+ @click.option("--docker/--no-docker", default=False, help="Include Docker-dependent checks (requires running containers).")
254
+ def validate(snapshot: str, checks: str | None, docker: bool) -> None:
255
+ """Run validator checks against a snapshot.
256
+
257
+ By default runs only offline checks (no Docker required). Use --docker
258
+ to include checks that need live containers.
259
+
260
+ Available checks: build_boot, exploitability, patchability, evidence,
261
+ reward_grounding, isolation, task_feasibility, difficulty,
262
+ npc_consistency, realism_review.
263
+ """
264
+ from open_range.protocols import ContainerSet
265
+ from open_range.validator.validator import ValidatorGate
266
+
267
+ spec = _load_snapshot(snapshot)
268
+
269
+ # Determine which checks to run
270
+ if checks:
271
+ names = [n.strip() for n in checks.split(",")]
272
+ unknown = [n for n in names if n not in _CHECK_REGISTRY]
273
+ if unknown:
274
+ click.echo(f"Error: unknown checks: {', '.join(unknown)}", err=True)
275
+ click.echo(f"Available: {', '.join(_CHECK_REGISTRY)}", err=True)
276
+ sys.exit(1)
277
+ else:
278
+ if docker:
279
+ names = list(_CHECK_REGISTRY)
280
+ else:
281
+ names = [n for n in _CHECK_REGISTRY if n not in _DOCKER_CHECKS]
282
+
283
+ if not names:
284
+ click.echo("No checks selected.")
285
+ sys.exit(0)
286
+
287
+ # Instantiate checks
288
+ check_instances = []
289
+ for name in names:
290
+ cls = _import_check(_CHECK_REGISTRY[name])
291
+ check_instances.append(cls())
292
+
293
+ # Empty ContainerSet stub; container discovery for --docker mode is not performed here
294
+ containers = ContainerSet()
295
+
296
+ gate = ValidatorGate(check_instances)
297
+ click.echo(f"Running {len(check_instances)} checks ...")
298
+
299
+ result = _run_async(gate.validate(spec, containers))
300
+
301
+ # Print results
302
+ for cr in result.checks:
303
+ status = "PASS" if cr.passed else ("ADVISORY" if cr.advisory else "FAIL")
304
+ line = f" [{status}] {cr.name}"
305
+ if cr.time_s > 0:
306
+ line += f" ({cr.time_s:.2f}s)"
307
+ click.echo(line)
308
+ if cr.error:
309
+ click.echo(f" {cr.error}")
310
+
311
+ click.echo("")
312
+ if result.passed:
313
+ click.echo(f"Validation PASSED ({result.total_time_s:.2f}s)")
314
+ else:
315
+ click.echo(f"Validation FAILED ({result.total_time_s:.2f}s)")
316
+ sys.exit(1)
317
+
318
+
319
+ # ---------------------------------------------------------------------------
320
+ # deploy
321
+ # ---------------------------------------------------------------------------
322
+
323
+
324
+ @cli.command()
325
+ @click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
326
+ @click.option("--compose-dir", default=None, type=click.Path(), help="Directory containing docker-compose.yml (default: a deploy/ directory next to the snapshot).")
327
+ def deploy(snapshot: str, compose_dir: str | None) -> None:
328
+ """Deploy a snapshot to running Docker containers.
329
+
330
+ Renders the snapshot into Docker artifacts and runs docker compose up.
331
+ If --compose-dir is given, uses that directory; otherwise renders into
332
+ a deploy/ directory alongside the snapshot.
333
+ """
334
+ import subprocess
335
+
336
+ from open_range.builder.renderer import SnapshotRenderer
337
+
338
+ spec = _load_snapshot(snapshot)
339
+
340
+ if compose_dir:
341
+ target = Path(compose_dir)
342
+ else:
343
+ target = Path(snapshot).parent / "deploy"
344
+
345
+ # Render artifacts
346
+ renderer = SnapshotRenderer()
347
+ click.echo(f"Rendering Docker artifacts to {target} ...")
348
+ try:
349
+ renderer.render(spec, target)
350
+ except Exception as exc:
351
+ click.echo(f"Error: render failed: {exc}", err=True)
352
+ sys.exit(1)
353
+
354
+ compose_file = target / "docker-compose.yml"
355
+ if not compose_file.exists():
356
+ click.echo(f"Error: no docker-compose.yml found in {target}", err=True)
357
+ sys.exit(1)
358
+
359
+ click.echo("Starting containers with docker compose ...")
360
+ try:
361
+ proc = subprocess.run(
362
+ ["docker", "compose", "-f", str(compose_file), "up", "-d", "--build"],
363
+ cwd=str(target),
364
+ capture_output=True,
365
+ text=True,
366
+ timeout=300,
367
+ )
368
+ except FileNotFoundError:
369
+ click.echo("Error: docker command not found. Is Docker installed and in PATH?", err=True)
370
+ sys.exit(1)
371
+ except subprocess.TimeoutExpired:
372
+ click.echo("Error: docker compose up timed out after 300s.", err=True)
373
+ sys.exit(1)
374
+
375
+ if proc.returncode != 0:
376
+ click.echo(f"Error: docker compose up failed (exit {proc.returncode}):", err=True)
377
+ if proc.stderr:
378
+ click.echo(proc.stderr, err=True)
379
+ sys.exit(1)
380
+
381
+ click.echo("Containers started.")
382
+
383
+ # Show running container status
384
+ try:
385
+ ps = subprocess.run(
386
+ ["docker", "compose", "-f", str(compose_file), "ps", "--format", "table"],
387
+ cwd=str(target),
388
+ capture_output=True,
389
+ text=True,
390
+ timeout=30,
391
+ )
392
+ if ps.stdout:
393
+ click.echo(ps.stdout)
394
+ except Exception:
395
+ pass # Non-critical
396
+
397
+
398
+ # ---------------------------------------------------------------------------
399
+ # server
400
+ # ---------------------------------------------------------------------------
401
+
402
+
403
+ @cli.command()
404
+ @click.option("--host", default="0.0.0.0", help="Host to bind.")
405
+ @click.option("--port", default=8000, type=int, help="Port to listen on.")
406
+ @click.option("--mock/--no-mock", default=False, help="Use mock mode (no Docker required).")
407
+ def server(host: str, port: int, mock: bool) -> None:
408
+ """Start the OpenEnv server.
409
+
410
+ In mock mode, the environment simulates container interactions without
411
+ requiring a running Docker stack.
412
+ """
413
+ import uvicorn
414
+
415
+ if mock:
416
+ os.environ["OPENRANGE_MOCK"] = "1"
417
+ click.echo(f"Starting OpenRange server in MOCK mode on {host}:{port} ...")
418
+ else:
419
+ click.echo(f"Starting OpenRange server on {host}:{port} ...")
420
+
421
+ try:
422
+ uvicorn.run(
423
+ "open_range.server.app:app",
424
+ host=host,
425
+ port=port,
426
+ log_level="info",
427
+ )
428
+ except Exception as exc:
429
+ click.echo(f"Error: server failed: {exc}", err=True)
430
+ sys.exit(1)
431
+
432
+
433
+ # ---------------------------------------------------------------------------
434
+ # Entry point
435
+ # ---------------------------------------------------------------------------
436
+
437
+ if __name__ == "__main__":
438
+ cli()
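The check-selection rule in `validate` (an explicit `--checks` list wins; otherwise Docker-dependent checks are included only with `--docker`) can be expressed as a pure function, which makes it easy to unit-test without Click. This is a sketch under stated assumptions: `select_checks`, `ALL_CHECKS`, and `DOCKER_CHECKS` are hypothetical names, the registry is trimmed to four entries, unknown names raise `ValueError` instead of calling `sys.exit`, and empty entries (for example from a trailing comma) are dropped, which the CLI code above does not do.

```python
from __future__ import annotations

# Hypothetical, trimmed-down registry; order is significant, as in the CLI.
ALL_CHECKS = ["build_boot", "exploitability", "isolation", "difficulty"]
DOCKER_CHECKS = {"build_boot", "exploitability"}


def select_checks(requested: str | None, include_docker: bool) -> list[str]:
    """Return the ordered list of check names to run."""
    if requested:
        # An explicit list takes precedence over the --docker/--no-docker default.
        names = [n.strip() for n in requested.split(",") if n.strip()]
        unknown = [n for n in names if n not in ALL_CHECKS]
        if unknown:
            raise ValueError(f"unknown checks: {', '.join(unknown)}")
        return names
    if include_docker:
        return list(ALL_CHECKS)
    # Offline default: skip anything that needs live containers.
    return [n for n in ALL_CHECKS if n not in DOCKER_CHECKS]


if __name__ == "__main__":
    print(select_checks(None, include_docker=False))
    print(select_checks("isolation, difficulty", include_docker=False))
```

Note that an explicit list may still name a Docker-dependent check even without `--docker`; the sketch keeps that behavior rather than silently filtering it.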
src/open_range/client/client.py CHANGED
@@ -1,9 +1,36 @@
1
- """Typed OpenEnv client for OpenRange."""
2
 
3
  from __future__ import annotations
4
 
5
- from openenv.core.client_types import StepResult
6
- from openenv.core.env_client import EnvClient
7
 
8
  from open_range.server.models import RangeAction, RangeObservation, RangeState
9
 
 
1
+ """Typed OpenEnv client for OpenRange.
2
+
3
+ Falls back to lightweight stubs if openenv is not installed.
4
+ """
5
 
6
  from __future__ import annotations
7
 
8
+ from typing import Any, Generic, TypeVar
9
+
10
+ try:
11
+ from openenv.core.client_types import StepResult
12
+ from openenv.core.env_client import EnvClient
13
+ except ImportError:
14
+ from dataclasses import dataclass, field
15
+
16
+ _A = TypeVar("_A")
17
+ _O = TypeVar("_O")
18
+ _S = TypeVar("_S")
19
+
20
+ @dataclass
21
+ class StepResult(Generic[_O]): # type: ignore[no-redef]
22
+ """Minimal stub matching openenv.core.client_types.StepResult."""
23
+
24
+ observation: Any = None
25
+ reward: float | int | None = None
26
+ done: bool = False
27
+ metadata: dict[str, Any] = field(default_factory=dict)
28
+
29
+ class EnvClient(Generic[_A, _O, _S]): # type: ignore[no-redef]
30
+ """Minimal stub matching openenv.core.env_client.EnvClient."""
31
+
32
+ def __init__(self, *args: Any, **kwargs: Any) -> None:
33
+ pass
34
 
35
  from open_range.server.models import RangeAction, RangeObservation, RangeState
36
 
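The optional-dependency fallback used in `client.py` follows a common pattern: try the real import, and define a minimal stand-in with the same name only if it fails. The sketch below shows the pattern on its own. The module name `some_optional_pkg_that_is_missing` is deliberately fictitious so that the `except ImportError` branch runs, and `HAVE_REAL` is a hypothetical flag added for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

try:
    # Fictitious module name: the import is expected to fail so the stub is used.
    from some_optional_pkg_that_is_missing import StepResult  # type: ignore
    HAVE_REAL = True
except ImportError:
    HAVE_REAL = False

    @dataclass
    class StepResult:
        """Minimal stub exposing the attributes callers rely on."""

        observation: Any = None
        reward: float | None = None
        done: bool = False
        # default_factory gives each instance its own dict.
        metadata: dict[str, Any] = field(default_factory=dict)


if __name__ == "__main__":
    result = StepResult(observation="banner", reward=0.5)
    print(HAVE_REAL, result)
```

Catching only `ImportError` matters here: a broader `except Exception` would also hide genuine bugs raised while the real package is being imported.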
src/open_range/server/Dockerfile DELETED
@@ -1,44 +0,0 @@
1
- FROM python:3.11-slim AS builder
2
-
3
- WORKDIR /app
4
-
5
- # Install uv for fast dependency resolution
6
- RUN pip install --no-cache-dir uv
7
-
8
- # Copy project files
9
- COPY pyproject.toml uv.lock* ./
10
- COPY src/ src/
11
- COPY openenv.yaml .
12
- COPY manifests/ manifests/
13
-
14
- # Install dependencies
15
- RUN uv sync --frozen --no-editable 2>/dev/null || uv sync --no-editable
16
-
17
- # --- Runtime stage ---
18
- FROM python:3.11-slim
19
-
20
- WORKDIR /app
21
-
22
- # Runtime system deps: Docker CLI (for controlling range containers) + curl
23
- RUN apt-get update && \
24
- apt-get install -y --no-install-recommends \
25
- docker.io \
26
- curl \
27
- && rm -rf /var/lib/apt/lists/*
28
-
29
- COPY --from=builder /app/.venv /app/.venv
30
- COPY --from=builder /app/src /app/src
31
- COPY --from=builder /app/pyproject.toml /app/pyproject.toml
32
- COPY --from=builder /app/openenv.yaml /app/openenv.yaml
33
- COPY --from=builder /app/manifests /app/manifests
34
- COPY server/ server/
35
-
36
- ENV PATH="/app/.venv/bin:$PATH"
37
- ENV PYTHONPATH="/app/src:$PYTHONPATH"
38
-
39
- EXPOSE 8000
40
-
41
- HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
42
- CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
43
-
44
- CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
src/open_range/server/app.py CHANGED
@@ -39,3 +39,6 @@ def main() -> None:
39
 
40
 
41
  app = create_app()
39
 
40
 
41
  app = create_app()
42
+
43
+ if __name__ == "__main__":
44
+ main()
src/open_range/server/environment.py CHANGED
@@ -16,6 +16,7 @@ Design:
16
  from __future__ import annotations
17
 
18
  import logging
 
19
  import time
20
  from typing import TYPE_CHECKING, Any
21
  from uuid import uuid4
@@ -248,11 +249,11 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
248
 
249
  parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
250
  self._exec_in_container(
251
- container_name, f"mkdir -p '{parent_dir}'"
252
  )
253
 
254
  b64 = base64.b64encode(content.encode()).decode()
255
- cmd = f"echo '{b64}' | base64 -d > '{path}'"
256
  _, stderr = self._exec_in_container(container_name, cmd)
257
  if stderr and "Error" in stderr:
258
  logger.warning(
@@ -284,32 +285,41 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
284
  """
285
  if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
286
  self._snapshot_id = kwargs.get("snapshot_id")
287
- return kwargs["snapshot"]
288
-
289
- if self._runtime is not None:
290
  if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
291
  admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
292
  else:
293
  admitted = self._runtime.acquire_snapshot()
294
  self._snapshot_id = admitted.snapshot_id
295
- return admitted.snapshot
296
-
297
- self._snapshot_id = None
298
- return SnapshotSpec(
299
- topology={"hosts": []},
300
- flags=[],
301
- golden_path=[],
302
- task={
303
- "red_briefing": (
304
- "Target network detected. Begin reconnaissance and "
305
- "identify vulnerabilities. Capture all flags."
306
- ),
307
- "blue_briefing": (
308
- "Monitor SIEM for suspicious activity. Investigate "
309
- "alerts, patch vulnerabilities, and report findings."
310
- ),
311
- },
312
- )
313
 
314
  # -----------------------------------------------------------------
315
  # Special command handling
@@ -328,13 +338,13 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
328
  done=True,
329
  )
330
 
331
- valid_flags = {f.value for f in self._snapshot.flags}
332
  already_found = set(self._state.flags_found)
333
 
334
  if submitted in valid_flags and submitted not in already_found:
335
  self._state.flags_found.append(submitted)
336
  # Check if all flags captured
337
- all_captured = set(self._state.flags_found) == valid_flags
338
  return RangeObservation(
339
  stdout=f"Correct! Flag accepted: {submitted}",
340
  flags_captured=[submitted],
@@ -395,7 +405,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
395
  Checks credentials against the topology user list in the snapshot.
396
  Successful auth is recorded in ``state.active_sessions``.
397
  """
398
- parts = action.command.strip().split()
399
  if len(parts) < 4:
400
  return RangeObservation(
401
  stdout="",
@@ -615,8 +625,8 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
615
  "Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
616
  eid,
617
  self._state.tier,
618
- len(self._snapshot.flags),
619
- len(self._snapshot.golden_path),
620
  )
621
 
622
  return RangeObservation(stdout=briefing)
@@ -774,7 +784,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
774
  action, obs, self._state, self._snapshot, reward_ctx
775
  )
776
  except Exception as exc:
777
- logger.warning("Reward computation failed: %s", exc)
778
  obs.reward = 0.0
779
 
780
  return obs
 
16
  from __future__ import annotations
17
 
18
  import logging
19
+ import shlex
20
  import time
21
  from typing import TYPE_CHECKING, Any
22
  from uuid import uuid4
 
249
 
250
  parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
251
  self._exec_in_container(
252
+ container_name, f"mkdir -p {shlex.quote(parent_dir)}"
253
  )
254
 
255
  b64 = base64.b64encode(content.encode()).decode()
256
+ cmd = f"echo '{b64}' | base64 -d > {shlex.quote(path)}"
257
  _, stderr = self._exec_in_container(container_name, cmd)
258
  if stderr and "Error" in stderr:
259
  logger.warning(
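
Review note: the hunk above quotes both the parent directory and the target path with `shlex.quote()` before they reach the shell. The sketch below reproduces that command construction in isolation so the quoting can be checked without Docker. The helper name `build_deploy_commands` is hypothetical, and the `or "/"` fallback for files directly under the root is an assumption that is not part of the committed code.

```python
from __future__ import annotations

import base64
import shlex


def build_deploy_commands(path: str, content: str) -> tuple[str, str]:
    """Return (mkdir_cmd, write_cmd) shell strings for writing one file.

    Hypothetical helper mirroring the quoting shown in the diff.
    """
    # Assumption: fall back to "/" when the parent is empty (e.g. "/test.txt").
    parent_dir = (path.rsplit("/", 1)[0] or "/") if "/" in path else "/"
    mkdir_cmd = f"mkdir -p {shlex.quote(parent_dir)}"

    # Base64 keeps arbitrary file content out of the shell's parsing rules;
    # the base64 alphabet contains no single quotes, so '...' is safe here.
    b64 = base64.b64encode(content.encode()).decode()
    write_cmd = f"echo '{b64}' | base64 -d > {shlex.quote(path)}"
    return mkdir_cmd, write_cmd


if __name__ == "__main__":
    for cmd in build_deploy_commands("/srv/my dir/a.txt", "hi"):
        print(cmd)
```

Note that `shlex.quote()` returns its argument unchanged when it contains only shell-safe characters, so a path such as `/var/www/portal` appears in the command without surrounding quotes; quotes are added only when the path contains spaces, semicolons, or similar characters.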
 
285
  """
286
  if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
287
  self._snapshot_id = kwargs.get("snapshot_id")
288
+ snap = kwargs["snapshot"]
289
+ elif self._runtime is not None:
 
290
  if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
291
  admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
292
  else:
293
  admitted = self._runtime.acquire_snapshot()
294
  self._snapshot_id = admitted.snapshot_id
295
+ snap = admitted.snapshot
296
+ else:
297
+ self._snapshot_id = None
298
+ snap = SnapshotSpec(
299
+ topology={"hosts": []},
300
+ flags=[],
301
+ golden_path=[],
302
+ task={
303
+ "red_briefing": (
304
+ "Target network detected. Begin reconnaissance and "
305
+ "identify vulnerabilities. Capture all flags."
306
+ ),
307
+ "blue_briefing": (
308
+ "Monitor SIEM for suspicious activity. Investigate "
309
+ "alerts, patch vulnerabilities, and report findings."
310
+ ),
311
+ },
312
+ )
313
+
314
+ # Defensive: ensure required fields are not None
315
+ if snap.flags is None:
316
+ snap.flags = []
317
+ if snap.topology is None:
318
+ snap.topology = {}
319
+ if snap.task is None:
320
+ snap.task = {}
321
+
322
+ return snap
323
 
324
  # -----------------------------------------------------------------
325
  # Special command handling
 
338
  done=True,
339
  )
340
 
341
+ valid_flags = {f.value for f in self._snapshot.flags} if self._snapshot.flags else set()
342
  already_found = set(self._state.flags_found)
343
 
344
  if submitted in valid_flags and submitted not in already_found:
345
  self._state.flags_found.append(submitted)
346
  # Check if all flags captured
347
+ all_captured = bool(valid_flags) and set(self._state.flags_found) == valid_flags
348
  return RangeObservation(
349
  stdout=f"Correct! Flag accepted: {submitted}",
350
  flags_captured=[submitted],
 
405
  Checks credentials against the topology user list in the snapshot.
406
  Successful auth is recorded in ``state.active_sessions``.
407
  """
408
+ parts = action.command.strip().split(maxsplit=3)
409
  if len(parts) < 4:
410
  return RangeObservation(
411
  stdout="",
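
Review note: `split(maxsplit=3)` stops splitting after the third whitespace gap, so everything after the third token stays together as one string. The sketch below illustrates the effect on a password containing spaces. The function name `parse_auth_command` and the token order (verb, host, username, password) are assumptions for illustration; the diff only shows that four parts are required.

```python
from __future__ import annotations


def parse_auth_command(command: str) -> tuple[str, str, str] | None:
    """Split an auth command into (host, username, password).

    Hypothetical helper; the field order is assumed, not taken from the diff.
    Returns None when fewer than four tokens are present.
    """
    # maxsplit=3 yields at most four parts; the last part keeps its spaces.
    parts = command.strip().split(maxsplit=3)
    if len(parts) < 4:
        return None
    _verb, host, username, password = parts
    return host, username, password


if __name__ == "__main__":
    print(parse_auth_command("auth web admin my secret pass"))
    print(parse_auth_command("auth web admin"))
```

With a plain `split()`, the same input would produce six tokens and the password would be cut at its first space.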
 
625
  "Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
626
  eid,
627
  self._state.tier,
628
+ len(self._snapshot.flags or []),
629
+ len(self._snapshot.golden_path or []),
630
  )
631
 
632
  return RangeObservation(stdout=briefing)
 
784
  action, obs, self._state, self._snapshot, reward_ctx
785
  )
786
  except Exception as exc:
787
+ logger.error("Reward computation failed: %s", exc, exc_info=True)
788
  obs.reward = 0.0
789
 
790
  return obs
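
Review note: passing `exc_info=True` makes the logging module append the active exception's traceback to the record, which is what turns the previously terse warning into something debuggable. A minimal sketch of the pattern follows; `safe_reward` and the `reward_demo` logger name are hypothetical and only stand in for the `try`/`except` around the reward call in the diff.

```python
import io
import logging
from typing import Callable


def safe_reward(compute: Callable[[], float], logger: logging.Logger) -> float:
    """Run a reward computation, logging failures with a full traceback."""
    try:
        return float(compute())
    except Exception as exc:
        # exc_info=True attaches the traceback of the exception being handled.
        logger.error("Reward computation failed: %s", exc, exc_info=True)
        return 0.0


# Capture log output in memory so the effect can be inspected.
buffer = io.StringIO()
demo_logger = logging.getLogger("reward_demo")
demo_logger.addHandler(logging.StreamHandler(buffer))
demo_logger.propagate = False

reward = safe_reward(lambda: 1 / 0, demo_logger)
log_text = buffer.getvalue()
```

The reward still falls back to `0.0`, so the episode continues, but the log now carries the exception type and stack rather than only its message.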
src/open_range/training/rollout.py CHANGED
@@ -14,7 +14,7 @@ Usage with GRPOTrainer::
14
 
15
  from __future__ import annotations
16
 
17
- from typing import Any, Callable, Protocol
18
 
19
 
20
  class AgentCallable(Protocol):
@@ -23,7 +23,7 @@ class AgentCallable(Protocol):
23
  def __call__(self, observation: Any) -> Any: ...
24
 
25
 
26
- async def rollout_func(
27
  env: Any,
28
  agent: AgentCallable,
29
  num_steps: int = 100,
@@ -82,10 +82,8 @@ def rollout_func_sync(
82
  num_steps: int = 100,
83
  mode: str = "red",
84
  ) -> dict[str, Any]:
85
- """Synchronous wrapper around the async rollout function.
86
 
87
- For use with training loops that don't support async.
88
  """
89
- import asyncio
90
-
91
- return asyncio.run(rollout_func(env, agent, num_steps, mode))
 
14
 
15
  from __future__ import annotations
16
 
17
+ from typing import Any, Protocol
18
 
19
 
20
  class AgentCallable(Protocol):
 
23
  def __call__(self, observation: Any) -> Any: ...
24
 
25
 
26
+ def rollout_func(
27
  env: Any,
28
  agent: AgentCallable,
29
  num_steps: int = 100,
 
82
  num_steps: int = 100,
83
  mode: str = "red",
84
  ) -> dict[str, Any]:
85
+ """Synchronous wrapper now just delegates to rollout_func directly.
86
 
87
+ Kept for backward compatibility with callers that import this name.
88
  """
89
+ return rollout_func(env, agent, num_steps, mode)
 
 
tests/test_apply_snapshot.py ADDED
@@ -0,0 +1,457 @@
1
+ """Tests for RangeEnvironment._apply_snapshot() with mocked Docker.
2
+
3
+ Covers file deployment via docker exec (base64 encoding), SQL execution,
4
+ container name resolution, error handling, and mixed files dicts.
5
+ """
6
+
7
+ from __future__ import annotations
8
+
9
+ import base64
10
+ from unittest.mock import MagicMock
11
+
12
+ import pytest
13
+
14
+ from open_range.protocols import (
15
+ FlagSpec,
16
+ SnapshotSpec,
17
+ TruthGraph,
18
+ Vulnerability,
19
+ )
20
+ from open_range.server.environment import RangeEnvironment
21
+
22
+
23
+ # ---------------------------------------------------------------------------
24
+ # Helpers
25
+ # ---------------------------------------------------------------------------
26
+
27
+
28
+ def _make_env(docker_available: bool = True) -> RangeEnvironment:
29
+ """Create a RangeEnvironment with docker_available control."""
30
+ return RangeEnvironment(docker_available=docker_available)
31
+
32
+
33
+ def _make_snapshot(files: dict[str, str] | None = None) -> SnapshotSpec:
34
+ """Create a minimal SnapshotSpec with the given files dict."""
35
+ return SnapshotSpec(
36
+ topology={"hosts": ["web", "db"], "zones": {"dmz": ["web"], "internal": ["db"]}},
37
+ truth_graph=TruthGraph(vulns=[]),
38
+ flags=[],
39
+ golden_path=[],
40
+ files=files or {},
41
+ )
42
+
43
+
44
+ class _FakeExecResult:
45
+ """Mimics docker SDK exec_run return value."""
46
+
47
+ def __init__(self, stdout: bytes = b"", stderr: bytes = b""):
48
+ self.output = (stdout, stderr)
49
+
50
+
51
+ class _FakeContainer:
52
+ """Minimal fake Docker container."""
53
+
54
+ def __init__(self, name: str, exec_side_effect=None):
55
+ self.name = name
56
+ self._exec_side_effect = exec_side_effect or (lambda *a, **kw: _FakeExecResult())
57
+
58
+ def exec_run(self, cmd, **kwargs):
59
+ return self._exec_side_effect(cmd, **kwargs)
60
+
61
+
62
+ class _FakeDockerClient:
63
+ """Minimal fake Docker client."""
64
+
65
+ def __init__(self, containers: dict[str, _FakeContainer] | None = None):
66
+ self._containers = containers or {}
67
+
68
+ @property
69
+ def containers(self):
70
+ return self
71
+
72
+ def get(self, name: str):
73
+ if name in self._containers:
74
+ return self._containers[name]
75
+ raise Exception(f"Container {name} not found")
76
+
77
+ def list(self):
78
+ return list(self._containers.values())
79
+
80
+
81
+ # ---------------------------------------------------------------------------
82
+ # Tests: Docker unavailable
83
+ # ---------------------------------------------------------------------------
84
+
85
+
86
+ class TestApplySnapshotNoDocker:
87
+ """When Docker is not available, _apply_snapshot should be a no-op."""
88
+
89
+ def test_skips_when_docker_unavailable(self):
90
+ env = _make_env(docker_available=False)
91
+ snapshot = _make_snapshot({"web:/var/www/test.php": "<?php echo 1; ?>"})
92
+ # Should not raise
93
+ env._apply_snapshot(snapshot)
94
+
95
+ def test_skips_when_no_files(self):
96
+ env = _make_env(docker_available=False)
97
+ snapshot = _make_snapshot({})
98
+ env._apply_snapshot(snapshot)
99
+
100
+ def test_skips_when_files_is_none(self):
101
+ env = _make_env(docker_available=False)
102
+ snapshot = _make_snapshot()
103
+ snapshot.files = {}
104
+ env._apply_snapshot(snapshot)
105
+
106
+
107
+ # ---------------------------------------------------------------------------
108
+ # Tests: File deployment via base64
109
+ # ---------------------------------------------------------------------------
110
+
111
+
112
+ class TestFileDeployment:
113
+ """Verify files are deployed to containers via base64-encoded docker exec."""
114
+
115
+ def test_deploys_single_file(self):
116
+ env = _make_env(docker_available=True)
117
+ content = "<?php echo 'hello'; ?>"
118
+ snapshot = _make_snapshot({"web:/var/www/portal/test.php": content})
119
+
120
+ exec_calls = []
121
+
122
+ def fake_exec_run(cmd, **kw):
123
+ exec_calls.append(cmd)
124
+ return _FakeExecResult()
125
+
126
+ container = _FakeContainer("web", exec_side_effect=fake_exec_run)
127
+ client = _FakeDockerClient({"web": container})
128
+ env._docker_client = client
129
+ env._docker_available = True
130
+
131
+ env._apply_snapshot(snapshot)
132
+
133
+ # Should have 2 calls: mkdir -p, then echo base64 | base64 -d > path
134
+ assert len(exec_calls) == 2
135
+ # First call: mkdir -p for parent directory
136
+ mkdir_cmd = exec_calls[0]
137
+ assert mkdir_cmd == ["sh", "-c", "mkdir -p /var/www/portal"]
138
+ # Second call: base64 write
139
+ write_cmd = exec_calls[1]
140
+ assert isinstance(write_cmd, list)
141
+ write_str = write_cmd[2] if len(write_cmd) > 2 else ""
142
+ expected_b64 = base64.b64encode(content.encode()).decode()
143
+ assert expected_b64 in write_str
144
+ assert "/var/www/portal/test.php" in write_str
145
+
146
+ def test_deploys_multiple_files_to_different_containers(self):
147
+ env = _make_env(docker_available=True)
148
+ snapshot = _make_snapshot({
149
+ "web:/var/www/portal/index.php": "<?php echo 'web'; ?>",
150
+ "files:/srv/shares/general/notes.txt": "some notes",
151
+ })
152
+
153
+ web_calls = []
154
+ files_calls = []
155
+
156
+ web = _FakeContainer(
157
+ "web",
158
+ exec_side_effect=lambda cmd, **kw: (web_calls.append(cmd), _FakeExecResult())[1],
159
+ )
160
+ files_container = _FakeContainer(
161
+ "files",
162
+ exec_side_effect=lambda cmd, **kw: (files_calls.append(cmd), _FakeExecResult())[1],
163
+ )
164
+ client = _FakeDockerClient({"web": web, "files": files_container})
165
+ env._docker_client = client
166
+ env._docker_available = True
167
+
168
+ env._apply_snapshot(snapshot)
169
+
170
+ # web: 2 calls (mkdir + write)
171
+ assert len(web_calls) == 2
172
+ # files: 2 calls (mkdir + write)
173
+ assert len(files_calls) == 2
174
+
175
+ def test_file_at_root_path(self):
176
+ """File at / should still work (edge case for parent dir)."""
177
+ env = _make_env(docker_available=True)
178
+ snapshot = _make_snapshot({"web:/test.txt": "root file"})
179
+
180
+ calls = []
181
+ container = _FakeContainer(
182
+ "web",
183
+ exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
184
+ )
185
+ client = _FakeDockerClient({"web": container})
186
+ env._docker_client = client
187
+ env._docker_available = True
188
+
189
+ env._apply_snapshot(snapshot)
190
+
191
+ # mkdir -p for "/" then base64 write
192
+ assert len(calls) == 2
193
+
194
+
195
+ # ---------------------------------------------------------------------------
196
+ # Tests: SQL execution via docker exec
197
+ # ---------------------------------------------------------------------------
198
+
199
+
200
+ class TestSQLDeployment:
201
+ """Verify db:sql entries are deployed via mysql commands."""
202
+
203
+ def test_deploys_sql_to_db_container(self):
204
+ env = _make_env(docker_available=True)
205
+ sql = "INSERT INTO users VALUES (1, 'test');"
206
+ snapshot = _make_snapshot({"db:sql": sql})
207
+
208
+ calls = []
209
+
210
+ def fake_exec(cmd, **kw):
211
+ calls.append(cmd)
212
+ return _FakeExecResult()
213
+
214
+ db_container = _FakeContainer("db", exec_side_effect=fake_exec)
215
+ client = _FakeDockerClient({"db": db_container})
216
+ env._docker_client = client
217
+ env._docker_available = True
218
+
219
+ env._apply_snapshot(snapshot)
220
+
221
+ # 3 calls: write SQL file, execute mysql, cleanup
222
+ assert len(calls) == 3
223
+
224
+ # First: base64 decode to /tmp/_snapshot.sql
225
+ write_cmd_str = calls[0][2] if len(calls[0]) > 2 else ""
226
+ expected_b64 = base64.b64encode(sql.encode()).decode()
227
+ assert expected_b64 in write_cmd_str
228
+ assert "/tmp/_snapshot.sql" in write_cmd_str
229
+
230
+ # Second: mysql < /tmp/_snapshot.sql
231
+ mysql_cmd_str = calls[1][2] if len(calls[1]) > 2 else ""
232
+ assert "mysql" in mysql_cmd_str
233
+ assert "/tmp/_snapshot.sql" in mysql_cmd_str
234
+
235
+ # Third: rm -f /tmp/_snapshot.sql
236
+ rm_cmd_str = calls[2][2] if len(calls[2]) > 2 else ""
237
+ assert "rm" in rm_cmd_str
238
+ assert "/tmp/_snapshot.sql" in rm_cmd_str
239
+
240
+ def test_sql_error_logs_warning(self, caplog):
241
+ """When mysql returns an ERROR, it should log a warning but not raise."""
242
+ env = _make_env(docker_available=True)
243
+ snapshot = _make_snapshot({"db:sql": "INVALID SQL;"})
244
+
245
+ call_count = [0]
246
+
247
+ def fake_exec(cmd, **kw):
248
+ call_count[0] += 1
249
+ # Return ERROR on the mysql command (2nd call)
250
+ if call_count[0] == 2:
251
+ return _FakeExecResult(stderr=b"ERROR 1064: Syntax error")
252
+ return _FakeExecResult()
253
+
254
+ db_container = _FakeContainer("db", exec_side_effect=fake_exec)
255
+ client = _FakeDockerClient({"db": db_container})
256
+ env._docker_client = client
257
+ env._docker_available = True
258
+
259
+ import logging
260
+ with caplog.at_level(logging.WARNING):
261
+ env._apply_snapshot(snapshot)
262
+
263
+ assert any("SQL deployment error" in r.message for r in caplog.records)
264
+
265
+
266
+ # ---------------------------------------------------------------------------
267
+ # Tests: Container name resolution
268
+ # ---------------------------------------------------------------------------
269
+
270
+
271
+ class TestContainerNameResolution:
272
+ """Verify _container_name resolves hosts correctly."""
273
+
274
+ def test_resolves_via_compose_config(self):
275
+ env = _make_env(docker_available=False)
276
+ env._snapshot = SnapshotSpec(
277
+ topology={},
278
+ compose={
279
+ "services": {"web": {}, "db": {}},
280
+ "x-project-name": "openrange",
281
+ },
282
+ )
283
+ assert env._container_name("web") == "openrange-web-1"
284
+ assert env._container_name("db") == "openrange-db-1"
285
+
286
+ def test_resolves_via_docker_listing(self):
287
+ env = _make_env(docker_available=True)
288
+ env._snapshot = None # No compose config
289
+
290
+ web_container = MagicMock()
291
+ web_container.name = "open-range-web-1"
292
+ db_container = MagicMock()
293
+ db_container.name = "open-range-db-1"
294
+
295
+ client = MagicMock()
296
+ client.containers.list.return_value = [web_container, db_container]
297
+ env._docker_client = client
298
+
299
+ assert env._container_name("web") == "open-range-web-1"
300
+ assert env._container_name("db") == "open-range-db-1"
301
+
302
+ def test_falls_back_to_bare_name(self):
303
+ env = _make_env(docker_available=False)
304
+ env._snapshot = None
305
+ assert env._container_name("web") == "web"
306
+
307
+
308
+ # ---------------------------------------------------------------------------
309
+ # Tests: Error handling for failed docker exec
310
+ # ---------------------------------------------------------------------------
311
+
312
+
313
+ class TestErrorHandling:
314
+ """Verify graceful handling of docker exec failures."""
315
+
316
+ def test_file_deployment_handles_exception(self, caplog):
317
+ """If docker exec raises, log warning but continue."""
318
+ env = _make_env(docker_available=True)
319
+ snapshot = _make_snapshot({
320
+ "web:/var/www/good.php": "good",
321
+ "broken:/var/www/fail.php": "bad",
322
+ })
323
+
324
+ def fake_exec(cmd, **kw):
325
+ return _FakeExecResult()
326
+
327
+ web = _FakeContainer("web", exec_side_effect=fake_exec)
328
+ # 'broken' container doesn't exist
329
+ client = _FakeDockerClient({"web": web})
330
+ env._docker_client = client
331
+ env._docker_available = True
332
+
333
+ import logging
334
+ with caplog.at_level(logging.WARNING):
335
+ env._apply_snapshot(snapshot)
336
+
337
+ # Should deploy the good file and warn about the broken one
338
+ assert any("Failed to deploy" in r.message or "broken" in r.message
339
+ for r in caplog.records)
340
+
341
+ def test_bad_key_format_skipped(self, caplog):
342
+ """Keys without ':' separator should be skipped with a warning."""
343
+ env = _make_env(docker_available=True)
344
+ snapshot = _make_snapshot({
345
+ "no_colon_here": "this should be skipped",
346
+ "web:/var/www/valid.php": "valid content",
347
+ })
348
+
349
+ calls = []
350
+ web = _FakeContainer(
351
+ "web",
352
+ exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
353
+ )
354
+ client = _FakeDockerClient({"web": web})
355
+ env._docker_client = client
356
+ env._docker_available = True
357
+
358
+ import logging
359
+ with caplog.at_level(logging.WARNING):
360
+ env._apply_snapshot(snapshot)
361
+
362
+ assert any("bad key format" in r.message for r in caplog.records)
363
+ # Only valid file should be deployed (mkdir + write = 2 calls)
364
+ assert len(calls) == 2
365
+
366
+ def test_file_write_stderr_error_logged(self, caplog):
367
+ """If file write returns stderr with 'Error', log warning."""
368
+ env = _make_env(docker_available=True)
369
+ snapshot = _make_snapshot({"web:/var/www/fail.php": "content"})
370
+
371
+ call_count = [0]
372
+
373
+ def fake_exec(cmd, **kw):
374
+ call_count[0] += 1
375
+ # Return error on the write call (2nd call)
376
+ if call_count[0] == 2:
377
+ return _FakeExecResult(stderr=b"Error: permission denied")
378
+ return _FakeExecResult()
379
+
380
+ web = _FakeContainer("web", exec_side_effect=fake_exec)
381
+ client = _FakeDockerClient({"web": web})
382
+ env._docker_client = client
383
+ env._docker_available = True
384
+
385
+ import logging
386
+ with caplog.at_level(logging.WARNING):
387
+ env._apply_snapshot(snapshot)
388
+
389
+ assert any("File deployment error" in r.message for r in caplog.records)
390
+
391
+
392
+ # ---------------------------------------------------------------------------
393
+ # Tests: Mixed files dict (file paths + db:sql entries)
394
+ # ---------------------------------------------------------------------------
395
+
396
+
397
+ class TestMixedFilesDict:
398
+ """Test snapshot with both regular file deployments and db:sql entries."""
399
+
400
+ def test_mixed_deployment(self):
401
+ env = _make_env(docker_available=True)
402
+ snapshot = _make_snapshot({
403
+ "web:/var/www/portal/index.php": "<?php echo 'hello'; ?>",
404
+ "web:/etc/nginx/sites-available/default": "server { listen 80; }",
405
+ "db:sql": "INSERT INTO secrets VALUES ('flag', 'FLAG{test}');",
406
+ "files:/srv/shares/general/notes.txt": "meeting notes",
407
+ })
408
+
409
+ container_calls: dict[str, list] = {"web": [], "db": [], "files": []}
410
+
411
+ def make_exec(name):
412
+ def fake_exec(cmd, **kw):
413
+ container_calls[name].append(cmd)
414
+ return _FakeExecResult()
415
+ return fake_exec
416
+
417
+ containers = {
418
+ name: _FakeContainer(name, exec_side_effect=make_exec(name))
419
+ for name in ["web", "db", "files"]
420
+ }
421
+ client = _FakeDockerClient(containers)
422
+ env._docker_client = client
423
+ env._docker_available = True
424
+
425
+ env._apply_snapshot(snapshot)
426
+
427
+ # web: 2 files * 2 calls each = 4
428
+ assert len(container_calls["web"]) == 4
429
+ # db: 3 calls (write sql, execute, cleanup)
430
+ assert len(container_calls["db"]) == 3
431
+ # files: 1 file * 2 calls = 2
432
+ assert len(container_calls["files"]) == 2
433
+
434
+ def test_deployment_count_in_log(self, caplog):
435
+ """Verify the final log message reports correct deployment counts."""
436
+ env = _make_env(docker_available=True)
437
+ snapshot = _make_snapshot({
438
+ "web:/var/www/test.php": "test",
439
+ "db:sql": "SELECT 1;",
440
+ })
441
+
442
+ def fake_exec(cmd, **kw):
443
+ return _FakeExecResult()
444
+
445
+ containers = {
446
+ name: _FakeContainer(name, exec_side_effect=fake_exec)
447
+ for name in ["web", "db"]
448
+ }
449
+ client = _FakeDockerClient(containers)
450
+ env._docker_client = client
451
+ env._docker_available = True
452
+
453
+ import logging
454
+ with caplog.at_level(logging.INFO):
455
+ env._apply_snapshot(snapshot)
456
+
457
+ assert any("2/2 artifacts deployed" in r.message for r in caplog.records)
tests/test_console.py CHANGED
@@ -1,7 +1,11 @@
1
  """Tests for the operator debugging console (issue #28).
2
 
3
- Uses Starlette's TestClient against the standalone FastAPI app.
4
  No Docker dependency.
 
 
 
 
5
  """
6
 
7
  from __future__ import annotations
@@ -10,17 +14,27 @@ import pytest
10
  from starlette.testclient import TestClient
11
 
12
  from open_range.server.app import create_app
 
 
13
 
14
 
15
  @pytest.fixture()
16
- def client(monkeypatch):
17
- """Create a TestClient against the standalone FastAPI app (not OpenEnv)."""
18
- # Force standalone path so we test our own endpoints and console integration
19
- monkeypatch.setattr("open_range.server.app._try_openenv_app", lambda: None)
20
  app = create_app()
 
 
 
 
21
  return TestClient(app)
22
 
23
 
 
 
 
 
 
 
24
  # ===================================================================
25
  # GET /console -- HTML page
26
  # ===================================================================
@@ -59,8 +73,8 @@ class TestSnapshotAPI:
59
  data = client.get("/console/api/snapshot").json()
60
  assert data["id"] is None
61
 
62
- def test_snapshot_after_reset(self, client: TestClient):
63
- client.post("/reset", json={"episode_id": "snap_test_1"})
64
  data = client.get("/console/api/snapshot").json()
65
  assert data["id"] == "snap_test_1"
66
  assert "hosts" in data
@@ -68,9 +82,9 @@ class TestSnapshotAPI:
68
  assert "vuln_count" in data
69
  assert "tier" in data
70
 
71
- def test_snapshot_no_truth_graph_or_flags(self, client: TestClient):
72
  """Snapshot API must NOT leak truth_graph or flag values."""
73
- client.post("/reset", json={})
74
  data = client.get("/console/api/snapshot").json()
75
  assert "truth_graph" not in data
76
  assert "flags" not in data
@@ -89,20 +103,22 @@ class TestEpisodeAPI:
89
  data = resp.json()
90
  assert isinstance(data, dict)
91
 
92
- def test_episode_fields(self, client: TestClient):
93
- client.post("/reset", json={})
94
  data = client.get("/console/api/episode").json()
95
  assert "step_count" in data
96
  assert "flags_found" in data
97
  assert "mode" in data
98
  assert "services_status" in data
99
 
100
- def test_episode_step_count_updates(self, client: TestClient):
101
- client.post("/reset", json={})
 
 
102
  data = client.get("/console/api/episode").json()
103
  assert data["step_count"] == 0
104
 
105
- client.post("/step", json={"command": "nmap web", "mode": "red"})
106
  data = client.get("/console/api/episode").json()
107
  assert data["step_count"] == 1
108
 
@@ -120,15 +136,14 @@ class TestHistoryAPI:
120
  assert isinstance(data, list)
121
 
122
  def test_history_empty_initially(self, client: TestClient):
123
- # Reset clears history
124
- client.post("/reset", json={})
125
  data = client.get("/console/api/history").json()
126
  assert data == []
127
 
128
  def test_history_records_actions(self, client: TestClient):
129
- client.post("/reset", json={})
130
- client.post("/step", json={"command": "nmap -sV web", "mode": "red"})
131
- client.post("/step", json={"command": "tail -f /var/log/syslog", "mode": "blue"})
 
132
  data = client.get("/console/api/history").json()
133
  assert len(data) == 2
134
  # Newest first
@@ -136,8 +151,9 @@ class TestHistoryAPI:
136
  assert data[1]["mode"] == "red"
137
 
138
  def test_history_has_timestamps(self, client: TestClient):
139
- client.post("/reset", json={})
140
- client.post("/step", json={"command": "nmap web", "mode": "red"})
 
141
  data = client.get("/console/api/history").json()
142
  assert len(data) == 1
143
  assert "time" in data[0]
@@ -145,11 +161,9 @@ class TestHistoryAPI:
145
 
146
  def test_history_max_20(self, client: TestClient):
147
  """History API should return at most 20 entries."""
148
- client.post("/reset", json={})
 
149
  for i in range(25):
150
- client.post(
151
- "/step",
152
- json={"command": f"cmd_{i}", "mode": "red"},
153
- )
154
  data = client.get("/console/api/history").json()
155
  assert len(data) == 20
 
1
  """Tests for the operator debugging console (issue #28).
2
 
3
+ Uses Starlette's TestClient against the OpenEnv app with console router.
4
  No Docker dependency.
5
+
6
+ Note: OpenEnv HTTP endpoints are stateless (each creates a new env instance).
7
+ Console API uses a fallback env stored on app.state. History is recorded
8
+ via the module-level record_action() / clear_history() helpers.
9
  """
10
 
11
  from __future__ import annotations
 
14
  from starlette.testclient import TestClient
15
 
16
  from open_range.server.app import create_app
17
+ from open_range.server.console import clear_history, record_action
18
+ from open_range.server.environment import RangeEnvironment
19
 
20
 
21
  @pytest.fixture()
22
+ def client():
23
+ """Create a TestClient with a shared env on app.state for console API."""
 
 
24
  app = create_app()
25
+ # Store a shared env so console API endpoints can access state
26
+ env = RangeEnvironment(docker_available=False)
27
+ app.state.env = env
28
+ clear_history()
29
  return TestClient(app)
30
 
31
 
32
+ @pytest.fixture()
33
+ def env(client: TestClient) -> RangeEnvironment:
34
+ """Return the shared env stored on app.state."""
35
+ return client.app.state.env
36
+
37
+
38
  # ===================================================================
39
  # GET /console -- HTML page
40
  # ===================================================================
 
73
  data = client.get("/console/api/snapshot").json()
74
  assert data["id"] is None
75
 
76
+ def test_snapshot_after_reset(self, client: TestClient, env: RangeEnvironment):
77
+ env.reset(episode_id="snap_test_1")
78
  data = client.get("/console/api/snapshot").json()
79
  assert data["id"] == "snap_test_1"
80
  assert "hosts" in data
 
82
  assert "vuln_count" in data
83
  assert "tier" in data
84
 
85
+ def test_snapshot_no_truth_graph_or_flags(self, client: TestClient, env: RangeEnvironment):
86
  """Snapshot API must NOT leak truth_graph or flag values."""
87
+ env.reset()
88
  data = client.get("/console/api/snapshot").json()
89
  assert "truth_graph" not in data
90
  assert "flags" not in data
 
103
  data = resp.json()
104
  assert isinstance(data, dict)
105
 
106
+ def test_episode_fields(self, client: TestClient, env: RangeEnvironment):
107
+ env.reset()
108
  data = client.get("/console/api/episode").json()
109
  assert "step_count" in data
110
  assert "flags_found" in data
111
  assert "mode" in data
112
  assert "services_status" in data
113
 
114
+ def test_episode_step_count_updates(self, client: TestClient, env: RangeEnvironment):
115
+ from open_range.server.models import RangeAction
116
+
117
+ env.reset()
118
  data = client.get("/console/api/episode").json()
119
  assert data["step_count"] == 0
120
 
121
+ env.step(RangeAction(command="nmap web", mode="red"))
122
  data = client.get("/console/api/episode").json()
123
  assert data["step_count"] == 1
124
 
 
136
  assert isinstance(data, list)
137
 
138
  def test_history_empty_initially(self, client: TestClient):
 
 
139
  data = client.get("/console/api/history").json()
140
  assert data == []
141
 
142
  def test_history_records_actions(self, client: TestClient):
143
+ import time
144
+
145
+ record_action({"step": 1, "command": "nmap -sV web", "mode": "red", "time": time.time()})
146
+ record_action({"step": 2, "command": "tail -f /var/log/syslog", "mode": "blue", "time": time.time()})
147
  data = client.get("/console/api/history").json()
148
  assert len(data) == 2
149
  # Newest first
 
151
  assert data[1]["mode"] == "red"
152
 
153
  def test_history_has_timestamps(self, client: TestClient):
154
+ import time
155
+
156
+ record_action({"step": 1, "command": "nmap web", "mode": "red", "time": time.time()})
157
  data = client.get("/console/api/history").json()
158
  assert len(data) == 1
159
  assert "time" in data[0]
 
161
 
162
  def test_history_max_20(self, client: TestClient):
163
  """History API should return at most 20 entries."""
164
+ import time
165
+
166
  for i in range(25):
167
+ record_action({"step": i, "command": f"cmd_{i}", "mode": "red", "time": time.time()})
 
 
 
168
  data = client.get("/console/api/history").json()
169
  assert len(data) == 20
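
Review note: the history tests above expect three behaviours from the module-level helpers: entries come back newest first, at most 20 are returned, and the store can be cleared between tests. The sketch below shows one way to get those semantics with a bounded `collections.deque`. It is an assumption about a possible design, not the actual `open_range.server.console` implementation, and all names ending in `_sketch` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Any

MAX_HISTORY = 20  # assumed cap, matching what the tests expect

# appendleft() on a full bounded deque discards the item at the right end,
# which is the oldest entry here.
_history: deque[dict[str, Any]] = deque(maxlen=MAX_HISTORY)


def record_action_sketch(entry: dict[str, Any]) -> None:
    """Store a copy of the entry at the front (newest first)."""
    _history.appendleft(dict(entry))


def clear_history_sketch() -> None:
    """Drop all recorded entries, e.g. between tests."""
    _history.clear()


def recent_history_sketch() -> list[dict[str, Any]]:
    """Return recorded entries, newest first, at most MAX_HISTORY long."""
    return list(_history)
```

Keeping the cap inside the container means no slicing is needed at read time; whether the real console trims on write or on read is not visible in this diff.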
tests/test_parse_llm_response.py ADDED
@@ -0,0 +1,1075 @@
+ """Tests for _parse_llm_response(), the critical LLM JSON -> SnapshotSpec mapper.
+
+ Covers field name aliases, evidence spec formats, NPC persona parsing,
+ files dict extraction, missing/minimal/malformed input, and a real LLM
+ output fixture from snapshots/llm_tier1_test.json.
+ """
+
+ import json
+ from pathlib import Path
+
+ import pytest
+
+ from open_range.builder.builder import _parse_llm_response
+ from open_range.protocols import (
+     EvidenceItem,
+     ExploitStep,
+     FlagSpec,
+     GoldenPathStep,
+     NPCPersona,
+     SnapshotSpec,
+     Vulnerability,
+ )
+
+ ROOT = Path(__file__).parent.parent
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+
+ def _minimal_json(**overrides) -> str:
+     """Return a minimal valid JSON string for _parse_llm_response.
+
+     All top-level keys present but with empty/default values unless overridden.
+     """
+     base: dict = {
+         "topology": {},
+         "truth_graph": {"vulns": [], "exploit_chain": []},
+         "golden_path": [],
+         "flags": [],
+         "evidence_spec": {},
+         "npc_personas": [],
+         "npc_traffic": {},
+         "task": {},
+     }
+     base.update(overrides)
+     return json.dumps(base)
+
+
+ # ---------------------------------------------------------------------------
+ # 1. Happy path with real LLM output
+ # ---------------------------------------------------------------------------
+
+
+ class TestRealLLMOutput:
+     """Parse the actual LLM-generated JSON from snapshots/llm_tier1_test.json."""
+
+     @pytest.fixture
+     def llm_json(self):
+         path = ROOT / "snapshots" / "llm_tier1_test.json"
+         return path.read_text()
+
+     def test_parses_to_snapshot_spec(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert isinstance(spec, SnapshotSpec)
+
+     def test_topology_hosts(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         hosts = spec.topology["hosts"]
+         assert "attacker" in hosts
+         assert "web" in hosts
+         assert "siem" in hosts
+         assert len(hosts) == 8
+
+     def test_topology_zones(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         zones = spec.topology["zones"]
+         assert "dmz" in zones
+         assert "web" in zones["dmz"]
+
+     def test_truth_graph_vulns(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert len(spec.truth_graph.vulns) == 2
+         vuln_ids = {v.id for v in spec.truth_graph.vulns}
+         assert "VULN-SQLI-LOOKUP" in vuln_ids
+         assert "VULN-MISSING-AUTHZ-REPORT" in vuln_ids
+
+     def test_truth_graph_exploit_chain(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert len(spec.truth_graph.exploit_chain) == 2
+         assert spec.truth_graph.exploit_chain[0].vuln_id == "VULN-SQLI-LOOKUP"
+
+     def test_flags(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert len(spec.flags) == 2
+         values = {f.value for f in spec.flags}
+         assert "FLAG{9f3a2b4c5d6e7f80}" in values
+         assert "FLAG{1a2b3c4d5e6f7788}" in values
+
+     def test_golden_path(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert len(spec.golden_path) == 8
+         # The real LLM output uses "cmd" field name
+         assert spec.golden_path[0].command == "nmap -p 80 10.0.1.10"
+         assert spec.golden_path[0].expect_in_stdout == "80/tcp open"
+
+     def test_task_briefings(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert "Meridian" in spec.task.red_briefing
+         assert spec.task.blue_briefing != ""
+
+     def test_npc_personas(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         assert len(spec.npc_personas) == 8
+         names = {p.name for p in spec.npc_personas}
+         assert "Derek Thompson" in names
+         assert "Karen Williams" in names
+
+     def test_npc_persona_security_awareness(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         by_name = {p.name: p for p in spec.npc_personas}
+         assert by_name["Derek Thompson"].security_awareness == 0.85
+         assert by_name["Karen Williams"].security_awareness == 0.25
+
+     def test_files_dict(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         # Real LLM output has explicit files + vulnerable_code dicts
+         assert len(spec.files) > 0
+         assert "web:/var/www/portal/lookup.php" in spec.files
+         assert "web:/var/www/portal/admin/compliance_report.php" in spec.files
+
+     def test_vulnerable_code_as_dict_extracted_to_files(self, llm_json):
+         spec = _parse_llm_response(llm_json)
+         # The VULN-SQLI-LOOKUP has vulnerable_code as dict with key
+         # /var/www/portal/lookup.php. It should be extracted to files
+         # as "web:/var/www/portal/lookup.php".
+         # But the explicit files dict already has this key, so the
+         # explicit one takes precedence (container_key not in files check).
+         assert "web:/var/www/portal/lookup.php" in spec.files
+
+
+ # ---------------------------------------------------------------------------
+ # 2. Field name mappings (ExploitStep aliases)
+ # ---------------------------------------------------------------------------
+
+
+ class TestExploitStepFieldMappings:
+     """LLM uses vuln/action/yields; Pydantic expects vuln_id/command/description."""
+
+     def test_vuln_maps_to_vuln_id(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [],
+                 "exploit_chain": [
+                     {"vuln": "V1", "action": "run exploit", "yields": "root shell"}
+                 ],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"
+
+     def test_action_maps_to_command(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [],
+                 "exploit_chain": [
+                     {"vuln": "V1", "action": "sqlmap -u http://...", "yields": "db dump"}
+                 ],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.truth_graph.exploit_chain[0].command == "sqlmap -u http://..."
+
+     def test_yields_maps_to_description(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [],
+                 "exploit_chain": [
+                     {"vuln": "V1", "action": "cmd", "yields": "got credentials"}
+                 ],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.truth_graph.exploit_chain[0].description == "got credentials"
+
+     def test_canonical_names_also_work(self):
+         """vuln_id/command/description should pass through without aliasing."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [],
+                 "exploit_chain": [
+                     {
+                         "vuln_id": "V2",
+                         "command": "nmap -sV ...",
+                         "description": "port scan",
+                     }
+                 ],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         ec = spec.truth_graph.exploit_chain[0]
+         assert ec.vuln_id == "V2"
+         assert ec.command == "nmap -sV ..."
+         assert ec.description == "port scan"
+
+     def test_canonical_names_take_precedence(self):
+         """When both canonical and alias are present, canonical wins (via get order)."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [],
+                 "exploit_chain": [
+                     {
+                         "vuln_id": "canonical",
+                         "vuln": "alias",
+                         "command": "canonical_cmd",
+                         "action": "alias_cmd",
+                         "description": "canonical_desc",
+                         "yields": "alias_desc",
+                     }
+                 ],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         ec = spec.truth_graph.exploit_chain[0]
+         assert ec.vuln_id == "canonical"
+         assert ec.command == "canonical_cmd"
+         assert ec.description == "canonical_desc"
+
+
+ # ---------------------------------------------------------------------------
+ # 3. GoldenPathStep field mappings
+ # ---------------------------------------------------------------------------
+
+
+ class TestGoldenPathFieldMappings:
+     """LLM uses cmd/expect_stdout; Pydantic expects command/expect_in_stdout."""
+
+     def test_cmd_maps_to_command(self):
+         raw = _minimal_json(
+             golden_path=[
+                 {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "open"}
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"
+
+     def test_expect_stdout_maps_to_expect_in_stdout(self):
+         raw = _minimal_json(
+             golden_path=[
+                 {"step": 1, "cmd": "whoami", "expect_stdout": "root"}
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].expect_in_stdout == "root"
+
+     def test_canonical_command_field(self):
+         raw = _minimal_json(
+             golden_path=[
+                 {"step": 1, "command": "ls -la", "expect_in_stdout": "total"}
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].command == "ls -la"
+         assert spec.golden_path[0].expect_in_stdout == "total"
+
+     def test_mixed_field_names_across_steps(self):
+         """Some steps use cmd, others use command; both should parse."""
+         raw = _minimal_json(
+             golden_path=[
+                 {"step": 1, "cmd": "nmap scan", "expect_stdout": "80/tcp"},
+                 {"step": 2, "command": "curl http://web", "expect_in_stdout": "Welcome"},
+                 {"step": 3, "cmd": "sqlmap", "expect_in_stdout": "FLAG"},
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.golden_path) == 3
+         assert spec.golden_path[0].command == "nmap scan"
+         assert spec.golden_path[0].expect_in_stdout == "80/tcp"
+         assert spec.golden_path[1].command == "curl http://web"
+         assert spec.golden_path[1].expect_in_stdout == "Welcome"
+         assert spec.golden_path[2].command == "sqlmap"
+         assert spec.golden_path[2].expect_in_stdout == "FLAG"
+
+     def test_step_number_preserved(self):
+         raw = _minimal_json(
+             golden_path=[
+                 {"step": 5, "cmd": "echo hi", "expect_stdout": "hi"}
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].step == 5
+
+     def test_description_field_preserved(self):
+         raw = _minimal_json(
+             golden_path=[
+                 {
+                     "step": 1,
+                     "cmd": "nmap",
+                     "expect_stdout": "open",
+                     "description": "Port scan the DMZ",
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].description == "Port scan the DMZ"
+
+     def test_cmd_takes_precedence_over_command(self):
+         """When both cmd and command are present, cmd wins (it's checked first)."""
+         raw = _minimal_json(
+             golden_path=[
+                 {
+                     "step": 1,
+                     "cmd": "cmd_value",
+                     "command": "command_value",
+                     "expect_stdout": "x",
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.golden_path[0].command == "cmd_value"
+
+
+ # ---------------------------------------------------------------------------
+ # 4. Evidence spec parsing
+ # ---------------------------------------------------------------------------
+
+
+ class TestEvidenceSpecParsing:
+     """LLM returns dict, protocol expects list[EvidenceItem]."""
+
+     def test_dict_with_string_values(self):
+         raw = _minimal_json(
+             evidence_spec={
+                 "web_access_log": "SQL injection pattern",
+                 "siem_alerts": "Unauthorized access",
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.evidence_spec) == 2
+         locations = {e.location for e in spec.evidence_spec}
+         assert "web_access_log" in locations
+         assert "siem_alerts" in locations
+         # String values become log_entry type
+         for e in spec.evidence_spec:
+             if e.location == "web_access_log":
+                 assert e.type == "log_entry"
+                 assert e.pattern == "SQL injection pattern"
+
+     def test_dict_with_list_values(self):
+         raw = _minimal_json(
+             evidence_spec={
+                 "siem_alerts": ["UNION SELECT detected", "admin endpoint accessed"],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.evidence_spec) == 2
+         # List values become alert type
+         for e in spec.evidence_spec:
+             assert e.type == "alert"
+             assert e.location == "siem_alerts"
+         patterns = {e.pattern for e in spec.evidence_spec}
+         assert "UNION SELECT detected" in patterns
+         assert "admin endpoint accessed" in patterns
+
+     def test_dict_with_mixed_values(self):
+         raw = _minimal_json(
+             evidence_spec={
+                 "web_log": "GET /search?q=",
+                 "alerts": ["sqli_detected", "auth_bypass"],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.evidence_spec) == 3  # 1 string + 2 list items
+
+     def test_list_format_passthrough(self):
+         """When evidence_spec is already a list of dicts, parse directly."""
+         raw = _minimal_json(
+             evidence_spec=[
+                 {"type": "alert", "location": "siem", "pattern": "SQLi"},
+                 {"type": "log_entry", "location": "web_log", "pattern": "GET /admin"},
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.evidence_spec) == 2
+         assert spec.evidence_spec[0].type == "alert"
+         assert spec.evidence_spec[1].location == "web_log"
+
+     def test_empty_dict(self):
+         raw = _minimal_json(evidence_spec={})
+         spec = _parse_llm_response(raw)
+         assert spec.evidence_spec == []
+
+     def test_empty_list(self):
+         raw = _minimal_json(evidence_spec=[])
+         spec = _parse_llm_response(raw)
+         assert spec.evidence_spec == []
+
+
+ # ---------------------------------------------------------------------------
+ # 5. NPC persona parsing
+ # ---------------------------------------------------------------------------
+
+
+ class TestNPCPersonaParsing:
+     def test_basic_persona(self):
+         raw = _minimal_json(
+             npc_personas=[
+                 {
+                     "name": "Alice",
+                     "role": "Admin",
+                     "department": "IT",
+                     "security_awareness": 0.9,
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.npc_personas) == 1
+         p = spec.npc_personas[0]
+         assert p.name == "Alice"
+         assert p.role == "Admin"
+         assert p.department == "IT"
+         assert p.security_awareness == 0.9
+
+     def test_accounts_with_string_values(self):
+         raw = _minimal_json(
+             npc_personas=[
+                 {
+                     "name": "Bob",
+                     "accounts": {
+                         "email": "bob@corp.local",
+                         "ldap_dn": "cn=bob,dc=corp,dc=local",
+                     },
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.npc_personas[0].accounts["email"] == "bob@corp.local"
+
+     def test_default_security_awareness(self):
+         """Missing security_awareness defaults to 0.5."""
+         raw = _minimal_json(npc_personas=[{"name": "Charlie"}])
+         spec = _parse_llm_response(raw)
+         assert spec.npc_personas[0].security_awareness == 0.5
+
+     def test_susceptibility_dict(self):
+         raw = _minimal_json(
+             npc_personas=[
+                 {
+                     "name": "Diana",
+                     "susceptibility": {"phishing": 0.8, "pretexting": 0.6},
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.npc_personas[0].susceptibility["phishing"] == 0.8
+
+     def test_routine_dict(self):
+         raw = _minimal_json(
+             npc_personas=[
+                 {
+                     "name": "Eve",
+                     "routine": {
+                         "morning": "check email",
+                         "afternoon": "process reports",
+                     },
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.npc_personas[0].routine["morning"] == "check email"
+
+     def test_multiple_personas(self):
+         raw = _minimal_json(
+             npc_personas=[
+                 {"name": "P1", "security_awareness": 0.1},
+                 {"name": "P2", "security_awareness": 0.5},
+                 {"name": "P3", "security_awareness": 0.9},
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.npc_personas) == 3
+         names = [p.name for p in spec.npc_personas]
+         assert names == ["P1", "P2", "P3"]
+
+     def test_missing_optional_fields_default(self):
+         """All optional fields should default gracefully."""
+         raw = _minimal_json(npc_personas=[{"name": "Minimal"}])
+         spec = _parse_llm_response(raw)
+         p = spec.npc_personas[0]
+         assert p.name == "Minimal"
+         assert p.role == ""
+         assert p.department == ""
+         assert p.reports_to == ""
+         assert p.communication_style == ""
+         assert p.susceptibility == {}
+         assert p.routine == {}
+         assert p.accounts == {}
+
+
+ # ---------------------------------------------------------------------------
+ # 6. Files dict extraction
+ # ---------------------------------------------------------------------------
+
+
+ class TestFilesDictExtraction:
+     def test_explicit_files_field(self):
+         raw = _minimal_json(
+             files={
+                 "web:/var/www/index.php": "<?php echo 'hello'; ?>",
+                 "db:/opt/init.sql": "CREATE TABLE t(id INT);",
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.files) == 2
+         assert spec.files["web:/var/www/index.php"] == "<?php echo 'hello'; ?>"
+
+     def test_vulnerable_code_dict_extracted(self):
+         """vulnerable_code as {file_path: code} should be extracted to files."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "v1",
+                         "type": "sqli",
+                         "host": "web",
+                         "service": "php",
+                         "injection_point": "/search",
+                         "vulnerable_code": {
+                             "/var/www/search.php": "<?php $q=$_GET['q']; ?>"
+                         },
+                     }
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert "web:/var/www/search.php" in spec.files
+         assert spec.files["web:/var/www/search.php"] == "<?php $q=$_GET['q']; ?>"
+
+     def test_vulnerable_code_string_on_web_host(self):
+         """String vulnerable_code on the web host with a '/'-prefixed injection_point goes to web:/var/www/portal{injection_point}."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "v1",
+                         "type": "sqli",
+                         "host": "web",
+                         "service": "php",
+                         "injection_point": "/search.php",
+                         "vulnerable_code": "<?php echo 'vuln'; ?>",
+                     }
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert "web:/var/www/portal/search.php" in spec.files
+
+     def test_vulnerable_code_string_non_web_host_skipped(self):
+         """String vulnerable_code on non-web host without / prefix is not extracted."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "v1",
+                         "type": "weak_creds",
+                         "host": "db",
+                         "service": "mysql",
+                         "injection_point": "mysql -u root -proot",
+                         "vulnerable_code": "",
+                     }
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.files) == 0
+
+     def test_explicit_files_not_overwritten_by_vulnerable_code(self):
+         """If explicit files has a key, vulnerable_code should not overwrite it."""
+         raw = _minimal_json(
+             files={"web:/var/www/search.php": "explicit content"},
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "v1",
+                         "type": "sqli",
+                         "host": "web",
+                         "service": "php",
+                         "injection_point": "/search",
+                         "vulnerable_code": {
+                             "/var/www/search.php": "vulnerable content"
+                         },
+                     }
+                 ],
+                 "exploit_chain": [],
+             },
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.files["web:/var/www/search.php"] == "explicit content"
+
+     def test_no_files_field_produces_empty_dict(self):
+         raw = _minimal_json()
+         spec = _parse_llm_response(raw)
+         assert spec.files == {}
+
+     def test_files_field_non_string_values_skipped(self):
+         """Non-string values in files dict are silently skipped."""
+         raw = _minimal_json(
+             files={
+                 "web:/good.php": "<?php ?>",
+                 "web:/bad.php": 12345,
+                 "web:/also_bad.php": ["not", "a", "string"],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.files) == 1
+         assert "web:/good.php" in spec.files
+
+
+ # ---------------------------------------------------------------------------
+ # 7. Missing optional fields
+ # ---------------------------------------------------------------------------
+
+
+ class TestMissingOptionalFields:
+     def test_missing_evidence_spec(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "golden_path": [],
+             "flags": [],
+             "npc_personas": [],
+             "npc_traffic": {},
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.evidence_spec == []
+
+     def test_missing_npc_personas(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "golden_path": [],
+             "flags": [],
+             "evidence_spec": {},
+             "npc_traffic": {},
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.npc_personas == []
+
+     def test_missing_npc_traffic(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "golden_path": [],
+             "flags": [],
+             "evidence_spec": {},
+             "npc_personas": [],
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         # npc_traffic gets default NPCTrafficSpec values
+         assert spec.npc_traffic.level == 0
+
+     def test_missing_task(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "golden_path": [],
+             "flags": [],
+             "evidence_spec": {},
+             "npc_personas": [],
+             "npc_traffic": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.task.red_briefing == ""
+         assert spec.task.blue_briefing == ""
+
+     def test_missing_truth_graph(self):
+         data = {
+             "topology": {"hosts": ["web"]},
+             "golden_path": [],
+             "flags": [],
+             "evidence_spec": {},
+             "npc_personas": [],
+             "npc_traffic": {},
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.truth_graph.vulns == []
+         assert spec.truth_graph.exploit_chain == []
+
+     def test_missing_golden_path(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "flags": [],
+             "evidence_spec": {},
+             "npc_personas": [],
+             "npc_traffic": {},
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.golden_path == []
+
+     def test_missing_flags(self):
+         data = {
+             "topology": {},
+             "truth_graph": {"vulns": [], "exploit_chain": []},
+             "golden_path": [],
+             "evidence_spec": {},
+             "npc_personas": [],
+             "npc_traffic": {},
+             "task": {},
+         }
+         spec = _parse_llm_response(json.dumps(data))
+         assert spec.flags == []
+
+     def test_vuln_with_minimal_fields(self):
+         """A vulnerability with only id, type, host should parse fine."""
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [{"id": "v1", "type": "sqli", "host": "web"}],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         v = spec.truth_graph.vulns[0]
+         assert v.id == "v1"
+         assert v.service == ""
+         assert v.injection_point == ""
+         assert v.vulnerable_code == ""
+         assert v.root_cause == ""
+
+
+ # ---------------------------------------------------------------------------
+ # 8. Empty/minimal input
+ # ---------------------------------------------------------------------------
+
+
+ class TestMinimalInput:
+     def test_completely_empty_json_object(self):
+         """An empty JSON object should produce a valid SnapshotSpec with defaults."""
+         spec = _parse_llm_response("{}")
+         assert isinstance(spec, SnapshotSpec)
+         assert spec.topology == {}
+         assert spec.truth_graph.vulns == []
+         assert spec.golden_path == []
+         assert spec.flags == []
+         assert spec.evidence_spec == []
+         assert spec.npc_personas == []
+
+     def test_minimal_valid_json(self):
+         raw = _minimal_json()
+         spec = _parse_llm_response(raw)
+         assert isinstance(spec, SnapshotSpec)
+
+     def test_topology_only(self):
+         raw = json.dumps({"topology": {"hosts": ["web", "db"]}})
+         spec = _parse_llm_response(raw)
+         assert spec.topology["hosts"] == ["web", "db"]
+         assert spec.golden_path == []
+
+
+ # ---------------------------------------------------------------------------
+ # 9. Malformed input
+ # ---------------------------------------------------------------------------
+
+
+ class TestMalformedInput:
+     def test_invalid_json_raises(self):
+         with pytest.raises(json.JSONDecodeError):
+             _parse_llm_response("not valid json {{{")
+
+     def test_json_array_not_object_raises(self):
+         """Top-level must be an object, not an array."""
+         with pytest.raises((TypeError, AttributeError)):
+             _parse_llm_response("[1, 2, 3]")
+
+     def test_json_string_not_object_raises(self):
+         with pytest.raises((TypeError, AttributeError)):
+             _parse_llm_response('"just a string"')
+
+     def test_truth_graph_not_dict_handled(self):
+         """If truth_graph is not a dict, the parser's .get() calls raise AttributeError."""
+         # truth_graph as string
+         raw = json.dumps({"truth_graph": "not a dict"})
+         # This will try .get() on a string, which fails
+         with pytest.raises(AttributeError):
+             _parse_llm_response(raw)
+
+     def test_golden_path_not_list_handled(self):
+         """If golden_path is a string, its items are characters and .get() on them raises AttributeError."""
+         raw = json.dumps({"golden_path": "not a list"})
+         with pytest.raises(AttributeError):
+             _parse_llm_response(raw)
+
+     def test_empty_string_raises(self):
+         with pytest.raises(json.JSONDecodeError):
+             _parse_llm_response("")
+
+     def test_json_with_trailing_comma_raises(self):
+         with pytest.raises(json.JSONDecodeError):
+             _parse_llm_response('{"key": "value",}')
+
+
+ # ---------------------------------------------------------------------------
+ # 10. Vulnerability parsing details
+ # ---------------------------------------------------------------------------
+
+
+ class TestVulnerabilityParsing:
+     def test_all_vuln_fields_parsed(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "VULN-001",
+                         "type": "sqli",
+                         "host": "web",
+                         "service": "nginx+php",
+                         "injection_point": "/search?q=",
+                         "vulnerable_code": "<?php $q=$_GET['q']; ?>",
+                         "root_cause": "No input sanitization",
+                         "blast_radius": "Full DB read",
+                         "remediation": "Use prepared statements",
+                     }
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         v = spec.truth_graph.vulns[0]
+         assert v.id == "VULN-001"
+         assert v.type == "sqli"
+         assert v.host == "web"
+         assert v.service == "nginx+php"
+         assert v.injection_point == "/search?q="
+         assert v.vulnerable_code == "<?php $q=$_GET['q']; ?>"
+         assert v.root_cause == "No input sanitization"
+         assert v.blast_radius == "Full DB read"
+         assert v.remediation == "Use prepared statements"
+
+     def test_vulnerable_code_as_dict(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {
+                         "id": "V1",
+                         "type": "sqli",
+                         "host": "web",
+                         "vulnerable_code": {
+                             "/var/www/search.php": "<?php vuln code; ?>"
+                         },
+                     }
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         v = spec.truth_graph.vulns[0]
+         assert isinstance(v.vulnerable_code, dict)
+         assert v.vulnerable_code["/var/www/search.php"] == "<?php vuln code; ?>"
+
+     def test_multiple_vulns(self):
+         raw = _minimal_json(
+             truth_graph={
+                 "vulns": [
+                     {"id": "V1", "type": "sqli", "host": "web"},
+                     {"id": "V2", "type": "xss", "host": "web"},
+                     {"id": "V3", "type": "idor", "host": "web"},
+                 ],
+                 "exploit_chain": [],
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.truth_graph.vulns) == 3
+         types = {v.type for v in spec.truth_graph.vulns}
+         assert types == {"sqli", "xss", "idor"}
+
+
+ # ---------------------------------------------------------------------------
+ # 11. Flag parsing
+ # ---------------------------------------------------------------------------
+
+
+ class TestFlagParsing:
+     def test_single_flag(self):
+         raw = _minimal_json(
+             flags=[
+                 {
+                     "id": "flag1",
+                     "value": "FLAG{abc123}",
+                     "path": "/var/flags/flag1.txt",
+                     "host": "db",
+                 }
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.flags) == 1
+         f = spec.flags[0]
+         assert f.id == "flag1"
+         assert f.value == "FLAG{abc123}"
+         assert f.path == "/var/flags/flag1.txt"
+         assert f.host == "db"
+
+     def test_multiple_flags(self):
+         raw = _minimal_json(
+             flags=[
+                 {"id": "f1", "value": "FLAG{a}", "path": "/f1.txt", "host": "web"},
+                 {"id": "f2", "value": "FLAG{b}", "path": "/f2.txt", "host": "db"},
+             ]
+         )
+         spec = _parse_llm_response(raw)
+         assert len(spec.flags) == 2
+
+     def test_missing_flag_fields_default_to_empty(self):
+         raw = _minimal_json(flags=[{}])
+         spec = _parse_llm_response(raw)
+         f = spec.flags[0]
+         assert f.id == ""
+         assert f.value == ""
+         assert f.path == ""
+         assert f.host == ""
+
+
+ # ---------------------------------------------------------------------------
+ # 12. NPC traffic parsing
+ # ---------------------------------------------------------------------------
+
+
+ class TestNPCTrafficParsing:
+     def test_http_rate_maps_to_rate_lambda(self):
+         raw = _minimal_json(npc_traffic={"http_rate": 25})
+         spec = _parse_llm_response(raw)
+         assert spec.npc_traffic.rate_lambda == 25
+
+     def test_default_scripts(self):
+         raw = _minimal_json(npc_traffic={})
+         spec = _parse_llm_response(raw)
+         assert "http_traffic.sh" in spec.npc_traffic.scripts
+
+     def test_level_always_zero(self):
+         """Current parser hardcodes level=0."""
+         raw = _minimal_json(npc_traffic={"http_rate": 50})
+         spec = _parse_llm_response(raw)
+         assert spec.npc_traffic.level == 0
+
+     def test_missing_http_rate_defaults_to_10(self):
+         raw = _minimal_json(npc_traffic={})
+         spec = _parse_llm_response(raw)
+         assert spec.npc_traffic.rate_lambda == 10
+
+
+ # ---------------------------------------------------------------------------
+ # 13. Task parsing
+ # ---------------------------------------------------------------------------
+
+
+ class TestTaskParsing:
+     def test_both_briefings(self):
+         raw = _minimal_json(
+             task={
+                 "red_briefing": "Attack the network.",
+                 "blue_briefing": "Defend the network.",
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.task.red_briefing == "Attack the network."
+         assert spec.task.blue_briefing == "Defend the network."
+
+     def test_missing_briefings_default_empty(self):
+         raw = _minimal_json(task={})
+         spec = _parse_llm_response(raw)
+         assert spec.task.red_briefing == ""
+         assert spec.task.blue_briefing == ""
+
+     def test_extra_task_fields_ignored(self):
+         """Extra fields in task should be silently ignored."""
+         raw = _minimal_json(
+             task={
+                 "red_briefing": "Go",
+                 "blue_briefing": "Watch",
+                 "unknown_field": "whatever",
+             }
+         )
+         spec = _parse_llm_response(raw)
+         assert spec.task.red_briefing == "Go"
+
+
995
+ # ---------------------------------------------------------------------------
996
+ # 14. Roundtrip / integration
997
+ # ---------------------------------------------------------------------------
998
+
999
+
1000
+ class TestRoundtrip:
1001
+ def test_complex_snapshot_parses_completely(self):
1002
+ """A complex snapshot with all sections populated should parse."""
1003
+ data = {
1004
+ "topology": {
1005
+ "hosts": ["attacker", "web", "db", "siem"],
1006
+ "zones": {"dmz": ["web"], "internal": ["db"], "mgmt": ["siem"]},
1007
+ "users": [{"username": "admin", "password": "pass", "groups": ["admins"], "hosts": ["web"]}],
1008
+ },
1009
+ "truth_graph": {
1010
+ "vulns": [
1011
+ {
1012
+ "id": "V1",
1013
+ "type": "sqli",
1014
+ "host": "web",
1015
+ "service": "php",
1016
+ "injection_point": "/search?q=",
1017
+ "vulnerable_code": {"search.php": "vuln code"},
1018
+ "root_cause": "no sanitization",
1019
+ "blast_radius": "db read",
1020
+ "remediation": "prepared stmts",
1021
+ }
1022
+ ],
1023
+ "exploit_chain": [
1024
+ {"vuln": "V1", "action": "sqlmap", "yields": "db dump"}
1025
+ ],
1026
+ },
1027
+ "golden_path": [
1028
+ {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "80/tcp"},
1029
+ {"step": 2, "command": "curl http://web/search?q=test", "expect_in_stdout": "results"},
1030
+ ],
1031
+ "flags": [
1032
+ {"id": "f1", "value": "FLAG{complex}", "path": "/flag.txt", "host": "db"}
1033
+ ],
1034
+ "evidence_spec": {
1035
+ "web_log": "sqli pattern",
1036
+ "alerts": ["sql_injection_detected"],
1037
+ },
1038
+ "npc_personas": [
1039
+ {
1040
+ "name": "Alice",
1041
+ "role": "SysAdmin",
1042
+ "department": "IT",
1043
+ "reports_to": "CTO",
1044
+ "communication_style": "technical",
1045
+ "security_awareness": 0.9,
1046
+ "susceptibility": {"phishing": 0.1},
1047
+ "routine": {"morning": "check logs"},
1048
+ "accounts": {"email": "alice@corp.local"},
1049
+ }
1050
+ ],
1051
+ "npc_traffic": {"http_rate": 20},
1052
+ "task": {
1053
+ "red_briefing": "Hack the network.",
1054
+ "blue_briefing": "Monitor and defend.",
1055
+ },
1056
+ "files": {"web:/var/www/index.php": "<?php echo 'hi'; ?>"},
1057
+ }
1058
+ spec = _parse_llm_response(json.dumps(data))
1059
+
1060
+ # Verify all sections
1061
+ assert spec.topology["hosts"] == ["attacker", "web", "db", "siem"]
1062
+ assert len(spec.truth_graph.vulns) == 1
1063
+ assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"
1064
+ assert spec.truth_graph.exploit_chain[0].command == "sqlmap"
1065
+ assert len(spec.golden_path) == 2
1066
+ assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"
1067
+ assert spec.golden_path[1].expect_in_stdout == "results"
1068
+ assert spec.flags[0].value == "FLAG{complex}"
1069
+ assert len(spec.evidence_spec) == 2 # 1 string + 1 list item
1070
+ assert len(spec.npc_personas) == 1
1071
+ assert spec.npc_traffic.rate_lambda == 20
1072
+ assert spec.task.red_briefing == "Hack the network."
1073
+ # files: explicit + vulnerable_code dict
1074
+ assert "web:/var/www/index.php" in spec.files
1075
+ assert "web:search.php" in spec.files # from vulnerable_code dict
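Note on the parser tests above: they pin down a few normalization rules. Golden-path steps may arrive with either `cmd` or `command`, and with either `expect_stdout` or `expect_in_stdout`. `npc_traffic.http_rate` becomes `rate_lambda` with a default of 10, and `level` is always 0. The sketch below illustrates that aliasing on plain dicts. It is an assumption-laden illustration, not the actual `_parse_llm_response` implementation: the helper names `first_present`, `normalize_step`, and `normalize_npc_traffic` are hypothetical, and the real parser returns `SnapshotSpec` models rather than dicts.

```python
def first_present(data: dict, keys: tuple, default=""):
    """Return the value of the first key in `keys` that exists in `data`."""
    for key in keys:
        if key in data:
            return data[key]
    return default


def normalize_step(raw: dict) -> dict:
    """Map LLM-style aliases onto canonical golden-path names (hypothetical helper)."""
    return {
        "step": raw.get("step", 0),
        # Canonical name wins if both spellings are present.
        "command": first_present(raw, ("command", "cmd")),
        "expect_in_stdout": first_present(raw, ("expect_in_stdout", "expect_stdout")),
    }


def normalize_npc_traffic(raw: dict) -> dict:
    """Mirror the tested defaults: http_rate -> rate_lambda (default 10), level fixed at 0."""
    return {"rate_lambda": raw.get("http_rate", 10), "level": 0}
```

Giving the canonical key priority over its alias is a design choice made for this sketch; the tests above do not say which spelling wins when both are present.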
tests/test_renderer_integration.py ADDED
@@ -0,0 +1,373 @@
1
+ """Integration tests for the full renderer pipeline.
2
+
3
+ Loads real LLM output from snapshots/llm_tier1_test.json, parses it
4
+ through _parse_llm_response(), renders through SnapshotRenderer.render(),
5
+ and verifies all output files contain expected content.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import json
11
+ import tempfile
12
+ from pathlib import Path
13
+
14
+ import pytest
15
+
16
+ from open_range.builder.builder import _parse_llm_response
17
+ from open_range.builder.renderer import SnapshotRenderer
18
+
19
+ ROOT = Path(__file__).parent.parent
20
+ SNAPSHOT_PATH = ROOT / "snapshots" / "llm_tier1_test.json"
21
+
22
+
23
+ @pytest.fixture
24
+ def llm_output() -> dict:
25
+ """Load the real LLM output JSON."""
26
+ return json.loads(SNAPSHOT_PATH.read_text())
27
+
28
+
29
+ @pytest.fixture
30
+ def parsed_spec(llm_output):
31
+ """Parse real LLM output through _parse_llm_response."""
32
+ return _parse_llm_response(json.dumps(llm_output))
33
+
34
+
35
+ @pytest.fixture
36
+ def rendered_dir(parsed_spec):
37
+ """Render the parsed spec and yield the output directory."""
38
+ renderer = SnapshotRenderer()
39
+ with tempfile.TemporaryDirectory() as tmpdir:
40
+ out = Path(tmpdir) / "integration_out"
41
+ renderer.render(parsed_spec, out)
42
+ yield out
43
+
44
+
45
+ # ---------------------------------------------------------------------------
46
+ # Pipeline: parse -> render round-trip
47
+ # ---------------------------------------------------------------------------
48
+
49
+
50
+ class TestParseLLMOutput:
51
+ """Verify _parse_llm_response correctly handles real LLM output."""
52
+
53
+ def test_parse_produces_snapshot_spec(self, parsed_spec):
54
+ from open_range.protocols import SnapshotSpec
55
+ assert isinstance(parsed_spec, SnapshotSpec)
56
+
57
+ def test_parse_has_topology(self, parsed_spec):
58
+ assert "hosts" in parsed_spec.topology
59
+ assert len(parsed_spec.topology["hosts"]) == 8
60
+
61
+ def test_parse_has_vulns(self, parsed_spec):
62
+ assert len(parsed_spec.truth_graph.vulns) >= 1
63
+ vuln_types = {v.type for v in parsed_spec.truth_graph.vulns}
64
+ assert "sqli" in vuln_types
65
+
66
+ def test_parse_has_flags(self, parsed_spec):
67
+ assert len(parsed_spec.flags) >= 2
68
+
69
+ def test_parse_has_golden_path(self, parsed_spec):
70
+ assert len(parsed_spec.golden_path) >= 1
71
+ # Golden path steps should have commands
72
+ for step in parsed_spec.golden_path:
73
+ assert step.command, f"Step {step.step} has empty command"
74
+
75
+ def test_parse_has_task_briefings(self, parsed_spec):
76
+ assert parsed_spec.task.red_briefing
77
+ assert parsed_spec.task.blue_briefing
78
+
79
+ def test_parse_has_files(self, parsed_spec):
80
+ assert len(parsed_spec.files) > 0
81
+ # Should include at least one web-hosted file (db:sql is checked in TestFilesPreserved)
82
+ web_files = [k for k in parsed_spec.files if k.startswith("web:")]
83
+ assert len(web_files) > 0
84
+
85
+ def test_parse_has_npc_personas(self, parsed_spec):
86
+ assert len(parsed_spec.npc_personas) >= 1
87
+
88
+ def test_golden_path_uses_command_field(self, parsed_spec):
89
+ """LLM output uses 'cmd', parser should map to 'command'."""
90
+ for step in parsed_spec.golden_path:
91
+ assert step.command # Should be populated from 'cmd' key
92
+
93
+ def test_golden_path_uses_expect_in_stdout(self, parsed_spec):
94
+ """LLM output uses 'expect_stdout', parser maps to 'expect_in_stdout'."""
95
+ for step in parsed_spec.golden_path:
96
+ assert step.expect_in_stdout
97
+
98
+
99
+ # ---------------------------------------------------------------------------
100
+ # All output files exist
101
+ # ---------------------------------------------------------------------------
102
+
103
+
104
+ class TestRenderedFilesExist:
105
+ """Verify all 6 template outputs are created."""
106
+
107
+ EXPECTED_FILES = [
108
+ "docker-compose.yml",
109
+ "Dockerfile.web",
110
+ "Dockerfile.db",
111
+ "nginx.conf",
112
+ "init.sql",
113
+ "iptables.rules",
114
+ ]
115
+
116
+ def test_all_output_files_exist(self, rendered_dir):
117
+ for fname in self.EXPECTED_FILES:
118
+ path = rendered_dir / fname
119
+ assert path.exists(), f"Missing output file: {fname}"
120
+
121
+ def test_all_output_files_non_empty(self, rendered_dir):
122
+ for fname in self.EXPECTED_FILES:
123
+ content = (rendered_dir / fname).read_text()
124
+ assert len(content) > 0, f"Empty output file: {fname}"
125
+
126
+
127
+ # ---------------------------------------------------------------------------
128
+ # nginx.conf content verification
129
+ # ---------------------------------------------------------------------------
130
+
131
+
132
+ class TestNginxConf:
133
+ """Verify rendered nginx.conf has correct content."""
134
+
135
+ def test_references_php_fpm_socket(self, rendered_dir):
136
+ nginx = (rendered_dir / "nginx.conf").read_text()
137
+ assert "php8.1-fpm.sock" in nginx
138
+
139
+ def test_has_server_block(self, rendered_dir):
140
+ nginx = (rendered_dir / "nginx.conf").read_text()
141
+ assert "server {" in nginx
142
+ assert "listen 80" in nginx
143
+
144
+ def test_has_php_location(self, rendered_dir):
145
+ nginx = (rendered_dir / "nginx.conf").read_text()
146
+ assert "location ~ \\.php$" in nginx
147
+
148
+ def test_has_fastcgi_pass(self, rendered_dir):
149
+ nginx = (rendered_dir / "nginx.conf").read_text()
150
+ assert "fastcgi_pass unix:/run/php/php8.1-fpm.sock" in nginx
151
+
152
+
153
+ # ---------------------------------------------------------------------------
154
+ # docker-compose.yml content verification
155
+ # ---------------------------------------------------------------------------
156
+
157
+
158
+ class TestDockerCompose:
159
+ """Verify rendered docker-compose.yml has correct static IPs and structure."""
160
+
161
+ def test_has_services_section(self, rendered_dir):
162
+ compose = (rendered_dir / "docker-compose.yml").read_text()
163
+ assert "services:" in compose
164
+
165
+ def test_has_all_core_services(self, rendered_dir):
166
+ compose = (rendered_dir / "docker-compose.yml").read_text()
167
+ for service in ["attacker:", "firewall:", "web:", "mail:", "db:", "siem:", "ldap:", "files:"]:
168
+ assert service in compose, f"Missing service: {service}"
169
+
170
+ def test_has_network_definitions(self, rendered_dir):
171
+ compose = (rendered_dir / "docker-compose.yml").read_text()
172
+ assert "networks:" in compose
173
+ assert "external:" in compose
174
+ assert "dmz:" in compose
175
+ assert "internal:" in compose
176
+ assert "management:" in compose
177
+
178
+ def test_has_static_ips(self, rendered_dir):
179
+ compose = (rendered_dir / "docker-compose.yml").read_text()
180
+ # Key static IPs from the template
181
+ assert "10.0.0.10" in compose # attacker
182
+ assert "10.0.0.2" in compose # firewall external
183
+ assert "10.0.1.10" in compose # web dmz
184
+ assert "10.0.2.20" in compose # db internal
185
+ assert "10.0.3.20" in compose # ldap management
186
+ assert "10.0.3.21" in compose # siem management
187
+
188
+ def test_web_depends_on_db(self, rendered_dir):
189
+ compose = (rendered_dir / "docker-compose.yml").read_text()
190
+ # Only checks that a depends_on block exists; does not verify the web -> db edge specifically
191
+ assert "depends_on:" in compose
192
+
193
+ def test_has_subnet_definitions(self, rendered_dir):
194
+ compose = (rendered_dir / "docker-compose.yml").read_text()
195
+ assert "10.0.0.0/24" in compose # external
196
+ assert "10.0.1.0/24" in compose # dmz
197
+ assert "10.0.2.0/24" in compose # internal
198
+ assert "10.0.3.0/24" in compose # management
199
+
200
+ def test_has_healthchecks(self, rendered_dir):
201
+ compose = (rendered_dir / "docker-compose.yml").read_text()
202
+ assert "healthcheck:" in compose
203
+
204
+ def test_attacker_has_net_admin(self, rendered_dir):
205
+ compose = (rendered_dir / "docker-compose.yml").read_text()
206
+ assert "NET_ADMIN" in compose
207
+
208
+ def test_db_has_mysql_env_vars(self, rendered_dir):
209
+ compose = (rendered_dir / "docker-compose.yml").read_text()
210
+ assert "MYSQL_ROOT_PASSWORD" in compose
211
+ assert "MYSQL_DATABASE=referral_db" in compose
212
+ assert "MYSQL_USER=app_user" in compose
213
+
214
+
215
+ # ---------------------------------------------------------------------------
216
+ # init.sql content verification
217
+ # ---------------------------------------------------------------------------
218
+
219
+
220
+ class TestInitSQL:
221
+ """Verify rendered init.sql has referral_db and app_user."""
222
+
223
+ def test_creates_referral_db(self, rendered_dir):
224
+ sql = (rendered_dir / "init.sql").read_text()
225
+ assert "referral_db" in sql
226
+
227
+ def test_creates_flags_db(self, rendered_dir):
228
+ sql = (rendered_dir / "init.sql").read_text()
229
+ assert "flags" in sql
230
+
231
+ def test_creates_core_tables(self, rendered_dir):
232
+ sql = (rendered_dir / "init.sql").read_text()
233
+ assert "CREATE TABLE" in sql
234
+ assert "users" in sql
235
+ assert "patients" in sql
236
+ assert "secrets" in sql
237
+
238
+ def test_creates_healthcare_tables(self, rendered_dir):
239
+ sql = (rendered_dir / "init.sql").read_text()
240
+ assert "patient_referrals" in sql
241
+ assert "billing" in sql
242
+
243
+ def test_grants_app_user(self, rendered_dir):
244
+ sql = (rendered_dir / "init.sql").read_text()
245
+ assert "app_user" in sql
246
+ assert "GRANT" in sql
247
+
248
+ def test_has_flush_privileges(self, rendered_dir):
249
+ sql = (rendered_dir / "init.sql").read_text()
250
+ assert "FLUSH PRIVILEGES" in sql
251
+
252
+
253
+ # ---------------------------------------------------------------------------
254
+ # Dockerfile.web content verification
255
+ # ---------------------------------------------------------------------------
256
+
257
+
258
+ class TestDockerfileWeb:
259
+ """Verify rendered Dockerfile.web creates users from topology."""
260
+
261
+ def test_creates_users_from_topology(self, rendered_dir, parsed_spec):
262
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
263
+ # Should have useradd for users from topology
264
+ users = parsed_spec.topology.get("users", [])
265
+ assert len(users) > 0, "Parsed spec should have users"
266
+ for user in users:
267
+ username = user.get("username", "")
268
+ if username:
269
+ assert "useradd" in dockerfile
270
+
271
+ def test_has_php_fpm(self, rendered_dir):
272
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
273
+ assert "php8.1-fpm" in dockerfile
274
+
275
+ def test_has_nginx(self, rendered_dir):
276
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
277
+ assert "nginx" in dockerfile
278
+
279
+ def test_copies_nginx_conf(self, rendered_dir):
280
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
281
+ assert "COPY nginx.conf" in dockerfile
282
+
283
+ def test_exposes_ports(self, rendered_dir):
284
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
285
+ assert "EXPOSE" in dockerfile
286
+ assert "80" in dockerfile
287
+
288
+ def test_plants_file_flags(self, rendered_dir, parsed_spec):
289
+ """Flags with file paths on web host should appear in Dockerfile."""
290
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
291
+ for flag in parsed_spec.flags:
292
+ if flag.host == "web" and "/" in flag.path:
293
+ assert flag.value in dockerfile, (
294
+ f"Flag {flag.id} ({flag.value}) not in Dockerfile.web"
295
+ )
296
+
297
+ def test_db_flags_not_in_dockerfile(self, rendered_dir, parsed_spec):
298
+ """Flags with mysql: or db: paths should NOT appear in Dockerfile.web."""
299
+ dockerfile = (rendered_dir / "Dockerfile.web").read_text()
300
+ for flag in parsed_spec.flags:
301
+ if flag.path.startswith("mysql:") or flag.path.startswith("db:"):
302
+ assert flag.value not in dockerfile, (
303
+ f"DB flag {flag.id} ({flag.value}) should not be in Dockerfile.web"
304
+ )
305
+
306
+
307
+ # ---------------------------------------------------------------------------
308
+ # iptables.rules content verification
309
+ # ---------------------------------------------------------------------------
310
+
311
+
312
+ class TestIptablesRules:
313
+ """Verify rendered iptables.rules has correct structure."""
314
+
315
+ def test_has_filter_table(self, rendered_dir):
316
+ rules = (rendered_dir / "iptables.rules").read_text()
317
+ assert "*filter" in rules
318
+ assert "COMMIT" in rules
319
+
320
+ def test_has_forward_chain(self, rendered_dir):
321
+ rules = (rendered_dir / "iptables.rules").read_text()
322
+ assert "FORWARD" in rules
323
+
324
+
325
+ # ---------------------------------------------------------------------------
326
+ # Full round-trip: files dict is preserved through parse
327
+ # ---------------------------------------------------------------------------
328
+
329
+
330
+ class TestFilesPreserved:
331
+ """Verify that files from LLM output survive the parse pipeline."""
332
+
333
+ def test_files_dict_has_web_files(self, parsed_spec):
334
+ web_files = {k: v for k, v in parsed_spec.files.items() if k.startswith("web:")}
335
+ assert len(web_files) > 0
336
+
337
+ def test_files_dict_has_sql(self, parsed_spec):
338
+ assert "db:sql" in parsed_spec.files
339
+
340
+ def test_index_php_content(self, parsed_spec):
341
+ key = "web:/var/www/portal/index.php"
342
+ assert key in parsed_spec.files
343
+ assert "Meridian Referral Portal" in parsed_spec.files[key]
344
+
345
+ def test_lookup_php_has_sqli(self, parsed_spec):
346
+ key = "web:/var/www/portal/lookup.php"
347
+ assert key in parsed_spec.files
348
+ content = parsed_spec.files[key]
349
+ # Should contain the vulnerable SQL query
350
+ assert "last_name LIKE" in content or "$last" in content
351
+
352
+ def test_compliance_report_has_flag(self, parsed_spec):
353
+ key = "web:/var/www/portal/reports/hipaa_audit.txt"
354
+ assert key in parsed_spec.files
355
+ assert "FLAG{1a2b3c4d5e6f7788}" in parsed_spec.files[key]
356
+
357
+ def test_sql_has_user_inserts(self, parsed_spec):
358
+ sql = parsed_spec.files.get("db:sql", "")
359
+ assert "dthompson" in sql
360
+ assert "kwilliams" in sql
361
+
362
+ def test_sql_has_flag_insert(self, parsed_spec):
363
+ sql = parsed_spec.files.get("db:sql", "")
364
+ assert "FLAG{9f3a2b4c5d6e7f80}" in sql
365
+
366
+ def test_files_samba_shares(self, parsed_spec):
367
+ files_entries = {k: v for k, v in parsed_spec.files.items() if k.startswith("files:")}
368
+ assert len(files_entries) > 0
369
+
370
+ def test_db_backup_script(self, parsed_spec):
371
+ key = "db:/opt/scripts/db_backup.sh"
372
+ assert key in parsed_spec.files
373
+ assert "mysqldump" in parsed_spec.files[key]
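Note on the `files` keys checked above: they follow a `host:path` convention (`web:/var/www/portal/index.php`, `db:sql`, and `files:`-prefixed entries), and the tests filter them by prefix. A minimal sketch of grouping such keys per host follows. It assumes the first colon separates host from path; `group_files_by_host` is a hypothetical helper for illustration and is not part of `open_range`.

```python
from collections import defaultdict


def group_files_by_host(files: dict) -> dict:
    """Group 'host:path' keys into {host: {path: content}} (hypothetical helper)."""
    grouped = defaultdict(dict)
    for key, content in files.items():
        # partition splits on the first colon only, so paths may contain colons.
        host, sep, path = key.partition(":")
        if not sep:
            raise ValueError(f"file key {key!r} is missing a 'host:' prefix")
        grouped[host][path] = content
    return dict(grouped)
```

With this shape, a per-host check such as "at least one web file exists" reduces to a lookup on `grouped.get("web", {})`.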
uv.lock CHANGED
@@ -1862,52 +1862,6 @@ wheels = [
1862
  { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
1863
  ]
1864
 
1865
- [[package]]
1866
- name = "open-range"
1867
- version = "0.1.0"
1868
- source = { editable = "." }
1869
- dependencies = [
1870
- { name = "docker" },
1871
- { name = "fastapi" },
1872
- { name = "jinja2" },
1873
- { name = "openenv-core", extra = ["core"] },
1874
- { name = "pydantic" },
1875
- { name = "pyyaml" },
1876
- { name = "uvicorn" },
1877
- ]
1878
-
1879
- [package.optional-dependencies]
1880
- builder = [
1881
- { name = "litellm" },
1882
- ]
1883
- dev = [
1884
- { name = "httpx" },
1885
- { name = "pytest" },
1886
- { name = "pytest-asyncio" },
1887
- ]
1888
- training = [
1889
- { name = "trl" },
1890
- { name = "unsloth" },
1891
- ]
1892
-
1893
- [package.metadata]
1894
- requires-dist = [
1895
- { name = "docker", specifier = ">=7.0" },
1896
- { name = "fastapi", specifier = ">=0.115" },
1897
- { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
1898
- { name = "jinja2", specifier = ">=3.1" },
1899
- { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
1900
- { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
1901
- { name = "pydantic", specifier = ">=2.0" },
1902
- { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
1903
- { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
1904
- { name = "pyyaml", specifier = ">=6.0" },
1905
- { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
1906
- { name = "unsloth", marker = "extra == 'training'" },
1907
- { name = "uvicorn", specifier = ">=0.27" },
1908
- ]
1909
- provides-extras = ["dev", "training", "builder"]
1910
-
1911
  [[package]]
1912
  name = "openai"
1913
  version = "2.26.0"
@@ -1972,6 +1926,54 @@ core = [
1972
  { name = "websockets" },
1973
  ]
1974
 
1975
  [[package]]
1976
  name = "opentelemetry-api"
1977
  version = "1.40.0"
 
1862
  { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
1863
  ]
1864
 
1865
  [[package]]
1866
  name = "openai"
1867
  version = "2.26.0"
 
1926
  { name = "websockets" },
1927
  ]
1928
 
1929
+ [[package]]
1930
+ name = "openenv-open-range"
1931
+ version = "0.1.0"
1932
+ source = { editable = "." }
1933
+ dependencies = [
1934
+ { name = "click" },
1935
+ { name = "docker" },
1936
+ { name = "fastapi" },
1937
+ { name = "jinja2" },
1938
+ { name = "openenv-core", extra = ["core"] },
1939
+ { name = "pydantic" },
1940
+ { name = "pyyaml" },
1941
+ { name = "uvicorn" },
1942
+ ]
1943
+
1944
+ [package.optional-dependencies]
1945
+ builder = [
1946
+ { name = "litellm" },
1947
+ ]
1948
+ dev = [
1949
+ { name = "httpx" },
1950
+ { name = "pytest" },
1951
+ { name = "pytest-asyncio" },
1952
+ ]
1953
+ training = [
1954
+ { name = "trl" },
1955
+ { name = "unsloth" },
1956
+ ]
1957
+
1958
+ [package.metadata]
1959
+ requires-dist = [
1960
+ { name = "click", specifier = ">=8.1" },
1961
+ { name = "docker", specifier = ">=7.0" },
1962
+ { name = "fastapi", specifier = ">=0.115.0" },
1963
+ { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
1964
+ { name = "jinja2", specifier = ">=3.1" },
1965
+ { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
1966
+ { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
1967
+ { name = "pydantic", specifier = ">=2.0.0" },
1968
+ { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
1969
+ { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
1970
+ { name = "pyyaml", specifier = ">=6.0" },
1971
+ { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
1972
+ { name = "unsloth", marker = "extra == 'training'" },
1973
+ { name = "uvicorn", specifier = ">=0.24.0" },
1974
+ ]
1975
+ provides-extras = ["dev", "training", "builder"]
1976
+
1977
  [[package]]
1978
  name = "opentelemetry-api"
1979
  version = "1.40.0"