Aaron Brown committed
Commit 3ea4118 · Parent(s): f549fda
Cleanup: fix bugs, remove dead code, add missing packages
- Add openenv fallback stubs in client.py (matches models.py pattern)
- Fix auth command parsing with maxsplit=3 (passwords with spaces)
- Fix reward exception silencing: log at ERROR with traceback
- Add snapshot validation: default flags/topology/task if None
- Fix shell injection in file deploy with shlex.quote()
- Remove async anti-pattern in rollout.py
- Remove duplicate src/open_range/server/Dockerfile
- Remove unused requests dependency
- Remove redundant uv install check from Dockerfile
- Add missing packages: open_range.agents, open_range.validator
- AGENTS.md +722 -0
- README.md +9 -1
- openenv.yaml +0 -2
- pyproject.toml +25 -11
- server/Dockerfile +33 -14
- server/__init__.py +2 -2
- server/app.py +9 -3
- src/open_range/builder/builder.py +366 -77
- src/open_range/cli.py +438 -0
- src/open_range/client/client.py +30 -3
- src/open_range/server/Dockerfile +0 -44
- src/open_range/server/app.py +3 -0
- src/open_range/server/environment.py +39 -29
- src/open_range/training/rollout.py +5 -7
- tests/test_apply_snapshot.py +457 -0
- tests/test_console.py +40 -26
- tests/test_parse_llm_response.py +1075 -0
- tests/test_renderer_integration.py +373 -0
- uv.lock +48 -46
AGENTS.md
ADDED
|
@@ -0,0 +1,722 @@
| 1 |
+
# AGENTS.md
|
| 2 |
+
|
| 3 |
+
Guidance for Codex when working on OpenRange.
|
| 4 |
+
|
| 5 |
+
## What Is OpenRange
|
| 6 |
+
|
| 7 |
+
OpenRange is a **multi-agent cybersecurity gymnasium** built on OpenEnv 0.2.1. It is the first cybersecurity environment in the OpenEnv ecosystem.
|
| 8 |
+
|
| 9 |
+
Three LLM roles operate on real Docker infrastructure:
|
| 10 |
+
|
| 11 |
+
| Role | Entry Point | What It Does |
|------|-------------|--------------|
| **Builder** (`pi_build`) | YAML manifest | Generates Dockerfiles, docker-compose, configs with planted vulns. Runs NPC traffic. Evolves range via curriculum. |
| **Red** (`pi_red`) | External (no access) | Attacks live containers. Rewards: flag capture, efficiency, stealth, evidence quality, anti-hallucination. |
| **Blue** (`pi_blue`) | Internal (monitor host) | Defends via log analysis, patching, firewalling. Rewards: detection rate, patch validity, availability, FP penalty. |
|
| 16 |
+
|
| 17 |
+
Red and Blue train **in tandem** — both agents active on the same range simultaneously.
|
| 18 |
+
Red's stealth reward is coupled to Blue's detection, creating adversarial co-evolution.
|
| 19 |
+
|
| 20 |
+
A **golden path** (the answer key) validates every generated range before training begins.
|
| 21 |
+
The golden path is generated by the Builder LLM and reviewed by the Validator LLM.
|
| 22 |
+
|
| 23 |
+
## Architecture (5 Layers)
|
| 24 |
+
|
| 25 |
+
```
Layer 1: YAML Manifest (human-authored topology, vulns, golden path, escalation rules)
    |
Layer 2: Builder Agent (YAML -> Dockerfiles, compose, configs, NPC scripts -> docker compose up)
    |
Layer 3: Validator (10-check admission pipeline: 8 mechanical + 2 LLM advisory)
    |
Layer 4: OpenEnv Server (FastAPI on HF Spaces: /reset, /step, /state) + Red/Blue Operators
    |
Layer 5: Training (TRL GRPOTrainer + Unsloth QLoRA) + Curriculum (escalate -> mutated YAML' -> back to Layer 1)
```
|
| 36 |
+
|
| 37 |
+
## Reset = Mutation (Critical Design)
|
| 38 |
+
|
| 39 |
+
**`reset()` does NOT restart the same environment.** It selects a different pre-validated
|
| 40 |
+
snapshot with different vulnerabilities. Example: a web app had XSS on episode N; after reset,
|
| 41 |
+
episode N+1 uses a snapshot with IDOR instead. The topology stays the same but the planted
|
| 42 |
+
vulnerabilities, flags, and golden path change.
|
| 43 |
+
|
| 44 |
+
This means the agent **cannot memorize** a fixed exploit chain. It must learn to **generalize**
|
| 45 |
+
across vulnerability classes.
|
| 46 |
+
|
| 47 |
+
### Snapshot Generation (Async, Between Episodes)
|
| 48 |
+
|
| 49 |
+
```
|
| 50 |
+
Builder LLM called asynchronously (background queue, NOT in reset() hot path)
|
| 51 |
+
|
|
| 52 |
+
v
|
| 53 |
+
Builder LLM generates new snapshot as STRUCTURED JSON (not prose — SWE-RL lesson):
|
| 54 |
+
- Same SnapshotBuilder protocol, different BuildContext (episode history, solve rates, weak areas)
|
| 55 |
+
- Outputs formal spec: {topology, truth_graph, vulns, golden_path, evidence_spec,
|
| 56 |
+
npc_personas, task briefings}
|
| 57 |
+
- Thin template layer renders JSON spec → actual config files (PHP, nginx.conf, etc.)
|
| 58 |
+
- This separates LLM reasoning (creative) from file formatting (mechanical)
|
| 59 |
+
|
|
| 60 |
+
v
|
| 61 |
+
Partial container restart (hot-swap modified files, restart affected services)
|
| 62 |
+
|
|
| 63 |
+
v
|
| 64 |
+
10-Check Validator Admission Pipeline (per R2E-Gym + SWE-RL lessons):
|
| 65 |
+
Mechanical checks (deterministic, no LLM):
|
| 66 |
+
1. Build + boot: docker compose up + healthchecks (all containers, all ports)
|
| 67 |
+
2. Exploitability: golden path end-to-end (each step produces expect_stdout)
|
| 68 |
+
3. Patchability: inverse mutation test — revert each vuln, its golden path step MUST fail
|
| 69 |
+
4. Evidence sufficiency: logs + SIEM alerts exist for Blue investigation
|
| 70 |
+
5. Reward grounding: rubrics produce valid scores against known scenarios
|
| 71 |
+
6. Isolation + leakage: zones enforced, no flag values in briefings
|
| 72 |
+
7. Task feasibility: tasks reference real reachable hosts, services, logs
|
| 73 |
+
8. Difficulty calibration: golden path steps within ±20% of tier target
|
| 74 |
+
LLM checks (configurable, removable):
|
| 75 |
+
9. NPC consistency: personas respond per security_awareness (LLM tests NPCs)
|
| 76 |
+
10. Realism review: scenario plausibility + briefing leakage (LLM advisory only)
|
| 77 |
+
|
|
| 78 |
+
v
|
| 79 |
+
PASS -> store in Snapshot Store (frozen, immutable, ready for reset())
|
| 80 |
+
FAIL -> Builder LLM receives error context, retries (max 3)
|
| 81 |
+
```
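The admission-and-retry loop in the flow above can be sketched roughly as follows. This is a minimal illustration, not the project's validator: `CheckResult`, `run_admission`, and `build_until_admitted` are hypothetical names, a snapshot is modeled as a plain dict, and "max 3" is read as three retries after the first attempt. Checks flagged `advisory` are recorded but never block admission, which is one possible reading of "LLM advisory only".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    advisory: bool = False  # assumption: advisory checks never block admission
    detail: str = ""


# A check inspects a snapshot spec (a plain dict in this sketch).
Check = Callable[[dict], CheckResult]


def run_admission(snapshot: dict, checks: List[Check]) -> List[CheckResult]:
    """Run checks in order and stop at the first blocking failure."""
    results = []
    for check in checks:
        result = check(snapshot)
        results.append(result)
        if not result.passed and not result.advisory:
            break
    return results


def admitted(results: List[CheckResult]) -> bool:
    """A snapshot is admitted when every non-advisory check passed."""
    return all(r.passed or r.advisory for r in results)


def build_until_admitted(build: Callable[[List[str]], dict],
                         checks: List[Check],
                         max_retries: int = 3) -> Optional[dict]:
    """Call build(errors) until a snapshot is admitted or retries run out."""
    errors: List[str] = []
    for _ in range(1 + max_retries):  # first attempt plus retries
        snapshot = build(errors)
        results = run_admission(snapshot, checks)
        if admitted(results):
            return snapshot  # would be frozen into the Snapshot Store
        # Feed blocking failures back to the builder as error context.
        errors = [f"{r.name}: {r.detail}" for r in results
                  if not r.passed and not r.advisory]
    return None
```

Stopping at the first blocking failure keeps the expensive later checks (such as the inverse mutation test) from running on a range that did not even boot.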
|
| 82 |
+
|
| 83 |
+
### Reset Flow (Fast — Draws From Pool)
|
| 84 |
+
|
| 85 |
+
```
|
| 86 |
+
reset() called by training orchestration
|
| 87 |
+
|
|
| 88 |
+
v
|
| 89 |
+
Select pre-validated snapshot from Snapshot Store
|
| 90 |
+
(strategy: latest, random, or curriculum_weighted)
|
| 91 |
+
|
|
| 92 |
+
v
|
| 93 |
+
Boot or restore snapshot containers from frozen Docker artifacts
|
| 94 |
+
|
|
| 95 |
+
v
|
| 96 |
+
Return initial RangeObservation with challenge briefing
|
| 97 |
+
(Red briefing: tiered by difficulty. Blue briefing: always minimal.)
|
| 98 |
+
```
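The three selection strategies named in the reset flow could look something like the sketch below. The `Snapshot` fields, the `select_snapshot` function, and the weighting used for `curriculum_weighted` (favoring snapshots whose solve rate is near 50%) are all assumptions made for illustration; the source does not specify how the weighting is computed.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    created_at: float   # hypothetical: admission timestamp
    solve_rate: float   # hypothetical: fraction of episodes Red solved


def select_snapshot(pool: List[Snapshot], strategy: str = "latest",
                    rng: Optional[random.Random] = None) -> Snapshot:
    """Pick a pre-validated snapshot from the pool for the next episode."""
    if not pool:
        raise ValueError("snapshot store is empty")
    rng = rng or random.Random()
    if strategy == "latest":
        return max(pool, key=lambda s: s.created_at)
    if strategy == "random":
        return rng.choice(pool)
    if strategy == "curriculum_weighted":
        # Assumed weighting: highest weight at a 50% solve rate,
        # with a small floor so no snapshot is excluded entirely.
        weights = [max(1e-3, 1.0 - 2.0 * abs(s.solve_rate - 0.5)) for s in pool]
        return rng.choices(pool, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

Because selection only reads from an already-validated pool, it stays cheap enough to sit in the `reset()` hot path.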
|
| 99 |
+
|
| 100 |
+
### Why LLM-Based (Not Templates)
|
| 101 |
+
|
| 102 |
+
Templates produce predictable, shallow vulnerabilities. An LLM Builder can:
|
| 103 |
+
- **Compose novel vuln chains**: SSRF to access internal DB, then SQLi on internal endpoint
|
| 104 |
+
- **Vary attack surfaces creatively**: Different URL structures, parameter names, auth flows each episode
|
| 105 |
+
- **Generate realistic code**: Vulnerable PHP/Python/Node apps that look like real software, not CTF toy examples
|
| 106 |
+
- **Adapt to agent behavior**: If Red consistently solves SQLi easily, Builder can plant harder variants or combine with WAF rules
|
| 107 |
+
|
| 108 |
+
The Validator LLM closes the loop: it reviews the Builder's output to ensure the challenge is
|
| 109 |
+
actually solvable and appropriately difficult, and that the description doesn't leak the answer.
|
| 110 |
+
|
| 111 |
+
### Verifiable Rewards
|
| 112 |
+
|
| 113 |
+
Despite LLM-based generation, **all rewards are grounded in verifiable container state**:
|
| 114 |
+
- Flag: `docker exec cat <path>` — binary match, not LLM-judged
|
| 115 |
+
- Patch validity: Re-run golden path exploit command — if it fails, patch worked
|
| 116 |
+
- Service availability: HTTP healthcheck — binary up/down
|
| 117 |
+
- Hallucination: Compare submitted flag against manifest-defined flags — exact string match
|
| 118 |
+
|
| 119 |
+
The LLM generates the challenge. The rewards are verified by code execution against real infrastructure.
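As an illustration of what "exact string match, not LLM-judged" means in practice, here is a minimal sketch of flag scoring. The function name `score_flag_submission` and the reward of 1.0 for a correct flag are assumptions; the -0.3 penalty for fake flags is the value stated under Key Invariants.

```python
from typing import Set

FLAG_REWARD = 1.0             # assumption: the source gives no value for a correct flag
HALLUCINATION_PENALTY = -0.3  # from the Key Invariants section


def score_flag_submission(submitted: str,
                          manifest_flags: Set[str],
                          already_captured: Set[str]) -> float:
    """Score one submission by exact string comparison against manifest flags."""
    if submitted in manifest_flags:
        if submitted in already_captured:
            return 0.0  # assumption: resubmitting a captured flag earns nothing
        already_captured.add(submitted)
        return FLAG_REWARD
    # Anything that is not byte-for-byte a manifest flag counts as hallucinated.
    return HALLUCINATION_PENALTY
```

No normalization (trimming, case folding) is applied in this sketch, so a near-miss is treated the same as an invented flag.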
|
| 120 |
+
|
| 121 |
+
### Challenge Diversity (Black-Box Agents)
|
| 122 |
+
|
| 123 |
+
Agents operate **black-box** — they see briefings and environment outputs, never the truth graph.
|
| 124 |
+
|
| 125 |
+
**Red briefing** is tiered by difficulty:
|
| 126 |
+
- Tier 1: topology + vague hint ("web application with database backend, find vulnerabilities")
|
| 127 |
+
- Tier 2: topology only ("corporate network, 10 hosts, find and exploit")
|
| 128 |
+
- Tier 3+: minimal ("enterprise network, go") — forces pure recon
|
| 129 |
+
|
| 130 |
+
**Blue briefing** is always minimal: "monitor SIEM for suspicious activity, investigate, respond."
|
| 131 |
+
Blue never knows what vulnerability class was planted.
|
| 132 |
+
|
| 133 |
+
**Episode diversity** prevents memorization:
|
| 134 |
+
- Must NOT repeat same vuln class within last 3 episodes
|
| 135 |
+
- Must NOT reuse same injection point within last 5 episodes
|
| 136 |
+
- Vary approach even within same vuln class (SQLi in search vs login vs API)
|
| 137 |
+
- Language/framing of briefings varies each episode
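The first two diversity rules above are mechanical enough to check in code. The sketch below assumes each past episode is summarized by a vulnerability class and an injection point; `EpisodeSpec` and `violates_diversity` are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpisodeSpec:
    vuln_class: str        # e.g. "sqli", "idor"
    injection_point: str   # e.g. "/search?q", "/api/users/<id>"


def violates_diversity(candidate: EpisodeSpec,
                       history: Sequence[EpisodeSpec],
                       class_window: int = 3,
                       point_window: int = 5) -> bool:
    """Return True if the candidate repeats a recent class or injection point."""
    recent = list(history)  # oldest first, newest last
    recent_classes = {e.vuln_class for e in recent[-class_window:]}
    recent_points = {e.injection_point for e in recent[-point_window:]}
    return (candidate.vuln_class in recent_classes
            or candidate.injection_point in recent_points)
```

A Builder loop could call this before submitting a candidate snapshot to the Validator and regenerate when it returns `True`.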
|
| 138 |
+
|
| 139 |
+
**Progression** builds naturally via curriculum:
|
| 140 |
+
- Early: single-vuln, direct exploit (SQLi → flag)
|
| 141 |
+
- Mid: multi-vuln chains (IDOR → cred leak → DB access)
|
| 142 |
+
- Late: multi-host pivots (web → internal → management → flag)
|
| 143 |
+
- Driven by solve rates, not hardcoded episode numbers
|
| 144 |
+
|
| 145 |
+
### Red + Blue Tandem RL (Core Design)
|
| 146 |
+
|
| 147 |
+
**Both offensive and defensive agents train in tandem, not sequentially.**
|
| 148 |
+
|
| 149 |
+
```
|
| 150 |
+
Episode N:
|
| 151 |
+
Builder LLM generates mutated range (new vulns, new golden path)
|
| 152 |
+
Validator LLM + scripted checks confirm range is valid
|
| 153 |
+
|
|
| 154 |
+
Red acts: nmap -> discover services -> exploit vuln -> capture flag
|
| 155 |
+
| (Red's actions appear in container logs in real time)
|
| 156 |
+
|
|
| 157 |
+
Blue observes: log stream = NPC noise + Red's real attack actions
|
| 158 |
+
Blue acts: analyze logs -> identify attack -> patch/block -> submit findings
|
| 159 |
+
|
|
| 160 |
+
Rewards computed:
|
| 161 |
+
Red: flag + efficiency + stealth(did Blue detect?) + anti-hallucination
|
| 162 |
+
Blue: detection(did Blue catch Red?) + patch(did patch block exploit?) + availability + FP penalty
|
| 163 |
+
|
|
| 164 |
+
Both rewards feed back to their respective GRPO trainers
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
+
**Key coupling**: Red's stealth reward depends on Blue's detection. Blue's detection reward
|
| 168 |
+
depends on Red's actions. This creates an adversarial co-evolution:
|
| 169 |
+
- Red learns to be stealthier -> Blue must learn better detection
|
| 170 |
+
- Blue learns to detect faster -> Red must learn new evasion techniques
|
| 171 |
+
|
| 172 |
+
This is NOT self-play (single model playing both roles). It's **two separate policies** trained
|
| 173 |
+
against shared infrastructure with coupled reward signals.
|
| 174 |
+
|
| 175 |
+
### Vulnerability Classes (Examples)
|
| 176 |
+
|
| 177 |
+
| OWASP | Class | Example | Scope |
|-------|-------|---------|-------|
| A01 | IDOR | Sequential user IDs without authz | web API |
| A01 | Path Traversal | `file=` param without sanitization | web |
| A01 | LFI | `include($_GET['page'])` → server files | web |
| A01 | RFI | Remote file include → code execution | web (Tier 2+) |
| A01 | Missing Authz | Unprotected admin endpoint | web |
| A03 | SQLi | Unsanitized query parameter | web → db |
| A03 | XSS | Comment form → admin session hijack | web |
| A03 | Command Injection | User input to `os.system()` | web → shell |
| A03 | LDAP Injection | Unsanitized LDAP bind/search | web → ldap |
| A03 | SSTI | Template injection → RCE | web |
| A03 | XXE | XML external entity → file read / SSRF | web |
| A04 | File Upload | Unrestricted upload → webshell | web |
| A05 | Service Misconfig | Debug endpoints, default configs | any host |
| A07 | Weak Creds | Default passwords | SSH, DB, LDAP, SMB |
| A07 | Broken Auth | JWT `alg:none`, session fixation | web |
| A07 | Credential Reuse | Same password → lateral movement | cross-service |
| A07 | Kerberoasting | Kerberos ticket attacks | ldap (Tier 3+) |
| A08 | RCE | `eval()`, pickle, code injection | web → shell |
| A08 | Deserialization | Insecure deserialization | web |
| A10 | SSRF | URL fetch hitting internal services | web → internal |
| Infra | SMB Misconfig | Guest access, null sessions | files |
| Infra | Mail Misconfig | Open relay, missing SPF/DKIM | mail |
| Infra | Firewall Bypass | Zone traversal, rule gaps | firewall |
| Infra | SSH Key Exposure | Private keys readable | any host (Tier 2+) |
| Ops | Config Drift | Stale config diverged from intended | any host |
| Ops | Orphaned Access | Departed staff accounts | ldap |
| Ops | Data Exposure | Creds in backups, logs, configs | any host |
| T3+ | CI/CD Poisoning | Pipeline injection | ci_cd (Tier 3+) |
| T3+ | Supply Chain | Dependency confusion | ci_cd (Tier 3+) |
| Chain | Multi-host | SSRF → internal SQLi → flag in DB | cross-zone |
| Chain | Lateral | Credential reuse → SSH pivot → LDAP dump | cross-service |
|
| 210 |
+
|
| 211 |
+
### Implications for Training
|
| 212 |
+
|
| 213 |
+
- **Reset latency**: LLM generation (~10-20s) + container update (~10-20s) + LLM validation (~10-15s) + scripted validation (~5-10s) = ~35-65s per reset
|
| 214 |
+
- **GRPO batching**: All `num_generations` in a batch share the SAME mutated range (reset once per batch, not per generation)
|
| 215 |
+
- **Episode diversity**: LLM generates genuinely novel challenges each reset — not cycling through fixed templates
|
| 216 |
+
- **Container cleanup**: After each episode, dirty state cleaned by restarting affected service containers
|
| 217 |
+
- **Tandem training**: Red and Blue GRPO trainers can run on same or different GPUs, sharing environment
|
| 218 |
+
- **Curriculum**: As both agents improve, Builder LLM generates harder challenges (more hosts, chained vulns, stealthier golden paths)
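The reset-latency estimate and the effect of GRPO batching can be checked with a few lines of arithmetic. The stage ranges below are taken from the first bullet; the helper names and the example `num_generations=8` are illustrative assumptions.

```python
from typing import Dict, Tuple

# (low, high) seconds per stage, as listed in the reset-latency bullet.
RESET_STAGES: Dict[str, Tuple[float, float]] = {
    "llm_generation": (10, 20),
    "container_update": (10, 20),
    "llm_validation": (10, 15),
    "scripted_validation": (5, 10),
}


def total_latency(stages: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the low and high bounds across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def amortized_latency(total: Tuple[float, float],
                      num_generations: int) -> Tuple[float, float]:
    """Per-generation cost when one reset is shared by a whole GRPO batch."""
    if num_generations < 1:
        raise ValueError("num_generations must be at least 1")
    return total[0] / num_generations, total[1] / num_generations


if __name__ == "__main__":
    total = total_latency(RESET_STAGES)
    print("per reset:", total)
    print("per generation (batch of 8):", amortized_latency(total, 8))
```

Sharing one reset across the batch is what keeps the 35-65 s cost from being paid once per generation.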
|
| 219 |
+
|
| 220 |
+
## Lessons from Research (R2E-Gym, Self-Play SWE-RL)
|
| 221 |
+
|
| 222 |
+
These papers directly inform OpenRange's design. Violating these lessons risks repeating known failures.
|
| 223 |
+
|
| 224 |
+
### From R2E-Gym (Procedural Environments + Hybrid Verifiers)
|
| 225 |
+
|
| 226 |
+
1. **Hybrid verification is non-negotiable.** Execution-based verification alone plateaus at ~43%. LLM-based verification alone plateaus at ~43%. Combined: 51%. OpenRange's Validator MUST use both LLM review AND scripted golden-path execution.
|
| 227 |
+
|
| 228 |
+
2. **Synthetic task generation equals human quality.** LLM-generated task descriptions perform identically to human-written ones (27.8% vs 28.0%). Builder LLM generating cyber challenges from vulnerability catalogs is a validated approach.
|
| 229 |
+
|
| 230 |
+
3. **Toxic tests are real.** Up to 10% of generated validations incorrectly favor wrong solutions. Track Validator false-positive rate (accepting broken ranges) and false-negative rate (rejecting valid ranges).
|
| 231 |
+
|
| 232 |
+
4. **Include reasoning traces in training data.** SFT with agent thought processes improves downstream performance by +3.8%. Red and Blue training trajectories MUST include structured reasoning (recon plan → vuln hypothesis → exploit attempt → verification), not just raw commands.
|
| 233 |
+
|
| 234 |
+
5. **Build environment creation is the hardest part.** Docker dependency resolution, service connectivity, and reproducibility dominate engineering effort. Pre-build base images extensively.
|
| 235 |
+
|
| 236 |
+
### From Self-Play SWE-RL (Adversarial Self-Improvement)
|
| 237 |
+
|
| 238 |
+
6. **Formal specifications beat natural language.** Their biggest failed experiment: generating NL issue descriptions. A 32B model produced incoherent, repetitive text. They succeeded with formal test specs. **Builder LLM should output structured JSON specs** (vuln_type, injection_point, golden_path_commands, flag_location), NOT prose. The challenge description for the AGENT can be NL, but the Builder's internal output must be formal.
|
| 239 |
+
|
| 240 |
+
7. **Builder reward: `r_inject = 1 - (1+α)·s`** where s = solve rate, α = 0.8. The linear term penalizes too-easy challenges (it falls to -α as s→1). Too-hard/impossible challenges (s→0) must be penalized by a separate case, because the linear term alone is maximal at s = 0. Together this rewards challenges at the frontier of the agent's current ability and naturally creates curriculum without manual difficulty design.
|
| 241 |
+
|
| 242 |
+
8. **7-check consistency validation with inverse mutation testing.** Every generated range must pass:
|
| 243 |
+
- Services exist and respond
|
| 244 |
+
- Flags are accessible at expected locations
|
| 245 |
+
- Vulnerability is actually exploitable (golden path succeeds)
|
| 246 |
+
- Network isolation holds
|
| 247 |
+
- Difficulty matches target
|
| 248 |
+
- Challenge description doesn't leak the answer
|
| 249 |
+
- **Inverse mutation test**: for each planted vuln, removing ONLY that vuln must cause the golden path to fail at the corresponding step. This verifies each vuln actually contributes to the challenge.
|
| 250 |
+
|
| 251 |
+
9. **Higher-order challenges from failed attempts.** When Blue fails to patch a vuln, the resulting state (partial patch + remaining vuln) becomes a harder challenge for the next episode. When Red fails to exploit, the failed attempt reveals what didn't work, informing the Builder to create challenges that specifically test that weakness.
|
| 252 |
+
|
| 253 |
+
10. **Collapse risks in adversarial training.** A sufficiently capable Red agent can learn dominant strategies (e.g., obfuscation, always-same-attack) that stall Blue learning. Mitigations: ground in real-world data (real CVE patterns), limit divergence from realistic attack patterns, don't let Red game the reward through unrealistic strategies.
|
| 254 |
+
|
| 255 |
+
11. **SFT before RL is critical.** Both papers use SFT on expert trajectories first, then RL. Never start GRPO from a cold model — always warm-start with supervised fine-tuning on successful attack/defense traces.
|
| 256 |
+
|
| 257 |
+
12. **Binary reward for solver, nuanced reward for generator.** Red/Blue can use binary rewards (flag found or not, attack detected or not). The Builder needs the frontier-calibrating `r_inject` reward to learn optimal difficulty.
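The Builder reward from lesson 7 is simple enough to write down directly. The sketch below implements the linear formula exactly as stated, with α = 0.8. Because that formula on its own is largest at s = 0, the optional `degenerate_penalty` argument is a hypothetical addition (not taken from this document) showing one way never-solved and always-solved challenges could be penalized.

```python
from typing import Optional

ALPHA = 0.8  # value given in lesson 7


def r_inject(solve_rate: float,
             alpha: float = ALPHA,
             degenerate_penalty: Optional[float] = None) -> float:
    """Builder reward r_inject = 1 - (1 + alpha) * s for solve rate s in [0, 1].

    If degenerate_penalty is set (an assumption of this sketch), challenges
    that are never solved (s == 0) or always solved (s == 1) get that fixed
    value instead of the linear term.
    """
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be within [0, 1]")
    if degenerate_penalty is not None and solve_rate in (0.0, 1.0):
        return degenerate_penalty
    return 1.0 - (1.0 + alpha) * solve_rate
```

With the penalty enabled, the highest rewards go to challenges that are solved rarely but not never, which matches the "frontier of the agent's current ability" framing.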
|
| 258 |
+
|
| 259 |
+
## Key Invariants
|
| 260 |
+
|
| 261 |
+
- **Golden path gates training**: No episode runs on unvalidated infrastructure. Validator must PASS all 10 admission checks (8 mechanical + 2 LLM).
|
| 262 |
+
- **Rewards are grounded**: Every reward signal verified against golden-path-validated container state (flags via `docker exec`, patches via re-running exploit chain).
|
| 263 |
+
- **Anti-hallucination**: Flag submissions checked against manifest-defined flags. Fake flags penalized at -0.3.
|
| 264 |
+
- **Agents cannot reset**: Only training orchestration controls episode lifecycle (inherited from OpenEnv).
|
| 265 |
+
- **Horizontal growth, not vertical**: Difficulty increases by adding hosts/networks/services, not just harder passwords.
|
| 266 |
+
- **NPC noise is mandatory for Blue**: Without background traffic, detection is trivial and stealth is meaningless. NPCs evolve from shell-script noise (Level 0) to LLM-driven personas with susceptibility profiles (Level 1+), creating a social engineering attack surface.
|
| 267 |
+
- **Client-server separation**: Follows OpenEnv pattern — clients never import from `server/`.
|
| 268 |
+
|
| 269 |
+
## Directory Structure
|
| 270 |
+
|
| 271 |
+
```
|
| 272 |
+
open-range/
|
| 273 |
+
├── AGENTS.md # This file
|
| 274 |
+
├── IMPLEMENTATION_PLAN.md # Build plan, testing, open questions
|
| 275 |
+
├── manifests/ # YAML range manifests (human-authored)
|
| 276 |
+
│ ├── schema.yaml # JSON Schema for manifest validation
|
| 277 |
+
│ ├── tier1_basic.yaml # 8-host enterprise, ~8 golden path steps
|
| 278 |
+
│ ├── tier2_corporate.yaml # 10-12 host, ~15 golden path steps
|
| 279 |
+
│ └── tier3_enterprise.yaml # 14-18 host, ~25 golden path steps
|
| 280 |
+
├── protocols.py # Agent protocols (SnapshotBuilder, NPCBehavior, ValidatorCheck)
|
| 281 |
+
├── resolve.py # Dynamic component resolution (importlib + Protocol check)
|
| 282 |
+
├── builder/ # Builder agent (Layer 2)
|
| 283 |
+
│ ├── builder.py # LLMSnapshotBuilder + TemplateOnlyBuilder + FileBuilder
|
| 284 |
+
│ ├── mutator.py # Vuln mutation logic (swap vulns between resets)
|
| 285 |
+
│ ├── templates/ # Jinja2 templates for Dockerfiles, configs
|
| 286 |
+
│ └── npc/ # NPC system (Level 0: shell scripts, Level 1: LLM personas)
|
| 287 |
+
│ ├── npc_manager.py # Orchestrator: starts scripts + LLM agents per snapshot
|
| 288 |
+
│ ├── persona.py # Pydantic NPC persona model (security_awareness, susceptibility)
|
| 289 |
+
│ ├── npc_agent.py # Async LLM NPC agent loop (email check, decide, act)
|
| 290 |
+
│ ├── http_traffic.sh # Level 0: curl loops
|
| 291 |
+
│ ├── smtp_traffic.sh # Level 0: email noise
|
| 292 |
+
│ └── *.sh # Level 0: other service traffic scripts
|
| 293 |
+
├── validator/ # Golden path validator (Layer 3) — 10-check admission pipeline
|
| 294 |
+
│ ├── validator.py # Validator pipeline (runs list of ValidatorCheck protocols)
|
| 295 |
+
│ ├── build_boot.py # Check 1: docker compose up + healthchecks (mechanical)
|
| 296 |
+
│ ├── exploitability.py # Check 2: golden path end-to-end (mechanical)
|
| 297 |
+
│ ├── patchability.py # Check 3: inverse mutation test (mechanical)
|
| 298 |
+
│ ├── evidence.py # Check 4: logs + alerts exist (mechanical)
|
| 299 |
+
│ ├── reward_grounding.py # Check 5: rubrics produce valid scores (mechanical)
|
| 300 |
+
│ ├── isolation.py # Check 6: zones enforced, no leaks (mechanical)
|
| 301 |
+
│ ├── task_feasibility.py # Check 7: tasks reference real reachable hosts/services/logs (mechanical)
|
| 302 |
+
│ ├── difficulty.py # Check 8: golden path steps within ±20% of tier target (mechanical)
|
| 303 |
+
│ ├── npc_consistency.py # Check 9: NPC personas respond per security_awareness (LLM)
|
| 304 |
+
│ └── realism_review.py # Check 10: scenario plausibility + briefing leakage (LLM, advisory)
|
| 305 |
+
├── server/ # OpenEnv server (Layer 4)
|
| 306 |
+
│ ├── app.py # FastAPI application (create_app)
|
| 307 |
+
│ ├── environment.py # CyberRange Environment subclass
|
| 308 |
+
│ ├── models.py # RangeAction, RangeObservation, RangeState
|
| 309 |
+
│ ├── rewards.py # Reward components (flag, stealth, detect, etc.)
|
| 310 |
+
│ ├── Dockerfile # Container for HF Spaces deployment
|
| 311 |
+
│ └── requirements.txt
|
| 312 |
+
├── client/ # OpenEnv client (typed)
|
| 313 |
+
│ ├── __init__.py
|
| 314 |
+
│ └── client.py # OpenRangeEnv(EnvClient) or MCPToolClient
|
| 315 |
+
├── training/ # Training scripts (DEFERRED — environment-first)
|
| 316 |
+
│ ├── rollout.py # rollout_func for GRPOTrainer (OpenEnv integration point)
|
| 317 |
+
│ └── curriculum.py # Phi: escalation logic, YAML mutation
|
| 318 |
+
├── scripts/ # Utility scripts
|
| 319 |
+
│ ├── deploy_hf.sh # Deploy to HF Spaces
|
| 320 |
+
│ └── run_local.sh # Local development runner
|
| 321 |
+
├── tests/ # Test suite
|
| 322 |
+
│ ├── test_manifest.py # Schema validation tests
|
| 323 |
+
│ ├── test_validator.py # Golden path validation tests
|
| 324 |
+
│ ├── test_environment.py # OpenEnv server tests
|
| 325 |
+
│ ├── test_rewards.py # Reward component tests
|
| 326 |
+
│ └── test_integration.py # End-to-end integration tests
|
| 327 |
+
├── pyproject.toml
|
| 328 |
+
└── README.md
|
| 329 |
+
```
|
| 330 |
+
|
| 331 |
+
## OpenEnv Compatibility (EXACT API Contract)
|
| 332 |
+
|
| 333 |
+
OpenRange follows the OpenEnv 0.2.x environment pattern. Reference implementations:
|
| 334 |
+
`envs/coding_env/` (command execution) and `envs/echo_env/` (MCP tools).
|
| 335 |
+
|
| 336 |
+
### Base Classes (from `openenv.core.env_server.types`)
|
| 337 |
+
|
| 338 |
+
```python
from typing import Any, Dict, Optional

from pydantic import BaseModel

# Action base: extra="forbid" (rejects unknown fields)
class Action(BaseModel):
    metadata: Dict[str, Any] = {}

# Observation base: extra="forbid", already has done + reward
class Observation(BaseModel):
    done: bool = False
    reward: bool | int | float | None = None
    metadata: Dict[str, Any] = {}

# State base: extra="allow" (allows additional fields)
class State(BaseModel):
    episode_id: Optional[str] = None
    step_count: int = 0
```
|
| 354 |
+
|
| 355 |
+
### OpenRange Models (`server/models.py`)
|
| 356 |
+
|
| 357 |
+
```python
from typing import Literal

from openenv.core.env_server.types import Action, Observation, State

class RangeAction(Action):
    command: str                   # Shell command or tool invocation
    mode: Literal["red", "blue"]   # Which operator is acting

class RangeObservation(Observation):
    # NOTE: done and reward are INHERITED from Observation base; do NOT redeclare
    stdout: str = ""               # Command output
    stderr: str = ""               # Error output
    flags_captured: list[str] = []
    alerts: list[str] = []         # Blue: IDS/log alerts

class RangeState(State):
    # NOTE: episode_id and step_count are INHERITED from State base
    mode: str = ""                 # Current active mode (red/blue)
    flags_found: list[str] = []
    services_status: dict = {}
    tier: int = 1
```
|
| 378 |
+
|
| 379 |
+
### Environment (`server/environment.py`)
|
| 380 |
+
|
| 381 |
+
```python
from typing import Optional
from uuid import uuid4

from openenv.core.env_server.interfaces import Environment
from server.models import RangeAction, RangeObservation, RangeState

class RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState]):
    SUPPORTS_CONCURRENT_SESSIONS = False  # One episode per range instance

    def __init__(self):
        super().__init__()  # Can pass transform= and rubric= here
        self._state = RangeState()

    def reset(self, seed: Optional[int] = None,
              episode_id: Optional[str] = None, **kwargs) -> RangeObservation:
        # Select a pre-validated snapshot (Builder + Validator run
        # asynchronously, not in this hot path), then clear episode state
        self._state = RangeState(episode_id=episode_id or str(uuid4()))
        return RangeObservation(stdout="Range ready. Begin reconnaissance.")

    def step(self, action: RangeAction,
             timeout_s: Optional[float] = None, **kwargs) -> RangeObservation:
        # Route action.command to container via docker exec
        result, err = "", ""  # Placeholder: stdout/stderr of the executed command
        # Compute reward via rubric
        self._state.step_count += 1
        obs = RangeObservation(stdout=result, stderr=err)
        obs.reward = self._apply_rubric(action, obs)  # Uses Rubric if set
        return obs

    @property
    def state(self) -> RangeState:
        return self._state
```

### App Factory (`server/app.py`)

```python
from openenv.core.env_server import create_app
from server.models import RangeAction, RangeObservation
from server.environment import RangeEnvironment

# MUST pass CLASS (not instance); this enables WebSocket session isolation
app = create_app(RangeEnvironment, RangeAction, RangeObservation,
                 env_name="open_range")
```

### Client (`client/client.py`)

```python
from openenv.core.env_client import EnvClient
from openenv.core.client_types import StepResult

class OpenRangeEnv(EnvClient[RangeAction, RangeObservation, RangeState]):
    def _step_payload(self, action: RangeAction) -> dict:
        return {"command": action.command, "mode": action.mode}

    def _parse_result(self, payload: dict) -> StepResult[RangeObservation]:
        obs = RangeObservation(**payload["observation"])
        return StepResult(
            observation=obs,
            reward=payload.get("reward"),
            done=bool(payload.get("done", False)),
        )

    def _parse_state(self, payload: dict) -> RangeState:
        return RangeState(**payload)
```

### Endpoints (auto-provided by `create_app`)

```
GET  /health    → {"status": "healthy"}
GET  /metadata  → environment name, version, description
GET  /schema    → JSON schemas for action, observation, state
POST /reset     → initial RangeObservation
POST /step      → RangeObservation after executing action
GET  /state     → current RangeState
WS   /ws        → persistent WebSocket session
```

### MCP Alternative

For richer tool discovery, OpenRange can also use `MCPEnvironment` with FastMCP tools:
```python
class RangeEnvironment(MCPEnvironment):
    def __init__(self):
        mcp = FastMCP("open_range")

        @mcp.tool
        def nmap_scan(target: str, flags: str = "-sV") -> str:
            """Run nmap port scan against target"""
            return docker_exec(f"nmap {flags} {target}")

        @mcp.tool
        def submit_flag(flag: str) -> str:
            """Submit a captured flag for verification"""
            return verify_flag(flag)

        super().__init__(mcp)
```
This exposes tools via `ListToolsAction`/`CallToolAction` + MCP JSON-RPC at `/mcp`.

## Reward Components

### Red Operator
| Signal | Weight | Source |
|--------|--------|--------|
| `r_flag` | Binary | `docker exec cat <path>` at golden-path location |
| `r_efficiency` | `gamma^t` | Step count discount |
| `r_stealth` | Negative | IDS log entry count |
| `r_evidence` | 0-1 | Documentation quality of findings |
| `r_halluc` | -0.3 | Per fake flag (compared against manifest) |
| `r_social` | Binary | NPC fell for phish/pretext (Level 1+ only) |
| `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |

### Blue Operator
| Signal | Weight | Source |
|--------|--------|--------|
| `r_detect` | 0-1 | TP rate (Red golden-path steps in logs) |
| `r_patch` | Binary | Validator re-runs Red exploit -> fails = valid patch |
| `r_avail` | 0-1 | Services still responding (healthchecks) |
| `r_FP` | -0.2 | Per false alarm (NPC traffic flagged as attack) |
| `r_phish_detect` | 0-1 | Correctly identified social engineering in logs (Level 1+ only) |
| `r_complexity` | tier_mult | `tier_multiplier * base_reward` (1.0x-3.0x, scales with snapshot complexity) |
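
The tables name the signals but do not say how they are combined. The following is a minimal sketch for the Red side, assuming a plain additive scheme scaled by the tier multiplier. The names `RedEpisode` and `red_reward`, the `gamma` default, and the per-alert `stealth_penalty` are illustrative assumptions, not OpenRange's actual rubric.

```python
from dataclasses import dataclass


@dataclass
class RedEpisode:
    """Hypothetical container for the Red signals listed in the table."""
    flag_captured: bool      # r_flag (binary)
    steps: int               # t in gamma^t
    ids_alerts: int          # drives r_stealth (negative)
    evidence_score: float    # r_evidence in [0, 1]
    fake_flags: int          # each costs 0.3 (r_halluc)
    tier_multiplier: float   # 1.0x to 3.0x


def red_reward(ep: RedEpisode, gamma: float = 0.99,
               stealth_penalty: float = 0.05) -> float:
    """Combine the Red signals additively (an assumed scheme)."""
    r_flag = 1.0 if ep.flag_captured else 0.0
    r_efficiency = gamma ** ep.steps          # discount by step count
    r_stealth = -stealth_penalty * ep.ids_alerts
    r_halluc = -0.3 * ep.fake_flags
    base = r_flag * r_efficiency + r_stealth + ep.evidence_score + r_halluc
    return ep.tier_multiplier * base
```

Scaling the whole sum by the tier multiplier also scales the penalties; whether penalties should be tier-scaled is a design choice the tables leave open.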

## Tier System (Horizontal Growth)

Each tier is a **fully integrated network**: services connect to each other, web apps talk to
databases, auth systems protect resources, logs flow to monitoring. Not isolated containers.

| Tier | Hosts | Networks | Integrated Services | Identity/Auth | Golden Steps |
|------|-------|----------|---------------------|---------------|--------------|
| 1 | attacker, firewall, web, mail, db, files, ldap, siem (8) | external, dmz, internal, mgmt | nginx+PHP web app → MySQL, postfix/dovecot, samba, OpenLDAP, rsyslog SIEM, iptables firewall | DB + LDAP user auth, session cookies | ~8 |
| 2 | + jumpbox, vpn (10-12) | + guest, vpn | + SSH bastion, OpenVPN, cron jobs | + SSH key auth, VPN cert auth, email-based password reset | ~15 |
| 3 | + CI/CD, dev-tools (14-18) | + partner, dev | + Jenkins/GitLab runner, dev endpoints | + AD/LDAP auth, Kerberos tickets, service accounts | ~25 |
| 4 | + OT/SCADA, cloud-proxy (20-25) | + OT, cloud | + Modbus/OPC-UA simulators, cloud gateway | + jump host required for OT, credential rotation, MFA | ~35 |
| 5 | + honeypots, WAF (30+) | + trap net | + decoy services, WAF, IDS, threat intel | + honeypot tokens, rate limiting, cert-based auth | ~50 |

### How Services Integrate (Tier 1: 8 Containers)

```
[attacker] (external zone)
    |
    | port 80, 443, 25 only via firewall
    v
[firewall] (perimeter): iptables, NAT, zone enforcement, logs to siem
    |
    v
[web.corp.local] (DMZ 10.0.1.0/24) nginx + PHP web app
    | - Login form -> authenticates against ldap (LDAP bind)
    | - Product search -> SQL query to db (vuln injection point)
    | - File upload -> stored on disk (vuln injection point)
    | - All access logged to /var/log/nginx/access.log -> siem
    |
    ├──> [mail.corp.local] (DMZ) postfix + dovecot
    |      - User lookup against ldap
    |      - NPC email traffic + social engineering surface
    |      - Logs to siem
    |
    | port 3306 (internal only)
    v
[db.corp.local] (internal 10.0.2.0/24) MySQL
    | - users, products, flags tables
    | - Query logs -> siem
    |
[files.corp.local] (internal) samba
    | - SMB shares, access via ldap auth
    | - Logs to siem
    |
[ldap.corp.local] (mgmt 10.0.3.0/24) OpenLDAP + Kerberos
    | - Central auth for all services
    | - Audit replication to siem
    |
[siem.corp.local] (mgmt) rsyslog + log aggregation
      - Blue's entry point: reads ALL logs here
      - NPC traffic mixed with real attack traffic
      - Blue reads logs, never touches web/db/files directly
```

## Agent Tool Philosophy: Container-as-Constraint

**No artificial allowlists.** Agents can run ANY command available in their container.
The Docker image defines what's possible, not code-level filtering.

### How Commands Execute

```
Agent sends: RangeAction(command="nmap -sV 10.0.1.0/24", mode="red")
    ↓
environment.step() routes by mode:
    Red  → docker exec open-range-attacker-1 sh -c "nmap -sV 10.0.1.0/24"
    Blue → docker exec open-range-siem-1 sh -c "..."
    ↓
Raw stdout/stderr returned as RangeObservation
```

No validation, sanitization, or allowlisting. The command string goes straight to `sh -c`.
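
The routing step can be sketched as a pure function that builds the `docker exec` argument list. This is a minimal sketch under stated assumptions: the container names come from the example above, while `CONTAINER_BY_MODE` and `build_exec_argv` are hypothetical names, not the environment's real internals.

```python
import shlex

# Compose-style names taken from this document; the mapping is an assumption.
CONTAINER_BY_MODE = {
    "red": "open-range-attacker-1",
    "blue": "open-range-siem-1",
}


def build_exec_argv(mode: str, command: str) -> list[str]:
    """Return the argv for `docker exec <container> sh -c <command>`."""
    try:
        container = CONTAINER_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    # The command stays a single argv element, so the host shell never
    # parses it; only `sh -c` inside the container does.
    return ["docker", "exec", container, "sh", "-c", command]


argv = build_exec_argv("red", "nmap -sV 10.0.1.0/24")
print(shlex.join(argv))  # shell-quoted form, for logging only
```

Passing the list to something like `subprocess.run(argv, capture_output=True, text=True)` would yield the raw stdout/stderr that end up in the observation.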

### What's Installed (Tier 1)

**Red (Kali)**: nmap, sqlmap, hydra, nikto, smbclient, curl, wget, netcat, ssh,
dnsutils, tcpdump, python3+pip. Plus all standard Kali/Debian tools. Agents can
`pip install` or `apt install` additional tools at runtime.

**Blue (SIEM)**: rsyslog, grep/awk/sed, jq, curl, ssh. All logs aggregated at
`/var/log/siem/consolidated/all.log`. Agents can write custom scripts, parse JSON,
correlate events: whatever Unix tools allow.

### Meta-Commands (Handled by Environment, Not Containers)

These are intercepted before docker exec:

| Command | Role | Effect |
|---------|------|--------|
| `submit_flag <value>` | Red | Validates against snapshot flags; -0.3 penalty per hallucinated flag |
| `submit_evidence <json>` | Red | Logs findings for evidence reward scoring |
| `submit_finding <desc>` | Blue | Logs attack detection for accuracy scoring |
| `auth <host> <user> <pass>` | Both | Validates creds against snapshot topology |
| `logout <host>` | Both | Terminates active session |
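
Interception of these commands can be sketched as a small parser that runs before anything reaches `docker exec`. This is an illustrative sketch, not the environment's code: `META_COMMANDS` and `parse_meta_command` are hypothetical names. The `maxsplit=3` handling of `auth` follows the commit message's note about passwords containing spaces.

```python
META_COMMANDS = {"submit_flag", "submit_evidence", "submit_finding",
                 "auth", "logout"}


def parse_meta_command(line: str):
    """Return (name, args) for a meta-command, or None to fall through."""
    stripped = line.strip()
    parts = stripped.split(maxsplit=1)
    if not parts or parts[0] not in META_COMMANDS:
        return None  # ordinary command: route to the container
    name = parts[0]
    if name == "auth":
        # maxsplit=3 keeps everything after <user> as one password field,
        # so spaces inside the password survive.
        fields = stripped.split(maxsplit=3)
        if len(fields) != 4:
            raise ValueError("usage: auth <host> <user> <pass>")
        return name, tuple(fields[1:])
    rest = parts[1] if len(parts) > 1 else ""
    return name, (rest,)
```

Returning `None` for anything that is not a meta-command keeps the "no allowlist" rule intact: the parser only claims the five reserved verbs.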

### What Agents Should NOT Be Told

Agent prompts should NOT enumerate allowed tools. Instead:
- Red: "You have a Kali workstation. Run any command."
- Blue: "You have the SIEM console. Use any tool to investigate."

The agent discovers what's available through reconnaissance (e.g., `which sqlmap`,
`ls /usr/bin/`, `pip list`). This mirrors real pentesting and SOC work.

### Docker Network Topology (Tier 1)

```
attacker (10.0.0.2) → firewall (10.0.0.3/10.0.1.2) → web (10.0.1.4)
                      NAT + iptables               → mail (10.0.1.3)
                                                   → db (10.0.2.x)
                                                   → files (10.0.2.x)
                                                   → ldap (10.0.3.x)
                                                   → siem (10.0.3.x)
```

Attacker routes to DMZ/internal/mgmt via firewall. Only ports 80, 443, 25 pass
from external→DMZ. The firewall enforces zone segmentation per manifest rules.

## Builder LLM Schema Alignment (IMPORTANT)

The Builder prompt schema and the Pydantic models MUST match field names.
Mismatches cause `ValidationError` at parse time. Known mappings handled
by `_parse_llm_response()` in `builder/builder.py`:

| Prompt Schema | Pydantic Model | Parser Handles |
|---------------|----------------|----------------|
| `exploit_chain[].vuln` | `ExploitStep.vuln_id` | Yes |
| `exploit_chain[].action` | `ExploitStep.command` | Yes |
| `exploit_chain[].yields` | `ExploitStep.description` | Yes |
| `golden_path[].cmd` | `GoldenPathStep.command` | Yes |
| `golden_path[].expect_stdout` | `GoldenPathStep.expect_in_stdout` | Yes |
| `accounts.smb_shares` (list) | `NPCPersona.accounts` (dict[str, Any]) | Yes |
| `evidence_spec` (dict) | `list[EvidenceItem]` | Yes |

**Rule**: When adding new fields to SnapshotSpec or its children, update BOTH
the builder prompt schema AND the `_parse_llm_response()` mapper. If the LLM
returns a different field name, add a fallback in the parser like
`ec.get("vuln_id", ec.get("vuln", ""))`.
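
The fallback pattern in the rule can be sketched for one exploit-chain entry. This is an illustration rather than the actual `_parse_llm_response()` code: `normalize_exploit_step` is a hypothetical helper, and it returns a plain dict instead of an `ExploitStep` so that it runs without the project's models.

```python
from typing import Any


def normalize_exploit_step(ec: dict[str, Any]) -> dict[str, str]:
    """Map prompt-schema keys onto the ExploitStep field names in the table.

    The model's own field name wins when present; otherwise the key used
    by the prompt schema is tried, then an empty string.
    """
    return {
        "vuln_id": ec.get("vuln_id", ec.get("vuln", "")),
        "command": ec.get("command", ec.get("action", "")),
        "description": ec.get("description", ec.get("yields", "")),
    }


# LLM output using the prompt-schema names
step = normalize_exploit_step(
    {"vuln": "sqli-search", "action": "sqlmap -u ...", "yields": "db creds"}
)
```

Preferring the model's field name means a snapshot that was already normalized passes through the same function unchanged.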

## Azure OpenAI Configuration

For LLM builder/validator, set these env vars:

```bash
export AZURE_API_KEY="..."
export AZURE_API_BASE="https://<endpoint>.cognitiveservices.azure.com"
export AZURE_API_VERSION="2025-04-01-preview"
export OPENRANGE_BUILDER_MODEL="azure/gpt-5.2"  # or any azure/<deployment>
```

LiteLLM reads these automatically. Model format: `azure/<deployment_name>`.
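
A preflight check can catch missing variables before the first builder call. This is a minimal sketch: `missing_azure_vars` is a hypothetical helper, not part of OpenRange or LiteLLM, and it only tests the three `AZURE_*` names listed above.

```python
import os
from collections.abc import Mapping

REQUIRED_AZURE_VARS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_azure_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing Azure settings:", ", ".join(missing))
```

Accepting the environment as a parameter keeps the check easy to unit-test without touching the real process environment.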

## Build & Development Commands

```bash
# Install dependencies
uv sync --all-extras

# Run tests (549 tests)
uv run pytest tests/ -v --tb=short

# Run OpenEnv server locally (mock mode, no Docker needed)
uv run uvicorn open_range.server.app:app --host 0.0.0.0 --port 8000

# Run demo episode (no Docker, no LLM)
uv run python examples/demo.py

# Build and start full Docker range stack (9 containers)
docker compose build && docker compose up -d

# Test LLM builder with Azure creds
uv run python scripts/test_tier1_llm.py

# Deploy to HF Spaces
bash scripts/deploy_hf.sh
```

### Docker Gotchas (Apple Silicon / ARM64)

- MySQL 5.7 has NO ARM64 images. Use `mysql:8.0` in docker-compose.yml.
- PHP-FPM socket: Ubuntu 22.04 installs as `php8.1-fpm`, socket at
  `/run/php/php8.1-fpm.sock` (not generic `/run/php/php-fpm.sock`).
- Attacker container needs `cap_add: [NET_ADMIN]` + `iproute2` to add
  routes to DMZ/internal/mgmt subnets via the firewall gateway.
- Container names follow Docker Compose convention: `open-range-<service>-1`.
  The environment resolves these via `_container_name()` discovery.

## Key References

- **OpenEnv**: `../References/OpenEnv/` (full reference repo)
- **OpenEnv coding_env**: Pattern to follow for server/client structure
- **OpenEnv RFC 001**: Agent vs Environment boundary (MCP + HTTP duality)
- **OpenEnv RFC 004**: Rubric system for composable rewards
- **R2E-Gym**: `../References/R2E-Gym/` (full codebase) + `../2504.07164v1.pdf` (paper). Procedural env generation via backtranslation, hybrid verifiers (execution + LLM), 8.1K executable tasks. Key lesson: hybrid verification breaks through single-method plateaus.
- **Self-Play SWE-RL**: `../2512.18552v1.pdf`. Bug-injector + bug-solver self-play with shared weights. Key lessons: formal specs > NL, 7-check consistency validation, inverse mutation testing, frontier-calibrating Builder reward `r_inject = 1-(1+α)s`, higher-order challenges from failed attempts.
- **CyBench** (ICLR'25): CTF benchmark (saturating, static)
- **CVE-Bench** (ICML'25): Reward hacking lesson (agents gamed shortcuts)
- **CybORG CAGE 4**: Red/Blue/Green agent model

## Hackathon Scope & Priority

### CORE (must ship: the OpenEnv environment)
1. **Manifest schema** + example YAML manifests with golden paths
2. **Builder LLM**: generates/mutates range infrastructure from manifest (structured JSON → templates → Docker)
3. **Validator**: hybrid LLM review + 7-check scripted execution (including inverse mutation test)
4. **OpenEnv server**: `RangeEnvironment(Environment)` with `reset()`, `step()`, `state`, deployed on HF Spaces
5. **Rewards**: `Rubric` subclasses for Red and Blue, all verifiable against container state
6. **Client**: `OpenRangeEnv(EnvClient)` with typed parsing
7. **NPC traffic**: background noise for Blue

### DEFERRED (training is downstream of the environment)
Training scripts (GRPO, SFT, curriculum) are **out of scope for hackathon core**. The environment
must work first; anyone can plug in TRL/Unsloth/SkyRL later via `rollout_func`. We demonstrate
the environment with scripted or manual agents, not trained ones.

### Constraints
- **OpenEnv 0.2.x** on HF Spaces (FastAPI server with typed Pydantic models)
- **Infra**: HF Spaces (OpenEnv server) + Docker host (range containers)
- **Demo**: 1-min YouTube showing YAML → Builder generates range → Validator confirms → Red agent exploits → Blue agent defends → Builder mutates → new challenge
- **License**: Apache 2.0
|
README.md
CHANGED
@@ -1,7 +1,15 @@
 ---
-title: OpenRange
+title: OpenRange Environment Server
+emoji: 🎯
+colorFrom: red
+colorTo: blue
 sdk: docker
+pinned: false
 app_port: 8000
+base_path: /web
+tags:
+  - openenv
+  - rl-environment
 ---
 
 # OpenRange
openenv.yaml
CHANGED
@@ -4,5 +4,3 @@ type: space
 runtime: fastapi
 app: server.app:app
 port: 8000
-version: 0.1.0
-description: "Multi-agent cybersecurity gymnasium built on OpenEnv"
pyproject.toml
CHANGED
|
@@ -1,17 +1,22 @@
|
| 1 |
[project]
|
| 2 |
-
name = "open-range"
|
| 3 |
version = "0.1.0"
|
| 4 |
description = "Multi-agent cybersecurity gymnasium built on OpenEnv"
|
| 5 |
requires-python = ">=3.11"
|
| 6 |
license = "Apache-2.0"
|
| 7 |
dependencies = [
|
| 8 |
"openenv-core[core]>=0.2.1",
|
| 9 |
-
"
|
| 10 |
-
"
|
|
|
|
| 11 |
"pyyaml>=6.0",
|
| 12 |
"docker>=7.0",
|
| 13 |
"jinja2>=3.1",
|
| 14 |
-
"uvicorn>=0.
|
| 15 |
]
|
| 16 |
|
| 17 |
[project.optional-dependencies]
|
|
@@ -19,15 +24,24 @@ dev = ["pytest>=8.0", "pytest-asyncio>=0.23", "httpx>=0.27"]
|
|
| 19 |
training = ["trl>=0.8", "unsloth"]
|
| 20 |
builder = ["litellm>=1.30"]
|
| 21 |
|
| 22 |
-
[build-system]
|
| 23 |
-
requires = ["hatchling"]
|
| 24 |
-
build-backend = "hatchling.build"
|
| 25 |
-
|
| 26 |
-
[tool.hatch.build.targets.wheel]
|
| 27 |
-
packages = ["src/open_range"]
|
| 28 |
-
|
| 29 |
[project.scripts]
|
|
|
|
| 30 |
server = "open_range.server.app:main"
|
| 31 |
|
| 32 |
[tool.pytest.ini_options]
|
| 33 |
asyncio_mode = "auto"
|
|
|
|
| 1 |
+
[build-system]
|
| 2 |
+
requires = ["setuptools>=45", "wheel"]
|
| 3 |
+
build-backend = "setuptools.build_meta"
|
| 4 |
+
|
| 5 |
[project]
|
| 6 |
+
name = "openenv-open-range"
|
| 7 |
version = "0.1.0"
|
| 8 |
description = "Multi-agent cybersecurity gymnasium built on OpenEnv"
|
| 9 |
requires-python = ">=3.11"
|
| 10 |
license = "Apache-2.0"
|
| 11 |
dependencies = [
|
| 12 |
"openenv-core[core]>=0.2.1",
|
| 13 |
+
"click>=8.1",
|
| 14 |
+
"fastapi>=0.115.0",
|
| 15 |
+
"pydantic>=2.0.0",
|
| 16 |
"pyyaml>=6.0",
|
| 17 |
"docker>=7.0",
|
| 18 |
"jinja2>=3.1",
|
| 19 |
+
"uvicorn>=0.24.0",
|
| 20 |
]
|
| 21 |
|
| 22 |
[project.optional-dependencies]
|
|
|
|
| 24 |
training = ["trl>=0.8", "unsloth"]
|
| 25 |
builder = ["litellm>=1.30"]
|
| 26 |
|
| 27 |
[project.scripts]
|
| 28 |
+
openrange = "open_range.cli:cli"
|
| 29 |
server = "open_range.server.app:main"
|
| 30 |
|
| 31 |
+
[tool.setuptools]
|
| 32 |
+
include-package-data = true
|
| 33 |
+
packages = [
|
| 34 |
+
"open_range",
|
| 35 |
+
"open_range.agents",
|
| 36 |
+
"open_range.builder",
|
| 37 |
+
"open_range.builder.npc",
|
| 38 |
+
"open_range.client",
|
| 39 |
+
"open_range.server",
|
| 40 |
+
"open_range.training",
|
| 41 |
+
"open_range.validator",
|
| 42 |
+
]
|
| 43 |
+
package-dir = { "" = "src" }
|
| 44 |
+
package-data = { "open_range" = ["**/*.yaml", "**/*.yml"] }
|
| 45 |
+
|
| 46 |
[tool.pytest.ini_options]
|
| 47 |
asyncio_mode = "auto"
|
server/Dockerfile
CHANGED
|
@@ -1,23 +1,42 @@
|
|
| 1 |
-
|
|
|
|
| 2 |
|
| 3 |
WORKDIR /app
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
|
|
|
| 9 |
&& rm -rf /var/lib/apt/lists/*
|
| 10 |
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
|
| 16 |
-
|
|
|
|
| 17 |
|
| 18 |
-
|
|
|
|
| 19 |
|
| 20 |
-
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
|
| 21 |
-
CMD
|
| 22 |
|
| 23 |
-
CMD ["
|
|
|
|
| 1 |
+
ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
|
| 2 |
+
FROM ${BASE_IMAGE} AS builder
|
| 3 |
|
| 4 |
WORKDIR /app
|
| 5 |
|
| 6 |
+
COPY . /app/env
|
| 7 |
+
WORKDIR /app/env
|
| 8 |
+
|
| 9 |
+
# Install git for git+ dependencies
|
| 10 |
+
RUN apt-get update && apt-get install -y --no-install-recommends git \
|
| 11 |
&& rm -rf /var/lib/apt/lists/*
|
| 12 |
|
| 13 |
+
# Two-pass install for better layer caching
|
| 14 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 15 |
+
if [ -f uv.lock ]; then \
|
| 16 |
+
uv sync --frozen --no-install-project --no-editable; \
|
| 17 |
+
else \
|
| 18 |
+
uv sync --no-install-project --no-editable; \
|
| 19 |
+
fi
|
| 20 |
+
|
| 21 |
+
RUN --mount=type=cache,target=/root/.cache/uv \
|
| 22 |
+
if [ -f uv.lock ]; then \
|
| 23 |
+
uv sync --frozen --no-editable; \
|
| 24 |
+
else \
|
| 25 |
+
uv sync --no-editable; \
|
| 26 |
+
fi
|
| 27 |
+
|
| 28 |
+
# Runtime stage
|
| 29 |
+
FROM ${BASE_IMAGE}
|
| 30 |
+
|
| 31 |
+
WORKDIR /app
|
| 32 |
|
| 33 |
+
COPY --from=builder /app/env/.venv /app/.venv
|
| 34 |
+
COPY --from=builder /app/env /app/env
|
| 35 |
|
| 36 |
+
ENV PATH="/app/.venv/bin:$PATH"
|
| 37 |
+
ENV PYTHONPATH="/app/env/src:/app/env:$PYTHONPATH"
|
| 38 |
|
| 39 |
+
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 40 |
+
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
|
| 41 |
|
| 42 |
+
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
|
server/__init__.py
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
"""Repository-level OpenEnv server entrypoints."""
|
| 2 |
|
| 3 |
-
from .app import app,
|
| 4 |
from .environment import RangeEnvironment
|
| 5 |
|
| 6 |
-
__all__ = ["RangeEnvironment", "app", "
|
|
|
|
| 1 |
"""Repository-level OpenEnv server entrypoints."""
|
| 2 |
|
| 3 |
+
from .app import app, main
|
| 4 |
from .environment import RangeEnvironment
|
| 5 |
|
| 6 |
+
__all__ = ["RangeEnvironment", "app", "main"]
|
server/app.py
CHANGED
|
@@ -1,10 +1,16 @@
|
|
| 1 |
-
"""OpenEnv app entrypoint expected by ``openenv.yaml``.
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
-
from open_range.server.app import
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
|
| 10 |
def main() -> None:
|
|
|
|
| 1 |
+
"""OpenEnv app entrypoint expected by ``openenv.yaml``.
|
| 2 |
+
|
| 3 |
+
Thin wrapper that delegates to the real app factory in
|
| 4 |
+
``open_range.server.app``. This file lives at the repo root
|
| 5 |
+
so the Dockerfile CMD ``cd /app/env && uvicorn server.app:app``
|
| 6 |
+
resolves correctly inside HF Spaces.
|
| 7 |
+
"""
|
| 8 |
|
| 9 |
from __future__ import annotations
|
| 10 |
|
| 11 |
+
from open_range.server.app import create_app as _create_app
|
| 12 |
|
| 13 |
+
app = _create_app()
|
| 14 |
|
| 15 |
|
| 16 |
def main() -> None:
|
src/open_range/builder/builder.py
CHANGED
|
@@ -3,6 +3,9 @@
|
|
| 3 |
- LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
|
| 4 |
- TemplateOnlyBuilder: testing -- deterministic, no LLM calls
|
| 5 |
- FileBuilder: demos -- loads a pre-built snapshot from a JSON file
|
|
|
|
|
|
|
|
|
|
| 6 |
"""
|
| 7 |
|
| 8 |
from __future__ import annotations
|
|
@@ -12,7 +15,9 @@ import logging
|
|
| 12 |
import os
|
| 13 |
import random
|
| 14 |
from pathlib import Path
|
| 15 |
-
from typing import Any
|
|
|
|
|
|
|
| 16 |
|
| 17 |
try:
|
| 18 |
import litellm
|
|
@@ -38,6 +43,106 @@ from open_range.builder.prompts import BUILDER_SYSTEM_PROMPT
|
|
| 38 |
logger = logging.getLogger(__name__)
|
| 39 |
|
| 40 |
|
| 41 |
# ---------------------------------------------------------------------------
|
| 42 |
# LLM-based builder (production)
|
| 43 |
# ---------------------------------------------------------------------------
|
|
@@ -57,7 +162,18 @@ class LLMSnapshotBuilder:
|
|
| 57 |
temperature: float = 0.7,
|
| 58 |
max_retries: int = 3,
|
| 59 |
max_tokens: int = 32768,
|
|
|
|
| 60 |
) -> None:
|
| 61 |
self.model = model or os.environ.get(
|
| 62 |
"OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
|
| 63 |
)
|
|
@@ -65,13 +181,18 @@ class LLMSnapshotBuilder:
|
|
| 65 |
self.temperature = temperature
|
| 66 |
self.max_retries = max_retries
|
| 67 |
self.max_tokens = max_tokens
|
|
|
|
| 68 |
|
| 69 |
async def build(
|
| 70 |
self,
|
| 71 |
manifest: dict,
|
| 72 |
context: BuildContext,
|
| 73 |
) -> SnapshotSpec:
|
| 74 |
-
"""Call LLM to generate a candidate snapshot spec.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
if litellm is None:
|
| 76 |
raise RuntimeError(
|
| 77 |
"LLMSnapshotBuilder requires the optional builder extra. "
|
|
@@ -89,23 +210,29 @@ class LLMSnapshotBuilder:
|
|
| 89 |
)
|
| 90 |
)
|
| 91 |
|
| 92 |
last_error: Exception | None = None
|
|
|
|
| 93 |
for attempt in range(1, self.max_retries + 1):
|
| 94 |
try:
|
| 95 |
messages: list[dict[str, str]] = [
|
| 96 |
{"role": "system", "content": self.prompt_template},
|
| 97 |
{"role": "user", "content": user_payload},
|
| 98 |
]
|
| 99 |
-
# If retrying after a
|
| 100 |
-
|
| 101 |
-
if error and attempt > 1:
|
| 102 |
messages.append(
|
| 103 |
{
|
| 104 |
"role": "user",
|
| 105 |
"content": (
|
| 106 |
-
"Previous attempt failed
|
| 107 |
-
f"Error: {
|
| 108 |
-
"Please fix and regenerate."
|
| 109 |
),
|
| 110 |
}
|
| 111 |
)
|
|
@@ -114,6 +241,7 @@ class LLMSnapshotBuilder:
|
|
| 114 |
"model": self.model,
|
| 115 |
"messages": messages,
|
| 116 |
"max_tokens": self.max_tokens,
|
|
|
|
| 117 |
}
|
| 118 |
# Codex models don't support temperature
|
| 119 |
if self.temperature is not None:
|
|
@@ -121,24 +249,56 @@ class LLMSnapshotBuilder:
|
|
| 121 |
# Request JSON output; some models need the word "json"
|
| 122 |
# in messages to use json_object format
|
| 123 |
kwargs["response_format"] = {"type": "json_object"}
|
| 124 |
response = await litellm.acompletion(**kwargs)
|
| 125 |
|
| 126 |
raw = response.choices[0].message.content
|
|
|
|
|
|
|
|
|
|
|
|
|
| 127 |
spec = _parse_llm_response(raw)
|
| 128 |
logger.info(
|
| 129 |
-
"LLMSnapshotBuilder:
|
| 130 |
-
spec.topology.get("hosts", [])[:3],
|
| 131 |
attempt,
|
|
|
|
|
|
|
|
|
|
| 132 |
)
|
| 133 |
return spec
|
| 134 |
|
| 135 |
-
except
|
| 136 |
last_error = exc
|
|
|
|
| 137 |
logger.warning(
|
| 138 |
"LLMSnapshotBuilder attempt %d/%d failed: %s",
|
| 139 |
attempt,
|
| 140 |
self.max_retries,
|
| 141 |
-
|
| 142 |
)
|
| 143 |
|
| 144 |
raise RuntimeError(
|
|
@@ -147,76 +307,182 @@ class LLMSnapshotBuilder:
|
|
| 147 |
)
|
| 148 |
|
| 149 |
|
| 150 |
def _parse_llm_response(raw_json: str) -> SnapshotSpec:
|
| 151 |
"""Parse raw JSON from LLM into a validated SnapshotSpec.
|
| 152 |
|
| 153 |
-
|
| 154 |
-
|
|
|
|
| 155 |
"""
|
| 156 |
-
|
| 157 |
|
| 158 |
# Map truth_graph vulns
|
| 159 |
vulns = []
|
| 160 |
-
for v in
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
|
|
|
|
|
|
| 172 |
)
|
| 173 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 174 |
|
| 175 |
# Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
|
| 176 |
exploit_chain = []
|
| 177 |
-
for ec in
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
)
|
| 184 |
-
)
|
| 185 |
|
| 186 |
truth_graph = TruthGraph(
|
| 187 |
vulns=vulns,
|
| 188 |
exploit_chain=exploit_chain,
|
| 189 |
)
|
| 190 |
|
| 191 |
-
# Map golden_path -- LLM uses "expect_stdout", protocol uses "expect_in_stdout"
|
| 192 |
golden_path = []
|
| 193 |
-
for step in
|
| 194 |
golden_path.append(
|
| 195 |
GoldenPathStep(
|
| 196 |
-
step=step.
|
| 197 |
-
command=
|
| 198 |
-
expect_in_stdout=
|
| 199 |
-
|
| 200 |
-
),
|
| 201 |
-
description=step.get("description", ""),
|
| 202 |
)
|
| 203 |
)
|
| 204 |
|
| 205 |
# Map flags
|
| 206 |
-
flags = [
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
evidence_spec: list[EvidenceItem] = []
|
|
|
|
| 219 |
if isinstance(evidence_raw, dict):
|
|
|
|
| 220 |
for key, val in evidence_raw.items():
|
| 221 |
if isinstance(val, list):
|
| 222 |
for item in val:
|
|
@@ -234,23 +500,31 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
|
|
| 234 |
|
| 235 |
# Map NPC personas
|
| 236 |
npc_personas = []
|
| 237 |
-
for p in
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
)
|
| 250 |
-
)
|
| 251 |
|
| 252 |
# Map NPC traffic
|
| 253 |
-
npc_raw =
|
| 254 |
npc_traffic = NPCTrafficSpec(
|
| 255 |
level=0,
|
| 256 |
rate_lambda=npc_raw.get("http_rate", 10),
|
|
@@ -258,19 +532,17 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
|
|
| 258 |
)
|
| 259 |
|
| 260 |
# Map task
|
| 261 |
-
task_raw = data.get("task", {})
|
| 262 |
task = TaskSpec(
|
| 263 |
-
red_briefing=
|
| 264 |
-
blue_briefing=
|
| 265 |
)
|
| 266 |
|
| 267 |
# Map files -- explicit files from LLM + extract from vulnerable_code
|
| 268 |
files: dict[str, str] = {}
|
| 269 |
|
| 270 |
# 1. Explicit files field from LLM output
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
for key, content in files_raw.items():
|
| 274 |
if isinstance(content, str):
|
| 275 |
files[key] = content
|
| 276 |
|
|
@@ -289,8 +561,16 @@ def _parse_llm_response(raw_json: str) -> SnapshotSpec:
|
|
| 289 |
if container_key not in files:
|
| 290 |
files[container_key] = vc
|
| 291 |
|
| 292 |
return SnapshotSpec(
|
| 293 |
-
topology=
|
| 294 |
truth_graph=truth_graph,
|
| 295 |
golden_path=golden_path,
|
| 296 |
flags=flags,
|
|
@@ -629,6 +909,7 @@ class TemplateOnlyBuilder:
|
|
| 629 |
"""
|
| 630 |
|
| 631 |
def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
|
|
|
|
| 632 |
self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
|
| 633 |
|
| 634 |
async def build(
|
|
@@ -765,6 +1046,12 @@ class TemplateOnlyBuilder:
|
|
| 765 |
scripts=["http_traffic.sh", "db_traffic.sh"],
|
| 766 |
)
|
| 767 |
|
| 768 |
return SnapshotSpec(
|
| 769 |
topology=topology,
|
| 770 |
truth_graph=truth_graph,
|
|
@@ -790,6 +1077,7 @@ class FileBuilder:
|
|
| 790 |
"""
|
| 791 |
|
| 792 |
def __init__(self, snapshot_dir: str = "snapshots") -> None:
|
|
|
|
| 793 |
self.snapshot_dir = Path(snapshot_dir)
|
| 794 |
|
| 795 |
async def build(
|
|
@@ -797,7 +1085,7 @@ class FileBuilder:
|
|
| 797 |
manifest: dict,
|
| 798 |
context: BuildContext,
|
| 799 |
) -> SnapshotSpec:
|
| 800 |
-
"""Load
|
| 801 |
if not self.snapshot_dir.exists():
|
| 802 |
raise FileNotFoundError(
|
| 803 |
f"Snapshot directory not found: {self.snapshot_dir}"
|
|
@@ -817,5 +1105,6 @@ class FileBuilder:
|
|
| 817 |
else:
|
| 818 |
chosen = files[0]
|
| 819 |
|
|
|
|
| 820 |
raw = json.loads(chosen.read_text())
|
| 821 |
return _parse_llm_response(json.dumps(raw))
|
|
|
|
| 3 |
- LLMSnapshotBuilder: production -- uses litellm to generate snapshot specs
|
| 4 |
- TemplateOnlyBuilder: testing -- deterministic, no LLM calls
|
| 5 |
- FileBuilder: demos -- loads a pre-built snapshot from a JSON file
|
| 6 |
+
|
| 7 |
+
Each builder implements the SnapshotBuilder protocol and returns a validated
|
| 8 |
+
SnapshotSpec that can be rendered into Docker artifacts by the SnapshotRenderer.
|
| 9 |
"""
|
| 10 |
|
| 11 |
from __future__ import annotations
|
|
|
|
| 15 |
import os
|
| 16 |
import random
|
| 17 |
from pathlib import Path
|
| 18 |
+
from typing import Any, Optional
|
| 19 |
+
|
| 20 |
+
from pydantic import BaseModel, Field
|
| 21 |
|
| 22 |
try:
|
| 23 |
import litellm
|
|
|
|
| 43 |
logger = logging.getLogger(__name__)
|
| 44 |
|
| 45 |
|
| 46 |
+
# ---------------------------------------------------------------------------
|
| 47 |
+
# LLM raw output model -- matches the LLM's JSON schema exactly
|
| 48 |
+
# ---------------------------------------------------------------------------
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
class _LLMVulnerability(BaseModel):
|
| 52 |
+
"""Raw vulnerability as returned by the LLM."""
|
| 53 |
+
|
| 54 |
+
id: str = ""
|
| 55 |
+
type: str = ""
|
| 56 |
+
host: str = ""
|
| 57 |
+
service: str = ""
|
| 58 |
+
injection_point: str = ""
|
| 59 |
+
vulnerable_code: str | dict[str, str] = ""
|
| 60 |
+
root_cause: str = ""
|
| 61 |
+
blast_radius: str = ""
|
| 62 |
+
remediation: str = ""
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
class _LLMExploitStep(BaseModel):
|
| 66 |
+
"""Raw exploit step -- LLM uses 'vuln'/'action'/'yields' field names."""
|
| 67 |
+
|
| 68 |
+
vuln: str = ""
|
| 69 |
+
vuln_id: str = ""
|
| 70 |
+
action: str = ""
|
| 71 |
+
command: str = ""
|
| 72 |
+
yields: str = ""
|
| 73 |
+
description: str = ""
|
| 74 |
+
|
| 75 |
+
|
| 76 |
+
class _LLMGoldenPathStep(BaseModel):
|
| 77 |
+
"""Raw golden path step -- LLM uses 'cmd' and 'expect_stdout'."""
|
| 78 |
+
|
| 79 |
+
step: int = 0
|
| 80 |
+
cmd: str = ""
|
| 81 |
+
command: str = ""
|
| 82 |
+
expect_stdout: str = ""
|
| 83 |
+
expect_in_stdout: str = ""
|
| 84 |
+
description: str = ""
|
| 85 |
+
host: str = "attacker"
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
class _LLMFlag(BaseModel):
|
| 89 |
+
"""Raw flag definition from LLM output."""
|
| 90 |
+
|
| 91 |
+
id: str = ""
|
| 92 |
+
value: str = ""
|
| 93 |
+
path: str = ""
|
| 94 |
+
host: str = ""
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
class _LLMNPCPersona(BaseModel):
|
| 98 |
+
"""Raw NPC persona from LLM output."""
|
| 99 |
+
|
| 100 |
+
name: str = ""
|
| 101 |
+
role: str = ""
|
| 102 |
+
department: str = ""
|
| 103 |
+
reports_to: str = ""
|
| 104 |
+
communication_style: str = ""
|
| 105 |
+
security_awareness: float = 0.5
|
| 106 |
+
susceptibility: dict[str, Any] = Field(default_factory=dict)
|
| 107 |
+
routine: dict[str, Any] = Field(default_factory=dict)
|
| 108 |
+
accounts: dict[str, Any] = Field(default_factory=dict)
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
class _LLMTruthGraph(BaseModel):
|
| 112 |
+
"""Raw truth graph from LLM output."""
|
| 113 |
+
|
| 114 |
+
vulns: list[_LLMVulnerability] = Field(default_factory=list)
|
| 115 |
+
exploit_chain: list[_LLMExploitStep] = Field(default_factory=list)
|
| 116 |
+
|
| 117 |
+
|
| 118 |
+
class _LLMTask(BaseModel):
|
| 119 |
+
"""Raw task specification from LLM output."""
|
| 120 |
+
|
| 121 |
+
red_briefing: str = ""
|
| 122 |
+
blue_briefing: str = ""
|
| 123 |
+
|
| 124 |
+
|
| 125 |
+
class LLMSnapshotOutput(BaseModel):
|
| 126 |
+
"""Intermediate model matching the LLM's raw JSON schema.
|
| 127 |
+
|
| 128 |
+
This captures the exact field names the LLM produces, including
|
| 129 |
+
known mismatches like 'vuln' vs 'vuln_id', 'cmd' vs 'command',
|
| 130 |
+
and 'expect_stdout' vs 'expect_in_stdout'. Parsing into this model
|
| 131 |
+
first makes schema mismatches explicit and testable before mapping
|
| 132 |
+
to the canonical SnapshotSpec.
|
| 133 |
+
"""
|
| 134 |
+
|
| 135 |
+
topology: dict[str, Any] = Field(default_factory=dict)
|
| 136 |
+
truth_graph: _LLMTruthGraph = Field(default_factory=_LLMTruthGraph)
|
| 137 |
+
golden_path: list[_LLMGoldenPathStep] = Field(default_factory=list)
|
| 138 |
+
flags: list[_LLMFlag] = Field(default_factory=list)
|
| 139 |
+
evidence_spec: dict[str, Any] | list[dict[str, Any]] = Field(default_factory=dict)
|
| 140 |
+
npc_personas: list[_LLMNPCPersona] = Field(default_factory=list)
|
| 141 |
+
npc_traffic: dict[str, Any] = Field(default_factory=dict)
|
| 142 |
+
task: _LLMTask = Field(default_factory=_LLMTask)
|
| 143 |
+
files: dict[str, str] = Field(default_factory=dict)
|
| 144 |
+
|
| 145 |
+
|
| 146 |
# ---------------------------------------------------------------------------
|
| 147 |
# LLM-based builder (production)
|
| 148 |
# ---------------------------------------------------------------------------
|
|
|
|
| 162 |
temperature: float = 0.7,
|
| 163 |
max_retries: int = 3,
|
| 164 |
max_tokens: int = 32768,
|
| 165 |
+
timeout: float = 120.0,
|
| 166 |
) -> None:
|
| 167 |
+
"""Initialize the LLM-based snapshot builder.
|
| 168 |
+
|
| 169 |
+
Args:
|
| 170 |
+
model: LiteLLM model identifier (e.g. 'azure/gpt-5.2').
|
| 171 |
+
prompt_template: System prompt override.
|
| 172 |
+
temperature: Sampling temperature for LLM calls.
|
| 173 |
+
max_retries: Maximum number of LLM call + parse attempts.
|
| 174 |
+
max_tokens: Maximum tokens in LLM response.
|
| 175 |
+
timeout: Timeout in seconds for each LLM call.
|
| 176 |
+
"""
|
| 177 |
self.model = model or os.environ.get(
|
| 178 |
"OPENRANGE_BUILDER_MODEL", "anthropic/claude-sonnet-4-20250514"
|
| 179 |
)
|
|
|
|
| 181 |
self.temperature = temperature
|
| 182 |
self.max_retries = max_retries
|
| 183 |
self.max_tokens = max_tokens
|
| 184 |
+
self.timeout = timeout
|
| 185 |
|
| 186 |
async def build(
|
| 187 |
self,
|
| 188 |
manifest: dict,
|
| 189 |
context: BuildContext,
|
| 190 |
) -> SnapshotSpec:
|
| 191 |
+
"""Call LLM to generate a candidate snapshot spec.
|
| 192 |
+
|
| 193 |
+
Retries on LLM or parse failures, appending error context to each
|
| 194 |
+
subsequent attempt so the LLM can self-correct.
|
| 195 |
+
"""
|
| 196 |
if litellm is None:
|
| 197 |
raise RuntimeError(
|
| 198 |
"LLMSnapshotBuilder requires the optional builder extra. "
|
|
|
|
| 210 |
)
|
| 211 |
)
|
| 212 |
|
| 213 |
+
logger.info(
|
| 214 |
+
"LLMSnapshotBuilder: starting build (model=%s, tier=%d)",
|
| 215 |
+
self.model,
|
| 216 |
+
context.tier,
|
| 217 |
+
)
|
| 218 |
+
|
| 219 |
last_error: Exception | None = None
|
| 220 |
+
last_error_msg: str = ""
|
| 221 |
for attempt in range(1, self.max_retries + 1):
|
| 222 |
try:
|
| 223 |
messages: list[dict[str, str]] = [
|
| 224 |
{"role": "system", "content": self.prompt_template},
|
| 225 |
{"role": "user", "content": user_payload},
|
| 226 |
]
|
| 227 |
+
# If retrying after a failure, append error context so LLM can fix
|
| 228 |
+
if attempt > 1 and last_error_msg:
|
|
|
|
| 229 |
messages.append(
|
| 230 |
{
|
| 231 |
"role": "user",
|
| 232 |
"content": (
|
| 233 |
+
"Previous attempt failed. "
|
| 234 |
+
f"Error: {last_error_msg}\n"
|
| 235 |
+
"Please fix and regenerate the complete JSON."
|
| 236 |
),
|
| 237 |
}
|
| 238 |
)
|
|
|
|
| 241 |
"model": self.model,
|
| 242 |
"messages": messages,
|
| 243 |
"max_tokens": self.max_tokens,
|
| 244 |
+
"timeout": self.timeout,
|
| 245 |
}
|
| 246 |
# Codex models don't support temperature, so only send it when one is set
|
| 247 |
if self.temperature is not None:
|
|
|
|
| 249 |
# Request JSON output; some models need the word "json"
|
| 250 |
# in messages to use json_object format
|
| 251 |
kwargs["response_format"] = {"type": "json_object"}
|
| 252 |
+
|
| 253 |
+
logger.debug(
|
| 254 |
+
"LLMSnapshotBuilder: sending request (attempt %d/%d, timeout=%.0fs)",
|
| 255 |
+
attempt,
|
| 256 |
+
self.max_retries,
|
| 257 |
+
self.timeout,
|
| 258 |
+
)
|
| 259 |
response = await litellm.acompletion(**kwargs)
|
| 260 |
|
| 261 |
raw = response.choices[0].message.content
|
| 262 |
+
logger.debug(
|
| 263 |
+
"LLMSnapshotBuilder: received response (%d chars)",
|
| 264 |
+
len(raw) if raw else 0,
|
| 265 |
+
)
|
| 266 |
spec = _parse_llm_response(raw)
|
| 267 |
logger.info(
|
| 268 |
+
"LLMSnapshotBuilder: build completed (attempt %d/%d, %d vulns, %d golden path steps)",
|
|
|
|
| 269 |
attempt,
|
| 270 |
+
self.max_retries,
|
| 271 |
+
len(spec.truth_graph.vulns),
|
| 272 |
+
len(spec.golden_path),
|
| 273 |
)
|
| 274 |
return spec
|
| 275 |
|
| 276 |
+
except json.JSONDecodeError as exc:
|
| 277 |
+
last_error = exc
|
| 278 |
+
last_error_msg = f"JSON parse error at position {exc.pos}: {exc.msg}"
|
| 279 |
+
logger.warning(
|
| 280 |
+
"LLMSnapshotBuilder attempt %d/%d: JSON parse failed: %s",
|
| 281 |
+
attempt,
|
| 282 |
+
self.max_retries,
|
| 283 |
+
last_error_msg,
|
| 284 |
+
)
|
| 285 |
+
except SnapshotParseError as exc:
|
| 286 |
last_error = exc
|
| 287 |
+
last_error_msg = str(exc)
|
| 288 |
logger.warning(
|
| 289 |
+
"LLMSnapshotBuilder attempt %d/%d: snapshot parse failed: %s",
|
| 290 |
+
attempt,
|
| 291 |
+
self.max_retries,
|
| 292 |
+
last_error_msg,
|
| 293 |
+
)
|
| 294 |
+
except Exception as exc:
|
| 295 |
+
last_error = exc
|
| 296 |
+
last_error_msg = f"{type(exc).__name__}: {exc}"
|
| 297 |
+
logger.error(
|
| 298 |
"LLMSnapshotBuilder attempt %d/%d failed: %s",
|
| 299 |
attempt,
|
| 300 |
self.max_retries,
|
| 301 |
+
last_error_msg,
|
| 302 |
)
|
| 303 |
|
| 304 |
raise RuntimeError(
|
|
|
|
| 307 |
)
|
| 308 |
|
| 309 |
|
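The retry loop in `LLMSnapshotBuilder.build` can be sketched in isolation as follows. This is a minimal sketch, not the builder itself: `call_llm` and `_fake_llm` are hypothetical stand-ins for the `litellm.acompletion` call, and plain `json.loads` replaces `_parse_llm_response`. It only illustrates how the previous error message is appended as an extra user message on each retry.

```python
import asyncio
import json
from typing import Awaitable, Callable

Message = dict[str, str]


async def build_with_feedback(
    call_llm: Callable[[list[Message]], Awaitable[str]],  # hypothetical LLM call
    system_prompt: str,
    user_payload: str,
    max_retries: int = 3,
) -> dict:
    """Call the model up to max_retries times, feeding the last error back."""
    last_error_msg = ""
    for attempt in range(1, max_retries + 1):
        messages: list[Message] = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_payload},
        ]
        # On a retry, tell the model what went wrong last time.
        if attempt > 1 and last_error_msg:
            messages.append(
                {
                    "role": "user",
                    "content": (
                        "Previous attempt failed. "
                        f"Error: {last_error_msg}\n"
                        "Please fix and regenerate the complete JSON."
                    ),
                }
            )
        raw = await call_llm(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error_msg = f"JSON parse error at position {exc.pos}: {exc.msg}"
    raise RuntimeError(f"gave up after {max_retries} attempts: {last_error_msg}")


async def _fake_llm(messages: list[Message]) -> str:
    # Hypothetical model: truncated JSON first, valid JSON once it sees feedback.
    return '{"flags": []}' if len(messages) == 3 else '{"flags": ['


result = asyncio.run(build_with_feedback(_fake_llm, "system prompt", "manifest"))
```

Rebuilding `messages` on every attempt keeps the conversation short: only the most recent error is sent, not the full history of failures.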
| 310 |
+
# ---------------------------------------------------------------------------
|
| 311 |
+
# Parse error with context
|
| 312 |
+
# ---------------------------------------------------------------------------
|
| 313 |
+
|
| 314 |
+
|
| 315 |
+
class SnapshotParseError(Exception):
|
| 316 |
+
"""Raised when LLM output cannot be parsed into a valid SnapshotSpec.
|
| 317 |
+
|
| 318 |
+
Includes the field that failed, received value, expected format,
|
| 319 |
+
and a truncated snippet of the raw JSON for debugging.
|
| 320 |
+
"""
|
| 321 |
+
|
| 322 |
+
def __init__(
|
| 323 |
+
self,
|
| 324 |
+
message: str,
|
| 325 |
+
field: str = "",
|
| 326 |
+
received: Any = None,
|
| 327 |
+
expected: str = "",
|
| 328 |
+
raw_json_snippet: str = "",
|
| 329 |
+
) -> None:
|
| 330 |
+
self.field = field
|
| 331 |
+
self.received = received
|
| 332 |
+
self.expected = expected
|
| 333 |
+
self.raw_json_snippet = raw_json_snippet
|
| 334 |
+
parts = [message]
|
| 335 |
+
if field:
|
| 336 |
+
parts.append(f"field={field!r}")
|
| 337 |
+
if received is not None:
|
| 338 |
+
recv_str = repr(received)
|
| 339 |
+
if len(recv_str) > 200:
|
| 340 |
+
recv_str = recv_str[:200] + "..."
|
| 341 |
+
parts.append(f"received={recv_str}")
|
| 342 |
+
if expected:
|
| 343 |
+
parts.append(f"expected={expected}")
|
| 344 |
+
if raw_json_snippet:
|
| 345 |
+
parts.append(f"raw_json_start={raw_json_snippet!r}")
|
| 346 |
+
super().__init__(" | ".join(parts))
|
| 347 |
+
|
| 348 |
+
|
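To see what a `SnapshotParseError` message looks like once assembled, the formatting logic can be pulled out into a standalone function. The sketch below assumes the same 200-character cutoff and `" | "` separator as the class above; `format_parse_error` and `max_received_chars` are hypothetical names used only for illustration.

```python
from typing import Any


def format_parse_error(
    message: str,
    field: str = "",
    received: Any = None,
    expected: str = "",
    raw_json_snippet: str = "",
    max_received_chars: int = 200,  # assumption: same cutoff as the class
) -> str:
    """Join the non-empty context parts into one pipe-separated message."""
    parts = [message]
    if field:
        parts.append(f"field={field!r}")
    if received is not None:
        recv_str = repr(received)
        if len(recv_str) > max_received_chars:
            # Truncate long values so log lines stay readable.
            recv_str = recv_str[:max_received_chars] + "..."
        parts.append(f"received={recv_str}")
    if expected:
        parts.append(f"expected={expected}")
    if raw_json_snippet:
        parts.append(f"raw_json_start={raw_json_snippet!r}")
    return " | ".join(parts)


short = format_parse_error(
    "Failed to map flag at index 0",
    field="flags[0]",
    received={"id": "f1"},
    expected="valid FlagSpec",
)
long = format_parse_error("bad", received="x" * 300)
```

Because the message is a single line, it can be passed back to the LLM verbatim as retry context.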
| 349 |
+
# ---------------------------------------------------------------------------
|
| 350 |
+
# LLM response parser
|
| 351 |
+
# ---------------------------------------------------------------------------
|
| 352 |
+
|
| 353 |
+
|
| 354 |
def _parse_llm_response(raw_json: str) -> SnapshotSpec:
|
| 355 |
"""Parse raw JSON from LLM into a validated SnapshotSpec.
|
| 356 |
|
| 357 |
+
First parses into LLMSnapshotOutput (which matches the LLM's field names),
|
| 358 |
+
then maps to the canonical SnapshotSpec models. Handles known field-name
|
| 359 |
+
mismatches between the LLM prompt schema and Pydantic models.
|
| 360 |
"""
|
| 361 |
+
raw_snippet = raw_json[:500] if raw_json else ""
|
| 362 |
+
|
| 363 |
+
try:
|
| 364 |
+
data = json.loads(raw_json)
|
| 365 |
+
except json.JSONDecodeError:
|
| 366 |
+
raise
|
| 367 |
+
|
| 368 |
+
logger.debug("_parse_llm_response: parsing %d-char JSON response", len(raw_json))
|
| 369 |
+
|
| 370 |
+
# Parse into intermediate model first for early validation
|
| 371 |
+
try:
|
| 372 |
+
llm_output = LLMSnapshotOutput.model_validate(data)
|
| 373 |
+
except Exception as exc:
|
| 374 |
+
raise SnapshotParseError(
|
| 375 |
+
"Failed to parse LLM output into LLMSnapshotOutput",
|
| 376 |
+
field="root",
|
| 377 |
+
received=type(exc).__name__,
|
| 378 |
+
expected="valid LLMSnapshotOutput JSON",
|
| 379 |
+
raw_json_snippet=raw_snippet,
|
| 380 |
+
) from exc
|
| 381 |
|
| 382 |
# Map truth_graph vulns
|
| 383 |
vulns = []
|
| 384 |
+
for i, v in enumerate(llm_output.truth_graph.vulns):
|
| 385 |
+
try:
|
| 386 |
+
vulns.append(
|
| 387 |
+
Vulnerability(
|
| 388 |
+
id=v.id,
|
| 389 |
+
type=v.type,
|
| 390 |
+
host=v.host,
|
| 391 |
+
service=v.service,
|
| 392 |
+
injection_point=v.injection_point,
|
| 393 |
+
vulnerable_code=v.vulnerable_code,
|
| 394 |
+
root_cause=v.root_cause,
|
| 395 |
+
blast_radius=v.blast_radius,
|
| 396 |
+
remediation=v.remediation,
|
| 397 |
+
)
|
| 398 |
)
|
| 399 |
+
except Exception as exc:
|
| 400 |
+
raise SnapshotParseError(
|
| 401 |
+
f"Failed to map vulnerability at index {i}",
|
| 402 |
+
field=f"truth_graph.vulns[{i}]",
|
| 403 |
+
received=v.model_dump(),
|
| 404 |
+
expected="valid Vulnerability fields",
|
| 405 |
+
raw_json_snippet=raw_snippet,
|
| 406 |
+
) from exc
|
| 407 |
|
| 408 |
# Map exploit_chain -- LLM uses "vuln"/"action", protocol uses "vuln_id"/"command"
|
| 409 |
exploit_chain = []
|
| 410 |
+
for i, ec in enumerate(llm_output.truth_graph.exploit_chain):
|
| 411 |
+
vuln_id = ec.vuln_id or ec.vuln
|
| 412 |
+
command = ec.command or ec.action
|
| 413 |
+
description = ec.description or ec.yields
|
| 414 |
+
if vuln_id or command:
|
| 415 |
+
used_fallback = (not ec.vuln_id and ec.vuln) or (not ec.command and ec.action)
|
| 416 |
+
if used_fallback:
|
| 417 |
+
logger.warning(
|
| 418 |
+
"exploit_chain[%d]: used fallback field names (vuln=%r -> vuln_id, action=%r -> command)",
|
| 419 |
+
i,
|
| 420 |
+
ec.vuln,
|
| 421 |
+
ec.action,
|
| 422 |
+
)
|
| 423 |
+
exploit_chain.append(
|
| 424 |
+
ExploitStep(
|
| 425 |
+
vuln_id=vuln_id,
|
| 426 |
+
command=command,
|
| 427 |
+
description=description,
|
| 428 |
+
)
|
| 429 |
)
|
|
|
|
| 430 |
|
| 431 |
truth_graph = TruthGraph(
|
| 432 |
vulns=vulns,
|
| 433 |
exploit_chain=exploit_chain,
|
| 434 |
)
|
| 435 |
|
| 436 |
+
# Map golden_path -- LLM uses "cmd"/"expect_stdout", protocol uses "command"/"expect_in_stdout"
|
| 437 |
golden_path = []
|
| 438 |
+
for i, step in enumerate(llm_output.golden_path):
|
| 439 |
+
command = step.command or step.cmd
|
| 440 |
+
expect = step.expect_in_stdout or step.expect_stdout
|
| 441 |
+
if not step.command and step.cmd:
|
| 442 |
+
logger.warning(
|
| 443 |
+
"golden_path[%d]: used 'cmd' fallback for 'command'",
|
| 444 |
+
i,
|
| 445 |
+
)
|
| 446 |
+
if not step.expect_in_stdout and step.expect_stdout:
|
| 447 |
+
logger.warning(
|
| 448 |
+
"golden_path[%d]: used 'expect_stdout' fallback for 'expect_in_stdout'",
|
| 449 |
+
i,
|
| 450 |
+
)
|
| 451 |
golden_path.append(
|
| 452 |
GoldenPathStep(
|
| 453 |
+
step=step.step,
|
| 454 |
+
command=command,
|
| 455 |
+
expect_in_stdout=expect,
|
| 456 |
+
description=step.description,
|
|
|
|
|
|
|
| 457 |
)
|
| 458 |
)
|
| 459 |
|
| 460 |
# Map flags
|
| 461 |
+
flags = []
|
| 462 |
+
for i, f in enumerate(llm_output.flags):
|
| 463 |
+
try:
|
| 464 |
+
flags.append(
|
| 465 |
+
FlagSpec(
|
| 466 |
+
id=f.id,
|
| 467 |
+
value=f.value,
|
| 468 |
+
path=f.path,
|
| 469 |
+
host=f.host,
|
| 470 |
+
)
|
| 471 |
+
)
|
| 472 |
+
except Exception as exc:
|
| 473 |
+
raise SnapshotParseError(
|
| 474 |
+
f"Failed to map flag at index {i}",
|
| 475 |
+
field=f"flags[{i}]",
|
| 476 |
+
received=f.model_dump(),
|
| 477 |
+
expected="valid FlagSpec (id, value, path, host)",
|
| 478 |
+
raw_json_snippet=raw_snippet,
|
| 479 |
+
) from exc
|
| 480 |
+
|
| 481 |
+
# Map evidence_spec -- LLM returns dict or list, protocol expects list[EvidenceItem]
|
| 482 |
evidence_spec: list[EvidenceItem] = []
|
| 483 |
+
evidence_raw = llm_output.evidence_spec
|
| 484 |
if isinstance(evidence_raw, dict):
|
| 485 |
+
logger.debug("evidence_spec: converting dict format to list[EvidenceItem]")
|
| 486 |
for key, val in evidence_raw.items():
|
| 487 |
if isinstance(val, list):
|
| 488 |
for item in val:
|
|
|
|
| 500 |
|
| 501 |
# Map NPC personas
|
| 502 |
npc_personas = []
|
| 503 |
+
for i, p in enumerate(llm_output.npc_personas):
|
| 504 |
+
try:
|
| 505 |
+
npc_personas.append(
|
| 506 |
+
NPCPersona(
|
| 507 |
+
name=p.name,
|
| 508 |
+
role=p.role,
|
| 509 |
+
department=p.department,
|
| 510 |
+
reports_to=p.reports_to,
|
| 511 |
+
communication_style=p.communication_style,
|
| 512 |
+
security_awareness=p.security_awareness,
|
| 513 |
+
susceptibility=p.susceptibility,
|
| 514 |
+
routine=p.routine,
|
| 515 |
+
accounts=p.accounts,
|
| 516 |
+
)
|
| 517 |
+
)
|
| 518 |
+
except Exception as exc:
|
| 519 |
+
logger.warning(
|
| 520 |
+
"npc_personas[%d]: failed to map persona %r: %s",
|
| 521 |
+
i,
|
| 522 |
+
p.name,
|
| 523 |
+
exc,
|
| 524 |
)
|
|
|
|
| 525 |
|
| 526 |
# Map NPC traffic
|
| 527 |
+
npc_raw = llm_output.npc_traffic
|
| 528 |
npc_traffic = NPCTrafficSpec(
|
| 529 |
level=0,
|
| 530 |
rate_lambda=npc_raw.get("http_rate", 10),
|
|
|
|
| 532 |
)
|
| 533 |
|
| 534 |
# Map task
|
|
|
|
| 535 |
task = TaskSpec(
|
| 536 |
+
red_briefing=llm_output.task.red_briefing,
|
| 537 |
+
blue_briefing=llm_output.task.blue_briefing,
|
| 538 |
)
|
| 539 |
|
| 540 |
# Map files -- explicit files from LLM + extract from vulnerable_code
|
| 541 |
files: dict[str, str] = {}
|
| 542 |
|
| 543 |
# 1. Explicit files field from LLM output
|
| 544 |
+
if isinstance(llm_output.files, dict):
|
| 545 |
+
for key, content in llm_output.files.items():
|
|
|
|
| 546 |
if isinstance(content, str):
|
| 547 |
files[key] = content
|
| 548 |
|
|
|
|
| 561 |
if container_key not in files:
|
| 562 |
files[container_key] = vc
|
| 563 |
|
| 564 |
+
logger.debug(
|
| 565 |
+
"_parse_llm_response: mapped %d vulns, %d golden path steps, %d flags, %d files",
|
| 566 |
+
len(vulns),
|
| 567 |
+
len(golden_path),
|
| 568 |
+
len(flags),
|
| 569 |
+
len(files),
|
| 570 |
+
)
|
| 571 |
+
|
| 572 |
return SnapshotSpec(
|
| 573 |
+
topology=llm_output.topology,
|
| 574 |
truth_graph=truth_graph,
|
| 575 |
golden_path=golden_path,
|
| 576 |
flags=flags,
|
|
|
|
| 909 |
"""
|
| 910 |
|
| 911 |
def __init__(self, vuln_pool: list[dict[str, Any]] | None = None) -> None:
|
| 912 |
+
"""Initialize with an optional custom vulnerability pool."""
|
| 913 |
self.vuln_pool = vuln_pool or _DEFAULT_VULN_POOL
|
| 914 |
|
| 915 |
async def build(
|
|
|
|
| 1046 |
scripts=["http_traffic.sh", "db_traffic.sh"],
|
| 1047 |
)
|
| 1048 |
|
| 1049 |
+
logger.info(
|
| 1050 |
+
"TemplateOnlyBuilder: built snapshot with %d vulns (seed=%s)",
|
| 1051 |
+
len(vulns),
|
| 1052 |
+
context.seed,
|
| 1053 |
+
)
|
| 1054 |
+
|
| 1055 |
return SnapshotSpec(
|
| 1056 |
topology=topology,
|
| 1057 |
truth_graph=truth_graph,
|
|
|
|
| 1077 |
"""
|
| 1078 |
|
| 1079 |
def __init__(self, snapshot_dir: str = "snapshots") -> None:
|
| 1080 |
+
"""Initialize with the directory containing snapshot JSON files."""
|
| 1081 |
self.snapshot_dir = Path(snapshot_dir)
|
| 1082 |
|
| 1083 |
async def build(
|
|
|
|
| 1085 |
manifest: dict,
|
| 1086 |
context: BuildContext,
|
| 1087 |
) -> SnapshotSpec:
|
| 1088 |
+
"""Load a snapshot JSON file, optionally picking by seed."""
|
| 1089 |
if not self.snapshot_dir.exists():
|
| 1090 |
raise FileNotFoundError(
|
| 1091 |
f"Snapshot directory not found: {self.snapshot_dir}"
|
|
|
|
| 1105 |
else:
|
| 1106 |
chosen = files[0]
|
| 1107 |
|
| 1108 |
+
logger.info("FileBuilder: loading snapshot from %s", chosen)
|
| 1109 |
raw = json.loads(chosen.read_text())
|
| 1110 |
return _parse_llm_response(json.dumps(raw))
|
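The golden-path field fallback in `_parse_llm_response` can be illustrated on its own. The sketch below uses a plain dataclass instead of the Pydantic models, and `RawStep` and `normalize_step` are hypothetical names. The fallback check is made against the raw `command` field rather than the merged value, because the merged value is already non-empty whenever `cmd` is set.

```python
from dataclasses import dataclass


@dataclass
class RawStep:
    """Hypothetical stand-in for a raw golden-path step from the LLM."""

    cmd: str = ""
    command: str = ""
    expect_stdout: str = ""
    expect_in_stdout: str = ""


def normalize_step(step: RawStep) -> tuple[str, str, list[str]]:
    """Return (command, expect_in_stdout, names of fallback fields used)."""
    used: list[str] = []

    # Prefer the canonical field; fall back to the LLM's alternate name.
    command = step.command or step.cmd
    if not step.command and step.cmd:
        used.append("cmd")

    expect = step.expect_in_stdout or step.expect_stdout
    if not step.expect_in_stdout and step.expect_stdout:
        used.append("expect_stdout")

    return command, expect, used


from_llm = normalize_step(RawStep(cmd="nmap -sV web", expect_stdout="80/tcp"))
canonical = normalize_step(RawStep(command="id", cmd="whoami"))
```

Returning the list of fallbacks lets the caller decide whether to log a warning, which keeps the mapping itself free of side effects.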
src/open_range/cli.py
ADDED
|
@@ -0,0 +1,438 @@
|
|
| 1 |
+
"""OpenRange CLI -- production command-line interface for the cybersecurity gymnasium.
|
| 2 |
+
|
| 3 |
+
Usage::
|
| 4 |
+
|
| 5 |
+
openrange build -m manifests/tier1_basic.yaml
|
| 6 |
+
openrange render -s snapshots/spec.json -o output/
|
| 7 |
+
openrange validate -s snapshots/spec.json
|
| 8 |
+
openrange deploy -s snapshots/spec.json
|
| 9 |
+
openrange server --port 8000
|
| 10 |
+
"""
|
| 11 |
+
|
| 12 |
+
from __future__ import annotations
|
| 13 |
+
|
| 14 |
+
import asyncio
|
| 15 |
+
import json
|
| 16 |
+
import logging
|
| 17 |
+
import os
|
| 18 |
+
import sys
|
| 19 |
+
import time
|
| 20 |
+
from pathlib import Path
|
| 21 |
+
from typing import Any
|
| 22 |
+
|
| 23 |
+
import click
|
| 24 |
+
import yaml
|
| 25 |
+
|
| 26 |
+
# ---------------------------------------------------------------------------
|
| 27 |
+
# Logging setup
|
| 28 |
+
# ---------------------------------------------------------------------------
|
| 29 |
+
|
| 30 |
+
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
|
| 31 |
+
LOG_DATE_FORMAT = "%H:%M:%S"
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
def _configure_logging(verbose: bool) -> None:
|
| 35 |
+
level = logging.DEBUG if verbose else logging.INFO
|
| 36 |
+
logging.basicConfig(
|
| 37 |
+
level=level,
|
| 38 |
+
format=LOG_FORMAT,
|
| 39 |
+
datefmt=LOG_DATE_FORMAT,
|
| 40 |
+
stream=sys.stderr,
|
| 41 |
+
)
|
| 42 |
+
# Quiet noisy third-party loggers unless in verbose mode
|
| 43 |
+
if not verbose:
|
| 44 |
+
for name in ("httpx", "httpcore", "litellm", "urllib3", "docker"):
|
| 45 |
+
logging.getLogger(name).setLevel(logging.WARNING)
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
# ---------------------------------------------------------------------------
|
| 49 |
+
# Helpers
|
| 50 |
+
# ---------------------------------------------------------------------------
|
| 51 |
+
|
| 52 |
+
|
| 53 |
+
def _run_async(coro: Any) -> Any:
|
| 54 |
+
"""Run an async coroutine from synchronous Click context."""
|
| 55 |
+
try:
|
| 56 |
+
loop = asyncio.get_running_loop()
|
| 57 |
+
except RuntimeError:
|
| 58 |
+
loop = None
|
| 59 |
+
|
| 60 |
+
if loop and loop.is_running():
|
| 61 |
+
# Shouldn't happen in a CLI, but be safe.
|
| 62 |
+
import concurrent.futures
|
| 63 |
+
|
| 64 |
+
with concurrent.futures.ThreadPoolExecutor() as pool:
|
| 65 |
+
return pool.submit(asyncio.run, coro).result()
|
| 66 |
+
return asyncio.run(coro)
|
| 67 |
+
|
| 68 |
+
|
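The behaviour of `_run_async` in both situations (no loop running, and a loop already running) can be checked with a small standalone sketch. `run_async`, `_double`, and `_nested` are hypothetical names; the logic follows the helper above, using only the standard library.

```python
import asyncio
import concurrent.futures
from typing import Any


def run_async(coro: Any) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: the normal CLI case.
        return asyncio.run(coro)
    # A loop is already running here, so run the coroutine on a fresh
    # loop in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def _double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x


# Called from plain synchronous code.
plain = run_async(_double(21))


async def _nested() -> int:
    # Called while an event loop is running in this thread.
    return run_async(_double(4))


nested = asyncio.run(_nested())
```

Note that the threaded branch blocks the calling event loop until the coroutine completes, so it is a safety net rather than something to rely on in async code.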
| 69 |
+
def _load_manifest(path: str) -> dict[str, Any]:
|
| 70 |
+
"""Load and return a YAML manifest as a dict."""
|
| 71 |
+
p = Path(path)
|
| 72 |
+
if not p.exists():
|
| 73 |
+
click.echo(f"Error: manifest not found: {p}", err=True)
|
| 74 |
+
sys.exit(1)
|
| 75 |
+
with open(p) as f:
|
| 76 |
+
data = yaml.safe_load(f)
|
| 77 |
+
if not isinstance(data, dict):
|
| 78 |
+
click.echo(f"Error: manifest must be a YAML mapping, got {type(data).__name__}", err=True)
|
| 79 |
+
sys.exit(1)
|
| 80 |
+
return data
|
| 81 |
+
|
| 82 |
+
|
| 83 |
+
def _load_snapshot(path: str) -> "SnapshotSpec":
|
| 84 |
+
"""Load a snapshot JSON file into a SnapshotSpec."""
|
| 85 |
+
from open_range.protocols import SnapshotSpec
|
| 86 |
+
|
| 87 |
+
p = Path(path)
|
| 88 |
+
if not p.exists():
|
| 89 |
+
click.echo(f"Error: snapshot not found: {p}", err=True)
|
| 90 |
+
sys.exit(1)
|
| 91 |
+
with open(p) as f:
|
| 92 |
+
data = json.load(f)
|
| 93 |
+
try:
|
| 94 |
+
return SnapshotSpec.model_validate(data)
|
| 95 |
+
except Exception as exc:
|
| 96 |
+
click.echo(f"Error: invalid snapshot JSON: {exc}", err=True)
|
| 97 |
+
sys.exit(1)
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
def _write_snapshot(spec: "SnapshotSpec", output_dir: Path) -> Path:
|
| 101 |
+
"""Write a SnapshotSpec to spec.json inside output_dir. Returns the file path."""
|
| 102 |
+
output_dir.mkdir(parents=True, exist_ok=True)
|
| 103 |
+
dest = output_dir / "spec.json"
|
| 104 |
+
dest.write_text(json.dumps(spec.model_dump(), indent=2, default=str))
|
| 105 |
+
return dest
|
| 106 |
+
|
| 107 |
+
|
| 108 |
+
# ---------------------------------------------------------------------------
|
| 109 |
+
# CLI group
|
| 110 |
+
# ---------------------------------------------------------------------------
|
| 111 |
+
|
| 112 |
+
|
| 113 |
+
@click.group()
|
| 114 |
+
@click.option("-v", "--verbose", is_flag=True, default=False, help="Enable debug logging.")
|
| 115 |
+
@click.version_option(package_name="openenv-open-range", prog_name="openrange")
|
| 116 |
+
def cli(verbose: bool) -> None:
|
| 117 |
+
"""OpenRange -- multi-agent cybersecurity gymnasium.
|
| 118 |
+
|
| 119 |
+
Generate, validate, deploy, and serve Docker-based cyber ranges
|
| 120 |
+
for adversarial Red/Blue agent training.
|
| 121 |
+
"""
|
| 122 |
+
_configure_logging(verbose)
|
| 123 |
+
|
| 124 |
+
|
| 125 |
+
# ---------------------------------------------------------------------------
|
| 126 |
+
# build
|
| 127 |
+
# ---------------------------------------------------------------------------
|
| 128 |
+
|
| 129 |
+
|
| 130 |
+
@cli.command()
|
| 131 |
+
@click.option("-m", "--manifest", required=True, type=click.Path(exists=True), help="Path to manifest YAML.")
|
| 132 |
+
@click.option("-o", "--output", default="./snapshots", type=click.Path(), help="Output directory for snapshot.")
|
| 133 |
+
@click.option("--model", default=None, help="LLM model (default: $OPENRANGE_BUILDER_MODEL or azure/gpt-5.2).")
|
| 134 |
+
@click.option("--tier", default=1, type=click.IntRange(1, 5), help="Tier level 1-5.")
|
| 135 |
+
@click.option("--seed", default=None, type=int, help="Random seed for reproducibility.")
|
| 136 |
+
@click.option("--template-only", is_flag=True, default=False, help="Skip LLM, use deterministic template builder.")
|
| 137 |
+
@click.option("--max-tokens", default=16384, type=int, help="Max tokens for LLM generation.")
|
| 138 |
+
def build(
|
| 139 |
+
manifest: str,
|
| 140 |
+
output: str,
|
| 141 |
+
model: str | None,
|
| 142 |
+
tier: int,
|
| 143 |
+
seed: int | None,
|
| 144 |
+
template_only: bool,
|
| 145 |
+
max_tokens: int,
|
| 146 |
+
) -> None:
|
| 147 |
+
"""Generate a snapshot from a manifest YAML.
|
| 148 |
+
|
| 149 |
+
Uses the LLM builder by default. Pass --template-only for a deterministic
|
| 150 |
+
snapshot without any LLM calls (useful for testing).
|
| 151 |
+
"""
|
| 152 |
+
from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
|
| 153 |
+
from open_range.protocols import BuildContext
|
| 154 |
+
|
| 155 |
+
manifest_data = _load_manifest(manifest)
|
| 156 |
+
context = BuildContext(seed=seed, tier=tier)
|
| 157 |
+
|
| 158 |
+
if template_only:
|
| 159 |
+
builder = TemplateOnlyBuilder()
|
| 160 |
+
click.echo(f"Building snapshot (template-only, tier {tier}) ...")
|
| 161 |
+
else:
|
| 162 |
+
resolved_model = model or os.environ.get("OPENRANGE_BUILDER_MODEL", "azure/gpt-5.2")
|
| 163 |
+
builder = LLMSnapshotBuilder(model=resolved_model, max_tokens=max_tokens)
|
| 164 |
+
click.echo(f"Building snapshot (model={resolved_model}, tier {tier}) ...")
|
| 165 |
+
|
| 166 |
+
t0 = time.monotonic()
|
| 167 |
+
try:
|
| 168 |
+
spec = _run_async(builder.build(manifest_data, context))
|
| 169 |
+
except Exception as exc:
|
| 170 |
+
click.echo(f"Error: build failed: {exc}", err=True)
|
| 171 |
+
sys.exit(1)
|
| 172 |
+
elapsed = time.monotonic() - t0
|
| 173 |
+
|
| 174 |
+
output_path = Path(output)
|
| 175 |
+
dest = _write_snapshot(spec, output_path)
|
| 176 |
+
|
| 177 |
+
n_vulns = len(spec.truth_graph.vulns)
|
| 178 |
+
n_steps = len(spec.golden_path)
|
| 179 |
+
n_flags = len(spec.flags)
|
| 180 |
+
|
| 181 |
+
click.echo(f"Snapshot written to {dest}")
|
| 182 |
+
click.echo(f" Vulnerabilities: {n_vulns}")
|
| 183 |
+
click.echo(f" Golden path steps: {n_steps}")
|
| 184 |
+
click.echo(f" Flags: {n_flags}")
|
| 185 |
+
click.echo(f" Elapsed: {elapsed:.1f}s")
|
| 186 |
+
|
| 187 |
+
|
| 188 |
+
# ---------------------------------------------------------------------------
|
| 189 |
+
# render
|
| 190 |
+
# ---------------------------------------------------------------------------
|
| 191 |
+
|
| 192 |
+
|
| 193 |
+
@cli.command()
|
| 194 |
+
@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
|
| 195 |
+
@click.option("-o", "--output", required=True, type=click.Path(), help="Output directory for Docker artifacts.")
|
| 196 |
+
def render(snapshot: str, output: str) -> None:
|
| 197 |
+
"""Render a snapshot JSON into Docker artifacts (Dockerfiles, compose, configs)."""
|
| 198 |
+
from open_range.builder.renderer import SnapshotRenderer
|
| 199 |
+
|
| 200 |
+
spec = _load_snapshot(snapshot)
|
| 201 |
+
renderer = SnapshotRenderer()
|
| 202 |
+
output_path = Path(output)
|
| 203 |
+
|
| 204 |
+
click.echo(f"Rendering snapshot to {output_path} ...")
|
| 205 |
+
try:
|
| 206 |
+
renderer.render(spec, output_path)
|
| 207 |
+
except Exception as exc:
|
| 208 |
+
click.echo(f"Error: render failed: {exc}", err=True)
|
| 209 |
+
sys.exit(1)
|
| 210 |
+
|
| 211 |
+
# List produced files
|
| 212 |
+
if output_path.exists():
|
| 213 |
+
artifacts = sorted(p.name for p in output_path.iterdir() if p.is_file())
|
| 214 |
+
click.echo(f"Produced {len(artifacts)} artifacts:")
|
| 215 |
+
for name in artifacts:
|
| 216 |
+
click.echo(f" {name}")
|
| 217 |
+
|
| 218 |
+
|
| 219 |
+
# ---------------------------------------------------------------------------
|
| 220 |
+
# validate
|
| 221 |
+
# ---------------------------------------------------------------------------
|
| 222 |
+
|
| 223 |
+
# Canonical name -> check class. The order matches the 10-check pipeline.
|
| 224 |
+
_CHECK_REGISTRY: dict[str, str] = {
|
| 225 |
+
"build_boot": "open_range.validator.build_boot.BuildBootCheck",
|
| 226 |
+
"exploitability": "open_range.validator.exploitability.ExploitabilityCheck",
|
| 227 |
+
"patchability": "open_range.validator.patchability.PatchabilityCheck",
|
| 228 |
+
"evidence": "open_range.validator.evidence.EvidenceCheck",
|
| 229 |
+
"reward_grounding": "open_range.validator.reward_grounding.RewardGroundingCheck",
|
| 230 |
+
"isolation": "open_range.validator.isolation.IsolationCheck",
|
| 231 |
+
"task_feasibility": "open_range.validator.task_feasibility.TaskFeasibilityCheck",
|
| 232 |
+
"difficulty": "open_range.validator.difficulty.DifficultyCheck",
|
| 233 |
+
"npc_consistency": "open_range.validator.npc_consistency.NPCConsistencyCheck",
|
| 234 |
+
"realism_review": "open_range.validator.realism_review.RealismReviewCheck",
|
| 235 |
+
}
|
| 236 |
+
|
| 237 |
+
# Checks that require running Docker containers.
|
| 238 |
+
_DOCKER_CHECKS = {"build_boot", "exploitability", "patchability", "evidence"}
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
def _import_check(dotted: str) -> Any:
|
| 242 |
+
"""Import a check class by dotted path."""
|
| 243 |
+
module_path, class_name = dotted.rsplit(".", 1)
|
| 244 |
+
import importlib
|
| 245 |
+
|
| 246 |
+
mod = importlib.import_module(module_path)
|
| 247 |
+
return getattr(mod, class_name)
|
| 248 |
+
|
| 249 |
+
|
| 250 |
+
@cli.command()
|
| 251 |
+
@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
|
| 252 |
+
@click.option("--checks", default=None, help="Comma-separated check names (default: all applicable).")
|
| 253 |
+
@click.option("--docker/--no-docker", default=False, help="Include Docker-dependent checks (requires running containers).")
|
| 254 |
+
def validate(snapshot: str, checks: str | None, docker: bool) -> None:
|
| 255 |
+
"""Run validator checks against a snapshot.
|
| 256 |
+
|
| 257 |
+
By default runs only offline checks (no Docker required). Use --docker
|
| 258 |
+
to include checks that need live containers.
|
| 259 |
+
|
| 260 |
+
Available checks: build_boot, exploitability, patchability, evidence,
|
| 261 |
+
reward_grounding, isolation, task_feasibility, difficulty,
|
| 262 |
+
npc_consistency, realism_review.
|
| 263 |
+
"""
|
| 264 |
+
from open_range.protocols import ContainerSet
|
| 265 |
+
from open_range.validator.validator import ValidatorGate
|
| 266 |
+
|
| 267 |
+
spec = _load_snapshot(snapshot)
|
| 268 |
+
|
| 269 |
+
# Determine which checks to run
|
| 270 |
+
if checks:
|
| 271 |
+
names = [n.strip() for n in checks.split(",")]
|
| 272 |
+
unknown = [n for n in names if n not in _CHECK_REGISTRY]
|
| 273 |
+
if unknown:
|
| 274 |
+
click.echo(f"Error: unknown checks: {', '.join(unknown)}", err=True)
|
| 275 |
+
click.echo(f"Available: {', '.join(_CHECK_REGISTRY)}", err=True)
|
| 276 |
+
sys.exit(1)
|
| 277 |
+
else:
|
| 278 |
+
if docker:
|
| 279 |
+
names = list(_CHECK_REGISTRY)
|
| 280 |
+
else:
|
| 281 |
+
names = [n for n in _CHECK_REGISTRY if n not in _DOCKER_CHECKS]
|
| 282 |
+
|
| 283 |
+
if not names:
|
| 284 |
+
click.echo("No checks selected.")
|
| 285 |
+
sys.exit(0)
|
| 286 |
+
|
| 287 |
+
# Instantiate checks
|
| 288 |
+
check_instances = []
|
| 289 |
+
for name in names:
|
| 290 |
+
cls = _import_check(_CHECK_REGISTRY[name])
|
| 291 |
+
check_instances.append(cls())
|
| 292 |
+
|
| 293 |
+
# Containers stub for offline mode, real discovery for docker mode
|
| 294 |
+
containers = ContainerSet()
|
| 295 |
+
|
| 296 |
+
gate = ValidatorGate(check_instances)
|
| 297 |
+
click.echo(f"Running {len(check_instances)} checks ...")
|
| 298 |
+
|
| 299 |
+
result = _run_async(gate.validate(spec, containers))
|
| 300 |
+
|
| 301 |
+
# Print results
|
| 302 |
+
for cr in result.checks:
|
| 303 |
+
status = "PASS" if cr.passed else ("ADVISORY" if cr.advisory else "FAIL")
|
| 304 |
+
line = f" [{status}] {cr.name}"
|
| 305 |
+
if cr.time_s > 0:
|
| 306 |
+
line += f" ({cr.time_s:.2f}s)"
|
| 307 |
+
click.echo(line)
|
| 308 |
+
if cr.error:
|
| 309 |
+
click.echo(f" {cr.error}")
|
| 310 |
+
|
| 311 |
+
click.echo("")
|
| 312 |
+
if result.passed:
|
| 313 |
+
click.echo(f"Validation PASSED ({result.total_time_s:.2f}s)")
|
| 314 |
+
else:
|
| 315 |
+
click.echo(f"Validation FAILED ({result.total_time_s:.2f}s)")
|
| 316 |
+
sys.exit(1)
|
| 317 |
+
|
| 318 |
+
|
| 319 |
+
# ---------------------------------------------------------------------------
|
| 320 |
+
# deploy
|
| 321 |
+
# ---------------------------------------------------------------------------
|
| 322 |
+
|
| 323 |
+
|
| 324 |
+
@cli.command()
|
| 325 |
+
@click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
|
| 326 |
+
@click.option("--compose-dir", default=None, type=click.Path(), help="Directory containing docker-compose.yml (default: render into temp dir).")
|
| 327 |
+
def deploy(snapshot: str, compose_dir: str | None) -> None:
|
| 328 |
+
"""Deploy a snapshot to running Docker containers.
|
| 329 |
+
|
| 330 |
+
Renders the snapshot into Docker artifacts and runs docker compose up.
|
| 331 |
+
If --compose-dir is given, uses that directory; otherwise renders into
|
| 332 |
+
a temporary directory alongside the snapshot.
|
| 333 |
+
"""
|
| 334 |
+
import subprocess
|
| 335 |
+
|
| 336 |
+
from open_range.builder.renderer import SnapshotRenderer
|
| 337 |
+
|
| 338 |
+
spec = _load_snapshot(snapshot)
|
| 339 |
+
|
| 340 |
+
if compose_dir:
|
| 341 |
+
target = Path(compose_dir)
|
| 342 |
+
else:
|
| 343 |
+
target = Path(snapshot).parent / "deploy"
|
| 344 |
+
|
| 345 |
+
# Render artifacts
|
| 346 |
+
renderer = SnapshotRenderer()
|
| 347 |
+
click.echo(f"Rendering Docker artifacts to {target} ...")
|
| 348 |
+
try:
|
| 349 |
+
renderer.render(spec, target)
|
| 350 |
+
except Exception as exc:
|
| 351 |
+
click.echo(f"Error: render failed: {exc}", err=True)
|
| 352 |
+
sys.exit(1)
|
| 353 |
+
|
| 354 |
+
compose_file = target / "docker-compose.yml"
|
| 355 |
+
if not compose_file.exists():
|
| 356 |
+
click.echo(f"Error: no docker-compose.yml found in {target}", err=True)
|
| 357 |
+
sys.exit(1)
|
| 358 |
+
|
| 359 |
+
click.echo("Starting containers with docker compose ...")
|
| 360 |
+
try:
|
| 361 |
+
proc = subprocess.run(
|
| 362 |
+
["docker", "compose", "-f", str(compose_file), "up", "-d", "--build"],
|
| 363 |
+
cwd=str(target),
|
| 364 |
+
capture_output=True,
|
| 365 |
+
text=True,
|
| 366 |
+
timeout=300,
|
| 367 |
+
)
|
| 368 |
+
except FileNotFoundError:
|
| 369 |
+
click.echo("Error: docker command not found. Is Docker installed and in PATH?", err=True)
|
| 370 |
+
sys.exit(1)
|
| 371 |
+
except subprocess.TimeoutExpired:
|
| 372 |
+
click.echo("Error: docker compose up timed out after 300s.", err=True)
|
| 373 |
+
sys.exit(1)
|
| 374 |
+
|
| 375 |
+
if proc.returncode != 0:
|
| 376 |
+
click.echo(f"Error: docker compose up failed (exit {proc.returncode}):", err=True)
|
| 377 |
+
if proc.stderr:
|
| 378 |
+
click.echo(proc.stderr, err=True)
|
| 379 |
+
sys.exit(1)
|
| 380 |
+
|
| 381 |
+
click.echo("Containers started.")
|
| 382 |
+
|
| 383 |
+
# Show running container status
|
| 384 |
+
try:
|
| 385 |
+
ps = subprocess.run(
|
| 386 |
+
["docker", "compose", "-f", str(compose_file), "ps", "--format", "table"],
|
| 387 |
+
cwd=str(target),
|
| 388 |
+
capture_output=True,
|
| 389 |
+
text=True,
|
| 390 |
+
timeout=30,
|
| 391 |
+
)
|
| 392 |
+
if ps.stdout:
|
| 393 |
+
click.echo(ps.stdout)
|
| 394 |
+
except Exception:
|
| 395 |
+
pass # Non-critical
|
| 396 |
+
|
| 397 |
+
|
| 398 |
+
# ---------------------------------------------------------------------------
|
| 399 |
+
# server
|
| 400 |
+
# ---------------------------------------------------------------------------
|
| 401 |
+
|
| 402 |
+
|
| 403 |
+
@cli.command()
|
| 404 |
+
@click.option("--host", default="0.0.0.0", help="Host to bind.")
|
| 405 |
+
@click.option("--port", default=8000, type=int, help="Port to listen on.")
|
| 406 |
+
@click.option("--mock/--no-mock", default=False, help="Use mock mode (no Docker required).")
|
| 407 |
+
def server(host: str, port: int, mock: bool) -> None:
|
| 408 |
+
"""Start the OpenEnv server.
|
| 409 |
+
|
| 410 |
+
In mock mode, the environment simulates container interactions without
|
| 411 |
+
requiring a running Docker stack.
|
| 412 |
+
"""
|
| 413 |
+
import uvicorn
|
| 414 |
+
|
| 415 |
+
if mock:
|
| 416 |
+
os.environ["OPENRANGE_MOCK"] = "1"
|
| 417 |
+
click.echo(f"Starting OpenRange server in MOCK mode on {host}:{port} ...")
|
| 418 |
+
else:
|
| 419 |
+
click.echo(f"Starting OpenRange server on {host}:{port} ...")
|
| 420 |
+
|
| 421 |
+
try:
|
| 422 |
+
uvicorn.run(
|
| 423 |
+
"open_range.server.app:app",
|
| 424 |
+
host=host,
|
| 425 |
+
port=port,
|
| 426 |
+
log_level="info",
|
| 427 |
+
)
|
| 428 |
+
except Exception as exc:
|
| 429 |
+
click.echo(f"Error: server failed: {exc}", err=True)
|
| 430 |
+
sys.exit(1)
|
| 431 |
+
|
| 432 |
+
|
| 433 |
+
# ---------------------------------------------------------------------------
|
| 434 |
+
# Entry point
|
| 435 |
+
# ---------------------------------------------------------------------------
|
| 436 |
+
|
| 437 |
+
if __name__ == "__main__":
|
| 438 |
+
cli()
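Note on `validate`: the selection branch above can be read in isolation. The following is a minimal sketch, not the project's code. `select_checks` is a hypothetical helper name, and the two constants are copied from `_CHECK_REGISTRY` and `_DOCKER_CHECKS` in this diff.

```python
from __future__ import annotations

# Check names in pipeline order, copied from _CHECK_REGISTRY in the diff.
ALL_CHECKS = [
    "build_boot", "exploitability", "patchability", "evidence",
    "reward_grounding", "isolation", "task_feasibility", "difficulty",
    "npc_consistency", "realism_review",
]
# Checks that need live containers, copied from _DOCKER_CHECKS.
DOCKER_CHECKS = {"build_boot", "exploitability", "patchability", "evidence"}


def select_checks(checks: str | None, docker: bool) -> list[str]:
    """Hypothetical helper mirroring the selection branch of `validate`."""
    if checks:
        # An explicit list wins; names are validated but not filtered.
        names = [n.strip() for n in checks.split(",")]
        unknown = [n for n in names if n not in ALL_CHECKS]
        if unknown:
            raise ValueError(f"unknown checks: {', '.join(unknown)}")
        return names
    if docker:
        return list(ALL_CHECKS)
    # Default: offline checks only, keeping pipeline order.
    return [n for n in ALL_CHECKS if n not in DOCKER_CHECKS]
```

One consequence worth noting: an explicit `--checks` list is not filtered by `--no-docker`, so naming a Docker-dependent check selects it even in offline mode.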
|
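Note on `_import_check`: resolving a class from a dotted path keeps the registry a plain `dict[str, str]` and defers importing each validator module until it is needed. A minimal sketch of the same technique, using a standard-library class as the target (`import_by_path` is a hypothetical name):

```python
import importlib
from typing import Any


def import_by_path(dotted: str) -> Any:
    """Split 'package.module.ClassName' and return the named attribute."""
    module_path, attr_name = dotted.rsplit(".", 1)
    module = importlib.import_module(module_path)  # raises ImportError if missing
    return getattr(module, attr_name)              # raises AttributeError if absent


decoder_cls = import_by_path("json.JSONDecoder")
```

A misspelled module or class name surfaces only when that check is selected, so a registry built this way probably benefits from a test that imports every entry.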
src/open_range/client/client.py
CHANGED
|
@@ -1,9 +1,36 @@
|
|
| 1 |
-
"""Typed OpenEnv client for OpenRange.
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
-
from
|
| 6 |
-
|
| 7 |
|
| 8 |
from open_range.server.models import RangeAction, RangeObservation, RangeState
|
| 9 |
|
|
|
|
| 1 |
+
"""Typed OpenEnv client for OpenRange.
|
| 2 |
+
|
| 3 |
+
Falls back to lightweight stubs if openenv is not installed.
|
| 4 |
+
"""
|
| 5 |
|
| 6 |
from __future__ import annotations
|
| 7 |
|
| 8 |
+
from typing import Any, Generic, TypeVar
|
| 9 |
+
|
| 10 |
+
try:
|
| 11 |
+
from openenv.core.client_types import StepResult
|
| 12 |
+
from openenv.core.env_client import EnvClient
|
| 13 |
+
except ImportError:
|
| 14 |
+
from dataclasses import dataclass, field
|
| 15 |
+
|
| 16 |
+
_A = TypeVar("_A")
|
| 17 |
+
_O = TypeVar("_O")
|
| 18 |
+
_S = TypeVar("_S")
|
| 19 |
+
|
| 20 |
+
@dataclass
|
| 21 |
+
class StepResult(Generic[_O]): # type: ignore[no-redef]
|
| 22 |
+
"""Minimal stub matching openenv.core.client_types.StepResult."""
|
| 23 |
+
|
| 24 |
+
observation: Any = None
|
| 25 |
+
reward: float | int | None = None
|
| 26 |
+
done: bool = False
|
| 27 |
+
metadata: dict[str, Any] = field(default_factory=dict)
|
| 28 |
+
|
| 29 |
+
class EnvClient(Generic[_A, _O, _S]): # type: ignore[no-redef]
|
| 30 |
+
"""Minimal stub matching openenv.core.env_client.EnvClient."""
|
| 31 |
+
|
| 32 |
+
def __init__(self, *args: Any, **kwargs: Any) -> None:
|
| 33 |
+
pass
|
| 34 |
|
| 35 |
from open_range.server.models import RangeAction, RangeObservation, RangeState
|
| 36 |
|
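Note on the fallback stubs: the `try`/`except ImportError` pattern above lets the module import cleanly when an optional dependency is absent. The sketch below shows the pattern in isolation. The module name `example_missing_dependency` is made up so that the fallback branch always runs here, and the stub fields are a reduced version of the ones in the diff.

```python
from __future__ import annotations

from typing import Any

try:
    # Hypothetical package name, chosen so the import fails in this sketch.
    from example_missing_dependency.client_types import StepResult
except ImportError:
    from dataclasses import dataclass, field

    @dataclass
    class StepResult:  # type: ignore[no-redef]
        """Lightweight stand-in used only when the real class is unavailable."""

        observation: Any = None
        reward: float | None = None
        done: bool = False
        # default_factory gives each instance its own dict.
        metadata: dict[str, Any] = field(default_factory=dict)


first = StepResult()
second = StepResult(observation="banner", reward=1.0)
first.metadata["seen"] = True
```

A stub like this only needs to cover the attributes the calling code touches; it is not a substitute for the real class's behaviour.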
src/open_range/server/Dockerfile
DELETED
|
@@ -1,44 +0,0 @@
|
|
| 1 |
-
FROM python:3.11-slim AS builder
|
| 2 |
-
|
| 3 |
-
WORKDIR /app
|
| 4 |
-
|
| 5 |
-
# Install uv for fast dependency resolution
|
| 6 |
-
RUN pip install --no-cache-dir uv
|
| 7 |
-
|
| 8 |
-
# Copy project files
|
| 9 |
-
COPY pyproject.toml uv.lock* ./
|
| 10 |
-
COPY src/ src/
|
| 11 |
-
COPY openenv.yaml .
|
| 12 |
-
COPY manifests/ manifests/
|
| 13 |
-
|
| 14 |
-
# Install dependencies
|
| 15 |
-
RUN uv sync --frozen --no-editable 2>/dev/null || uv sync --no-editable
|
| 16 |
-
|
| 17 |
-
# --- Runtime stage ---
|
| 18 |
-
FROM python:3.11-slim
|
| 19 |
-
|
| 20 |
-
WORKDIR /app
|
| 21 |
-
|
| 22 |
-
# Runtime system deps: Docker CLI (for controlling range containers) + curl
|
| 23 |
-
RUN apt-get update && \
|
| 24 |
-
apt-get install -y --no-install-recommends \
|
| 25 |
-
docker.io \
|
| 26 |
-
curl \
|
| 27 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 28 |
-
|
| 29 |
-
COPY --from=builder /app/.venv /app/.venv
|
| 30 |
-
COPY --from=builder /app/src /app/src
|
| 31 |
-
COPY --from=builder /app/pyproject.toml /app/pyproject.toml
|
| 32 |
-
COPY --from=builder /app/openenv.yaml /app/openenv.yaml
|
| 33 |
-
COPY --from=builder /app/manifests /app/manifests
|
| 34 |
-
COPY server/ server/
|
| 35 |
-
|
| 36 |
-
ENV PATH="/app/.venv/bin:$PATH"
|
| 37 |
-
ENV PYTHONPATH="/app/src:$PYTHONPATH"
|
| 38 |
-
|
| 39 |
-
EXPOSE 8000
|
| 40 |
-
|
| 41 |
-
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 42 |
-
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
|
| 43 |
-
|
| 44 |
-
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
|
src/open_range/server/app.py
CHANGED
|
@@ -39,3 +39,6 @@ def main() -> None:
|
|
| 39 |
|
| 40 |
|
| 41 |
app = create_app()
|
|
|
|
| 39 |
|
| 40 |
|
| 41 |
app = create_app()
|
| 42 |
+
|
| 43 |
+
if __name__ == "__main__":
|
| 44 |
+
main()
|
src/open_range/server/environment.py
CHANGED
|
@@ -16,6 +16,7 @@ Design:
|
|
| 16 |
from __future__ import annotations
|
| 17 |
|
| 18 |
import logging
|
|
|
|
| 19 |
import time
|
| 20 |
from typing import TYPE_CHECKING, Any
|
| 21 |
from uuid import uuid4
|
|
@@ -248,11 +249,11 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 248 |
|
| 249 |
parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
|
| 250 |
self._exec_in_container(
|
| 251 |
-
container_name, f"mkdir -p
|
| 252 |
)
|
| 253 |
|
| 254 |
b64 = base64.b64encode(content.encode()).decode()
|
| 255 |
-
cmd = f"echo '{b64}' | base64 -d >
|
| 256 |
_, stderr = self._exec_in_container(container_name, cmd)
|
| 257 |
if stderr and "Error" in stderr:
|
| 258 |
logger.warning(
|
|
@@ -284,32 +285,41 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 284 |
"""
|
| 285 |
if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
|
| 286 |
self._snapshot_id = kwargs.get("snapshot_id")
|
| 287 |
-
|
| 288 |
-
|
| 289 |
-
if self._runtime is not None:
|
| 290 |
if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
|
| 291 |
admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
|
| 292 |
else:
|
| 293 |
admitted = self._runtime.acquire_snapshot()
|
| 294 |
self._snapshot_id = admitted.snapshot_id
|
| 295 |
-
|
| 296 |
-
|
| 297 |
-
|
| 298 |
-
|
| 299 |
-
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
|
| 303 |
-
|
| 304 |
-
|
| 305 |
-
|
| 306 |
-
|
| 307 |
-
|
| 308 |
-
|
| 309 |
-
|
| 310 |
-
|
| 311 |
-
|
| 312 |
-
|
| 313 |
|
| 314 |
# -----------------------------------------------------------------
|
| 315 |
# Special command handling
|
|
@@ -328,13 +338,13 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 328 |
done=True,
|
| 329 |
)
|
| 330 |
|
| 331 |
-
valid_flags = {f.value for f in self._snapshot.flags}
|
| 332 |
already_found = set(self._state.flags_found)
|
| 333 |
|
| 334 |
if submitted in valid_flags and submitted not in already_found:
|
| 335 |
self._state.flags_found.append(submitted)
|
| 336 |
# Check if all flags captured
|
| 337 |
-
all_captured = set(self._state.flags_found) == valid_flags
|
| 338 |
return RangeObservation(
|
| 339 |
stdout=f"Correct! Flag accepted: {submitted}",
|
| 340 |
flags_captured=[submitted],
|
|
@@ -395,7 +405,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 395 |
Checks credentials against the topology user list in the snapshot.
|
| 396 |
Successful auth is recorded in ``state.active_sessions``.
|
| 397 |
"""
|
| 398 |
-
parts = action.command.strip().split()
|
| 399 |
if len(parts) < 4:
|
| 400 |
return RangeObservation(
|
| 401 |
stdout="",
|
|
@@ -615,8 +625,8 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 615 |
"Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
|
| 616 |
eid,
|
| 617 |
self._state.tier,
|
| 618 |
-
len(self._snapshot.flags),
|
| 619 |
-
len(self._snapshot.golden_path),
|
| 620 |
)
|
| 621 |
|
| 622 |
return RangeObservation(stdout=briefing)
|
|
@@ -774,7 +784,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
|
|
| 774 |
action, obs, self._state, self._snapshot, reward_ctx
|
| 775 |
)
|
| 776 |
except Exception as exc:
|
| 777 |
-
logger.
|
| 778 |
obs.reward = 0.0
|
| 779 |
|
| 780 |
return obs
|
|
|
|
| 16 |
from __future__ import annotations
|
| 17 |
|
| 18 |
import logging
|
| 19 |
+
import shlex
|
| 20 |
import time
|
| 21 |
from typing import TYPE_CHECKING, Any
|
| 22 |
from uuid import uuid4
|
|
|
|
| 249 |
|
| 250 |
parent_dir = path.rsplit("/", 1)[0] if "/" in path else "/"
|
| 251 |
self._exec_in_container(
|
| 252 |
+
container_name, f"mkdir -p {shlex.quote(parent_dir)}"
|
| 253 |
)
|
| 254 |
|
| 255 |
b64 = base64.b64encode(content.encode()).decode()
|
| 256 |
+
cmd = f"echo '{b64}' | base64 -d > {shlex.quote(path)}"
|
| 257 |
_, stderr = self._exec_in_container(container_name, cmd)
|
| 258 |
if stderr and "Error" in stderr:
|
| 259 |
logger.warning(
|
|
|
|
| 285 |
"""
|
| 286 |
if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
|
| 287 |
self._snapshot_id = kwargs.get("snapshot_id")
|
| 288 |
+
snap = kwargs["snapshot"]
|
| 289 |
+
elif self._runtime is not None:
|
|
|
|
| 290 |
if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
|
| 291 |
admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
|
| 292 |
else:
|
| 293 |
admitted = self._runtime.acquire_snapshot()
|
| 294 |
self._snapshot_id = admitted.snapshot_id
|
| 295 |
+
snap = admitted.snapshot
|
| 296 |
+
else:
|
| 297 |
+
self._snapshot_id = None
|
| 298 |
+
snap = SnapshotSpec(
|
| 299 |
+
topology={"hosts": []},
|
| 300 |
+
flags=[],
|
| 301 |
+
golden_path=[],
|
| 302 |
+
task={
|
| 303 |
+
"red_briefing": (
|
| 304 |
+
"Target network detected. Begin reconnaissance and "
|
| 305 |
+
"identify vulnerabilities. Capture all flags."
|
| 306 |
+
),
|
| 307 |
+
"blue_briefing": (
|
| 308 |
+
"Monitor SIEM for suspicious activity. Investigate "
|
| 309 |
+
"alerts, patch vulnerabilities, and report findings."
|
| 310 |
+
),
|
| 311 |
+
},
|
| 312 |
+
)
|
| 313 |
+
|
| 314 |
+
# Defensive: ensure required fields are not None
|
| 315 |
+
if snap.flags is None:
|
| 316 |
+
snap.flags = []
|
| 317 |
+
if snap.topology is None:
|
| 318 |
+
snap.topology = {}
|
| 319 |
+
if snap.task is None:
|
| 320 |
+
snap.task = {}
|
| 321 |
+
|
| 322 |
+
return snap
|
| 323 |
|
| 324 |
# -----------------------------------------------------------------
|
| 325 |
# Special command handling
|
|
|
|
| 338 |
done=True,
|
| 339 |
)
|
| 340 |
|
| 341 |
+
valid_flags = {f.value for f in self._snapshot.flags} if self._snapshot.flags else set()
|
| 342 |
already_found = set(self._state.flags_found)
|
| 343 |
|
| 344 |
if submitted in valid_flags and submitted not in already_found:
|
| 345 |
self._state.flags_found.append(submitted)
|
| 346 |
# Check if all flags captured
|
| 347 |
+
all_captured = valid_flags and set(self._state.flags_found) == valid_flags
|
| 348 |
return RangeObservation(
|
| 349 |
stdout=f"Correct! Flag accepted: {submitted}",
|
| 350 |
flags_captured=[submitted],
|
|
|
|
| 405 |
Checks credentials against the topology user list in the snapshot.
|
| 406 |
Successful auth is recorded in ``state.active_sessions``.
|
| 407 |
"""
|
| 408 |
+
parts = action.command.strip().split(maxsplit=3)
|
| 409 |
if len(parts) < 4:
|
| 410 |
return RangeObservation(
|
| 411 |
stdout="",
|
|
|
|
| 625 |
"Episode %s reset: tier=%d, flags=%d, golden_path_steps=%d",
|
| 626 |
eid,
|
| 627 |
self._state.tier,
|
| 628 |
+
len(self._snapshot.flags or []),
|
| 629 |
+
len(self._snapshot.golden_path or []),
|
| 630 |
)
|
| 631 |
|
| 632 |
return RangeObservation(stdout=briefing)
|
|
|
|
| 784 |
action, obs, self._state, self._snapshot, reward_ctx
|
| 785 |
)
|
| 786 |
except Exception as exc:
|
| 787 |
+
logger.error("Reward computation failed: %s", exc, exc_info=True)
|
| 788 |
obs.reward = 0.0
|
| 789 |
|
| 790 |
return obs
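Note on the file-deploy fix: the two commands built in the hunk above can be sketched as a standalone function. This is an illustration under assumptions, not the project's code. `build_deploy_commands` is a hypothetical name, and the `or "/"` fallback for files directly under the root is an addition of this sketch.

```python
import base64
import shlex


def build_deploy_commands(path: str, content: str) -> list[str]:
    """Return the mkdir and write commands for one file, with quoted paths."""
    # Parent directory; "or '/'" covers paths such as "/test.txt" (sketch-only addition).
    parent_dir = (path.rsplit("/", 1)[0] if "/" in path else "/") or "/"
    # Base64 output contains only [A-Za-z0-9+/=], so single quotes are safe around it.
    b64 = base64.b64encode(content.encode()).decode()
    return [
        f"mkdir -p {shlex.quote(parent_dir)}",
        f"echo '{b64}' | base64 -d > {shlex.quote(path)}",
    ]


hostile = build_deploy_commands("/tmp/a b; rm -rf x/f.txt", "hi")
plain = build_deploy_commands("/var/www/portal/test.php", "x")
```

`shlex.quote` returns a string unchanged when it contains only shell-safe characters, so `plain[0]` is `mkdir -p /var/www/portal` with no quotes, while the hostile path is wrapped in single quotes and reaches the shell as a single argument.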
|
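Note on the auth parsing fix: `split(maxsplit=3)` stops after three splits, so everything after the third field stays in one piece. A minimal sketch follows; `parse_auth` is a hypothetical helper, and the field order (verb, host, user, password) is an assumption made for illustration, since the hunk only shows that four parts are required.

```python
from __future__ import annotations


def parse_auth(command: str) -> tuple[str, str, str] | None:
    """Split an auth command into (host, user, password), or None if too short."""
    parts = command.strip().split(maxsplit=3)
    if len(parts) < 4:
        return None
    _verb, host, user, password = parts  # assumed field order
    return host, user, password


with_spaces = parse_auth("auth web alice correct horse battery")
too_short = parse_auth("auth web alice")
```

With a plain `split()`, the same input yields six parts and the password is broken into separate words; with `maxsplit=3` the fourth part keeps its internal whitespace.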
src/open_range/training/rollout.py
CHANGED
|
@@ -14,7 +14,7 @@ Usage with GRPOTrainer::
|
|
| 14 |
|
| 15 |
from __future__ import annotations
|
| 16 |
|
| 17 |
-
from typing import Any,
|
| 18 |
|
| 19 |
|
| 20 |
class AgentCallable(Protocol):
|
|
@@ -23,7 +23,7 @@ class AgentCallable(Protocol):
|
|
| 23 |
def __call__(self, observation: Any) -> Any: ...
|
| 24 |
|
| 25 |
|
| 26 |
-
|
| 27 |
env: Any,
|
| 28 |
agent: AgentCallable,
|
| 29 |
num_steps: int = 100,
|
|
@@ -82,10 +82,8 @@ def rollout_func_sync(
|
|
| 82 |
num_steps: int = 100,
|
| 83 |
mode: str = "red",
|
| 84 |
) -> dict[str, Any]:
|
| 85 |
-
"""Synchronous wrapper
|
| 86 |
|
| 87 |
-
|
| 88 |
"""
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
return asyncio.run(rollout_func(env, agent, num_steps, mode))
|
|
|
|
| 14 |
|
| 15 |
from __future__ import annotations
|
| 16 |
|
| 17 |
+
from typing import Any, Protocol
|
| 18 |
|
| 19 |
|
| 20 |
class AgentCallable(Protocol):
|
|
|
|
| 23 |
def __call__(self, observation: Any) -> Any: ...
|
| 24 |
|
| 25 |
|
| 26 |
+
def rollout_func(
|
| 27 |
env: Any,
|
| 28 |
agent: AgentCallable,
|
| 29 |
num_steps: int = 100,
|
|
|
|
| 82 |
num_steps: int = 100,
|
| 83 |
mode: str = "red",
|
| 84 |
) -> dict[str, Any]:
|
| 85 |
+
"""Synchronous wrapper — now just delegates to rollout_func directly.
|
| 86 |
|
| 87 |
+
Kept for backward compatibility with callers that import this name.
|
| 88 |
"""
|
| 89 |
+
return rollout_func(env, agent, num_steps, mode)
|
|
|
|
|
|
tests/test_apply_snapshot.py
ADDED
|
@@ -0,0 +1,457 @@
|
|
| 1 |
+
"""Tests for RangeEnvironment._apply_snapshot() with mocked Docker.
|
| 2 |
+
|
| 3 |
+
Covers file deployment via docker exec (base64 encoding), SQL execution,
|
| 4 |
+
container name resolution, error handling, and mixed files dicts.
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
from __future__ import annotations
|
| 8 |
+
|
| 9 |
+
import base64
|
| 10 |
+
from unittest.mock import MagicMock, call, patch
|
| 11 |
+
|
| 12 |
+
import pytest
|
| 13 |
+
|
| 14 |
+
from open_range.protocols import (
|
| 15 |
+
FlagSpec,
|
| 16 |
+
SnapshotSpec,
|
| 17 |
+
TruthGraph,
|
| 18 |
+
Vulnerability,
|
| 19 |
+
)
|
| 20 |
+
from open_range.server.environment import RangeEnvironment
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
# ---------------------------------------------------------------------------
|
| 24 |
+
# Helpers
|
| 25 |
+
# ---------------------------------------------------------------------------
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def _make_env(docker_available: bool = True) -> RangeEnvironment:
|
| 29 |
+
"""Create a RangeEnvironment with docker_available control."""
|
| 30 |
+
return RangeEnvironment(docker_available=docker_available)
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
def _make_snapshot(files: dict[str, str] | None = None) -> SnapshotSpec:
|
| 34 |
+
"""Create a minimal SnapshotSpec with the given files dict."""
|
| 35 |
+
return SnapshotSpec(
|
| 36 |
+
topology={"hosts": ["web", "db"], "zones": {"dmz": ["web"], "internal": ["db"]}},
|
| 37 |
+
truth_graph=TruthGraph(vulns=[]),
|
| 38 |
+
flags=[],
|
| 39 |
+
golden_path=[],
|
| 40 |
+
files=files or {},
|
| 41 |
+
)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
class _FakeExecResult:
|
| 45 |
+
"""Mimics docker SDK exec_run return value."""
|
| 46 |
+
|
| 47 |
+
def __init__(self, stdout: bytes = b"", stderr: bytes = b""):
|
| 48 |
+
self.output = (stdout, stderr)
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
class _FakeContainer:
|
| 52 |
+
"""Minimal fake Docker container."""
|
| 53 |
+
|
| 54 |
+
def __init__(self, name: str, exec_side_effect=None):
|
| 55 |
+
self.name = name
|
| 56 |
+
self._exec_side_effect = exec_side_effect or (lambda *a, **kw: _FakeExecResult())
|
| 57 |
+
|
| 58 |
+
def exec_run(self, cmd, **kwargs):
|
| 59 |
+
return self._exec_side_effect(cmd, **kwargs)
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
class _FakeDockerClient:
|
| 63 |
+
"""Minimal fake Docker client."""
|
| 64 |
+
|
| 65 |
+
def __init__(self, containers: dict[str, _FakeContainer] | None = None):
|
| 66 |
+
self._containers = containers or {}
|
| 67 |
+
|
| 68 |
+
@property
|
| 69 |
+
def containers(self):
|
| 70 |
+
return self
|
| 71 |
+
|
| 72 |
+
def get(self, name: str):
|
| 73 |
+
if name in self._containers:
|
| 74 |
+
return self._containers[name]
|
| 75 |
+
raise Exception(f"Container {name} not found")
|
| 76 |
+
|
| 77 |
+
def list(self):
|
| 78 |
+
return list(self._containers.values())
|
| 79 |
+
|
| 80 |
+
|
| 81 |
+
# ---------------------------------------------------------------------------
|
| 82 |
+
# Tests: Docker unavailable
|
| 83 |
+
# ---------------------------------------------------------------------------
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
class TestApplySnapshotNoDocker:
|
| 87 |
+
"""When Docker is not available, _apply_snapshot should be a no-op."""
|
| 88 |
+
|
| 89 |
+
def test_skips_when_docker_unavailable(self):
|
| 90 |
+
env = _make_env(docker_available=False)
|
| 91 |
+
snapshot = _make_snapshot({"web:/var/www/test.php": "<?php echo 1; ?>"})
|
| 92 |
+
# Should not raise
|
| 93 |
+
env._apply_snapshot(snapshot)
|
| 94 |
+
|
| 95 |
+
def test_skips_when_no_files(self):
|
| 96 |
+
env = _make_env(docker_available=False)
|
| 97 |
+
snapshot = _make_snapshot({})
|
| 98 |
+
env._apply_snapshot(snapshot)
|
| 99 |
+
|
| 100 |
+
def test_skips_when_files_is_none(self):
|
| 101 |
+
env = _make_env(docker_available=False)
|
| 102 |
+
snapshot = _make_snapshot()
|
| 103 |
+
snapshot.files = {}
|
| 104 |
+
env._apply_snapshot(snapshot)
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
# ---------------------------------------------------------------------------
|
| 108 |
+
# Tests: File deployment via base64
|
| 109 |
+
# ---------------------------------------------------------------------------
|
| 110 |
+
|
| 111 |
+
|
| 112 |
+
class TestFileDeployment:
|
| 113 |
+
"""Verify files are deployed to containers via base64-encoded docker exec."""
|
| 114 |
+
|
| 115 |
+
def test_deploys_single_file(self):
|
| 116 |
+
env = _make_env(docker_available=True)
|
| 117 |
+
content = "<?php echo 'hello'; ?>"
|
| 118 |
+
snapshot = _make_snapshot({"web:/var/www/portal/test.php": content})
|
| 119 |
+
|
| 120 |
+
exec_calls = []
|
| 121 |
+
|
| 122 |
+
def fake_exec_run(cmd, **kw):
|
| 123 |
+
exec_calls.append(cmd)
|
| 124 |
+
return _FakeExecResult()
|
| 125 |
+
|
| 126 |
+
container = _FakeContainer("web", exec_side_effect=fake_exec_run)
|
| 127 |
+
client = _FakeDockerClient({"web": container})
|
| 128 |
+
env._docker_client = client
|
| 129 |
+
env._docker_available = True
|
| 130 |
+
|
| 131 |
+
env._apply_snapshot(snapshot)
|
| 132 |
+
|
| 133 |
+
# Should have 2 calls: mkdir -p, then echo base64 | base64 -d > path
|
| 134 |
+
assert len(exec_calls) == 2
|
| 135 |
+
# First call: mkdir -p for parent directory
|
| 136 |
+
mkdir_cmd = exec_calls[0]
|
| 137 |
+
assert mkdir_cmd == ["sh", "-c", "mkdir -p '/var/www/portal'"]
|
| 138 |
+
# Second call: base64 write
|
| 139 |
+
write_cmd = exec_calls[1]
|
| 140 |
+
assert isinstance(write_cmd, list)
|
| 141 |
+
write_str = write_cmd[2] if len(write_cmd) > 2 else ""
|
| 142 |
+
expected_b64 = base64.b64encode(content.encode()).decode()
|
| 143 |
+
assert expected_b64 in write_str
|
| 144 |
+
assert "/var/www/portal/test.php" in write_str
|
| 145 |
+
|
| 146 |
+
def test_deploys_multiple_files_to_different_containers(self):
|
| 147 |
+
env = _make_env(docker_available=True)
|
| 148 |
+
snapshot = _make_snapshot({
|
| 149 |
+
"web:/var/www/portal/index.php": "<?php echo 'web'; ?>",
|
| 150 |
+
"files:/srv/shares/general/notes.txt": "some notes",
|
| 151 |
+
})
|
| 152 |
+
|
| 153 |
+
web_calls = []
|
| 154 |
+
files_calls = []
|
| 155 |
+
|
| 156 |
+
web = _FakeContainer(
|
| 157 |
+
"web",
|
| 158 |
+
exec_side_effect=lambda cmd, **kw: (web_calls.append(cmd), _FakeExecResult())[1],
|
| 159 |
+
)
|
| 160 |
+
files_container = _FakeContainer(
|
| 161 |
+
"files",
|
| 162 |
+
exec_side_effect=lambda cmd, **kw: (files_calls.append(cmd), _FakeExecResult())[1],
|
| 163 |
+
)
|
| 164 |
+
client = _FakeDockerClient({"web": web, "files": files_container})
|
| 165 |
+
env._docker_client = client
|
| 166 |
+
env._docker_available = True
|
| 167 |
+
|
| 168 |
+
env._apply_snapshot(snapshot)
|
| 169 |
+
|
| 170 |
+
# web: 2 calls (mkdir + write)
|
| 171 |
+
assert len(web_calls) == 2
|
| 172 |
+
# files: 2 calls (mkdir + write)
|
| 173 |
+
assert len(files_calls) == 2
|
| 174 |
+
|
| 175 |
+
def test_file_at_root_path(self):
|
| 176 |
+
"""File at / should still work (edge case for parent dir)."""
|
| 177 |
+
env = _make_env(docker_available=True)
|
| 178 |
+
snapshot = _make_snapshot({"web:/test.txt": "root file"})
|
| 179 |
+
|
| 180 |
+
calls = []
|
| 181 |
+
container = _FakeContainer(
|
| 182 |
+
"web",
|
| 183 |
+
exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
|
| 184 |
+
)
|
| 185 |
+
client = _FakeDockerClient({"web": container})
|
| 186 |
+
env._docker_client = client
|
| 187 |
+
env._docker_available = True
|
| 188 |
+
|
| 189 |
+
env._apply_snapshot(snapshot)
|
| 190 |
+
|
| 191 |
+
# mkdir -p for "/" then base64 write
|
| 192 |
+
assert len(calls) == 2
|
| 193 |
+
|
| 194 |
+
|
| 195 |
+
# ---------------------------------------------------------------------------
|
| 196 |
+
# Tests: SQL execution via docker exec
|
| 197 |
+
# ---------------------------------------------------------------------------
|
| 198 |
+
|
| 199 |
+
|
| 200 |
+
class TestSQLDeployment:
|
| 201 |
+
"""Verify db:sql entries are deployed via mysql commands."""
|
| 202 |
+
|
| 203 |
+
def test_deploys_sql_to_db_container(self):
|
| 204 |
+
env = _make_env(docker_available=True)
|
| 205 |
+
sql = "INSERT INTO users VALUES (1, 'test');"
|
| 206 |
+
snapshot = _make_snapshot({"db:sql": sql})
|
| 207 |
+
|
| 208 |
+
calls = []
|
| 209 |
+
|
| 210 |
+
def fake_exec(cmd, **kw):
|
| 211 |
+
calls.append(cmd)
|
| 212 |
+
return _FakeExecResult()
|
| 213 |
+
|
| 214 |
+
db_container = _FakeContainer("db", exec_side_effect=fake_exec)
|
| 215 |
+
client = _FakeDockerClient({"db": db_container})
|
| 216 |
+
env._docker_client = client
|
| 217 |
+
env._docker_available = True
|
| 218 |
+
|
| 219 |
+
env._apply_snapshot(snapshot)
|
| 220 |
+
|
| 221 |
+
# 3 calls: write SQL file, execute mysql, cleanup
|
| 222 |
+
assert len(calls) == 3
|
| 223 |
+
|
| 224 |
+
# First: base64 decode to /tmp/_snapshot.sql
|
| 225 |
+
write_cmd_str = calls[0][2] if len(calls[0]) > 2 else ""
|
| 226 |
+
expected_b64 = base64.b64encode(sql.encode()).decode()
|
| 227 |
+
assert expected_b64 in write_cmd_str
|
| 228 |
+
assert "/tmp/_snapshot.sql" in write_cmd_str
|
| 229 |
+
|
| 230 |
+
# Second: mysql < /tmp/_snapshot.sql
|
| 231 |
+
mysql_cmd_str = calls[1][2] if len(calls[1]) > 2 else ""
|
| 232 |
+
assert "mysql" in mysql_cmd_str
|
| 233 |
+
assert "/tmp/_snapshot.sql" in mysql_cmd_str
|
| 234 |
+
|
| 235 |
+
# Third: rm -f /tmp/_snapshot.sql
|
| 236 |
+
rm_cmd_str = calls[2][2] if len(calls[2]) > 2 else ""
|
| 237 |
+
assert "rm" in rm_cmd_str
|
| 238 |
+
assert "/tmp/_snapshot.sql" in rm_cmd_str
|
| 239 |
+
|
| 240 |
+
def test_sql_error_logs_warning(self, caplog):
|
| 241 |
+
"""When mysql returns an ERROR, it should log a warning but not raise."""
|
| 242 |
+
env = _make_env(docker_available=True)
|
| 243 |
+
snapshot = _make_snapshot({"db:sql": "INVALID SQL;"})
|
| 244 |
+
|
| 245 |
+
call_count = [0]
|
| 246 |
+
|
| 247 |
+
def fake_exec(cmd, **kw):
|
| 248 |
+
call_count[0] += 1
|
| 249 |
+
# Return ERROR on the mysql command (2nd call)
|
| 250 |
+
if call_count[0] == 2:
|
| 251 |
+
return _FakeExecResult(stderr=b"ERROR 1064: Syntax error")
|
| 252 |
+
return _FakeExecResult()
|
| 253 |
+
|
| 254 |
+
db_container = _FakeContainer("db", exec_side_effect=fake_exec)
|
| 255 |
+
client = _FakeDockerClient({"db": db_container})
|
| 256 |
+
env._docker_client = client
|
| 257 |
+
env._docker_available = True
|
| 258 |
+
|
| 259 |
+
import logging
|
| 260 |
+
with caplog.at_level(logging.WARNING):
|
| 261 |
+
env._apply_snapshot(snapshot)
|
| 262 |
+
|
| 263 |
+
assert any("SQL deployment error" in r.message for r in caplog.records)
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
# ---------------------------------------------------------------------------
|
| 267 |
+
# Tests: Container name resolution
|
| 268 |
+
# ---------------------------------------------------------------------------
|
| 269 |
+
|
| 270 |
+
|
| 271 |
+
class TestContainerNameResolution:
|
| 272 |
+
"""Verify _container_name resolves hosts correctly."""
|
| 273 |
+
|
| 274 |
+
def test_resolves_via_compose_config(self):
|
| 275 |
+
env = _make_env(docker_available=False)
|
| 276 |
+
env._snapshot = SnapshotSpec(
|
| 277 |
+
topology={},
|
| 278 |
+
compose={
|
| 279 |
+
"services": {"web": {}, "db": {}},
|
| 280 |
+
"x-project-name": "openrange",
|
| 281 |
+
},
|
| 282 |
+
)
|
| 283 |
+
assert env._container_name("web") == "openrange-web-1"
|
| 284 |
+
assert env._container_name("db") == "openrange-db-1"
|
| 285 |
+
|
| 286 |
+
def test_resolves_via_docker_listing(self):
|
| 287 |
+
env = _make_env(docker_available=True)
|
| 288 |
+
env._snapshot = None # No compose config
|
| 289 |
+
|
| 290 |
+
web_container = MagicMock()
|
| 291 |
+
web_container.name = "open-range-web-1"
|
| 292 |
+
db_container = MagicMock()
|
| 293 |
+
db_container.name = "open-range-db-1"
|
| 294 |
+
|
| 295 |
+
client = MagicMock()
|
| 296 |
+
client.containers.list.return_value = [web_container, db_container]
|
| 297 |
+
env._docker_client = client
|
| 298 |
+
|
| 299 |
+
assert env._container_name("web") == "open-range-web-1"
|
| 300 |
+
assert env._container_name("db") == "open-range-db-1"
|
| 301 |
+
|
| 302 |
+
def test_falls_back_to_bare_name(self):
|
| 303 |
+
env = _make_env(docker_available=False)
|
| 304 |
+
env._snapshot = None
|
| 305 |
+
assert env._container_name("web") == "web"
|
| 306 |
+
|
| 307 |
+
|
| 308 |
+
# ---------------------------------------------------------------------------
|
| 309 |
+
# Tests: Error handling for failed docker exec
|
| 310 |
+
# ---------------------------------------------------------------------------
|
| 311 |
+
|
| 312 |
+
|
| 313 |
+
class TestErrorHandling:
|
| 314 |
+
"""Verify graceful handling of docker exec failures."""
|
| 315 |
+
|
| 316 |
+
def test_file_deployment_handles_exception(self, caplog):
|
| 317 |
+
"""If deployment to a container fails (here, the container is missing), log a warning and continue."""
|
| 318 |
+
env = _make_env(docker_available=True)
|
| 319 |
+
snapshot = _make_snapshot({
|
| 320 |
+
"web:/var/www/good.php": "good",
|
| 321 |
+
"broken:/var/www/fail.php": "bad",
|
| 322 |
+
})
|
| 323 |
+
|
| 324 |
+
def fake_exec(cmd, **kw):
|
| 325 |
+
return _FakeExecResult()
|
| 326 |
+
|
| 327 |
+
web = _FakeContainer("web", exec_side_effect=fake_exec)
|
| 328 |
+
# 'broken' container doesn't exist
|
| 329 |
+
client = _FakeDockerClient({"web": web})
|
| 330 |
+
env._docker_client = client
|
| 331 |
+
env._docker_available = True
|
| 332 |
+
|
| 333 |
+
import logging
|
| 334 |
+
with caplog.at_level(logging.WARNING):
|
| 335 |
+
env._apply_snapshot(snapshot)
|
| 336 |
+
|
| 337 |
+
# Should deploy the good file and warn about the broken one
|
| 338 |
+
assert any("Failed to deploy" in r.message or "broken" in r.message
|
| 339 |
+
for r in caplog.records)
|
| 340 |
+
|
| 341 |
+
def test_bad_key_format_skipped(self, caplog):
|
| 342 |
+
"""Keys without ':' separator should be skipped with a warning."""
|
| 343 |
+
env = _make_env(docker_available=True)
|
| 344 |
+
snapshot = _make_snapshot({
|
| 345 |
+
"no_colon_here": "this should be skipped",
|
| 346 |
+
"web:/var/www/valid.php": "valid content",
|
| 347 |
+
})
|
| 348 |
+
|
| 349 |
+
calls = []
|
| 350 |
+
web = _FakeContainer(
|
| 351 |
+
"web",
|
| 352 |
+
exec_side_effect=lambda cmd, **kw: (calls.append(cmd), _FakeExecResult())[1],
|
| 353 |
+
)
|
| 354 |
+
client = _FakeDockerClient({"web": web})
|
| 355 |
+
env._docker_client = client
|
| 356 |
+
env._docker_available = True
|
| 357 |
+
|
| 358 |
+
import logging
|
| 359 |
+
with caplog.at_level(logging.WARNING):
|
| 360 |
+
env._apply_snapshot(snapshot)
|
| 361 |
+
|
| 362 |
+
assert any("bad key format" in r.message for r in caplog.records)
|
| 363 |
+
# Only valid file should be deployed (mkdir + write = 2 calls)
|
| 364 |
+
assert len(calls) == 2
|
| 365 |
+
|
| 366 |
+
def test_file_write_stderr_error_logged(self, caplog):
|
| 367 |
+
"""If file write returns stderr with 'Error', log warning."""
|
| 368 |
+
env = _make_env(docker_available=True)
|
| 369 |
+
snapshot = _make_snapshot({"web:/var/www/fail.php": "content"})
|
| 370 |
+
|
| 371 |
+
call_count = [0]
|
| 372 |
+
|
| 373 |
+
def fake_exec(cmd, **kw):
|
| 374 |
+
call_count[0] += 1
|
| 375 |
+
# Return error on the write call (2nd call)
|
| 376 |
+
if call_count[0] == 2:
|
| 377 |
+
return _FakeExecResult(stderr=b"Error: permission denied")
|
| 378 |
+
return _FakeExecResult()
|
| 379 |
+
|
| 380 |
+
web = _FakeContainer("web", exec_side_effect=fake_exec)
|
| 381 |
+
client = _FakeDockerClient({"web": web})
|
| 382 |
+
env._docker_client = client
|
| 383 |
+
env._docker_available = True
|
| 384 |
+
|
| 385 |
+
import logging
|
| 386 |
+
with caplog.at_level(logging.WARNING):
|
| 387 |
+
env._apply_snapshot(snapshot)
|
| 388 |
+
|
| 389 |
+
assert any("File deployment error" in r.message for r in caplog.records)
|
| 390 |
+
|
| 391 |
+
|
| 392 |
+
# ---------------------------------------------------------------------------
|
| 393 |
+
# Tests: Mixed files dict (file paths + db:sql entries)
|
| 394 |
+
# ---------------------------------------------------------------------------
|
| 395 |
+
|
| 396 |
+
|
| 397 |
+
class TestMixedFilesDict:
|
| 398 |
+
"""Test snapshot with both regular file deployments and db:sql entries."""
|
| 399 |
+
|
| 400 |
+
def test_mixed_deployment(self):
|
| 401 |
+
env = _make_env(docker_available=True)
|
| 402 |
+
snapshot = _make_snapshot({
|
| 403 |
+
"web:/var/www/portal/index.php": "<?php echo 'hello'; ?>",
|
| 404 |
+
"web:/etc/nginx/sites-available/default": "server { listen 80; }",
|
| 405 |
+
"db:sql": "INSERT INTO secrets VALUES ('flag', 'FLAG{test}');",
|
| 406 |
+
"files:/srv/shares/general/notes.txt": "meeting notes",
|
| 407 |
+
})
|
| 408 |
+
|
| 409 |
+
container_calls: dict[str, list] = {"web": [], "db": [], "files": []}
|
| 410 |
+
|
| 411 |
+
def make_exec(name):
|
| 412 |
+
def fake_exec(cmd, **kw):
|
| 413 |
+
container_calls[name].append(cmd)
|
| 414 |
+
return _FakeExecResult()
|
| 415 |
+
return fake_exec
|
| 416 |
+
|
| 417 |
+
containers = {
|
| 418 |
+
name: _FakeContainer(name, exec_side_effect=make_exec(name))
|
| 419 |
+
for name in ["web", "db", "files"]
|
| 420 |
+
}
|
| 421 |
+
client = _FakeDockerClient(containers)
|
| 422 |
+
env._docker_client = client
|
| 423 |
+
env._docker_available = True
|
| 424 |
+
|
| 425 |
+
env._apply_snapshot(snapshot)
|
| 426 |
+
|
| 427 |
+
# web: 2 files * 2 calls each = 4
|
| 428 |
+
assert len(container_calls["web"]) == 4
|
| 429 |
+
# db: 3 calls (write sql, execute, cleanup)
|
| 430 |
+
assert len(container_calls["db"]) == 3
|
| 431 |
+
# files: 1 file * 2 calls = 2
|
| 432 |
+
assert len(container_calls["files"]) == 2
|
| 433 |
+
|
| 434 |
+
def test_deployment_count_in_log(self, caplog):
|
| 435 |
+
"""Verify the final log message reports correct deployment counts."""
|
| 436 |
+
env = _make_env(docker_available=True)
|
| 437 |
+
snapshot = _make_snapshot({
|
| 438 |
+
"web:/var/www/test.php": "test",
|
| 439 |
+
"db:sql": "SELECT 1;",
|
| 440 |
+
})
|
| 441 |
+
|
| 442 |
+
def fake_exec(cmd, **kw):
|
| 443 |
+
return _FakeExecResult()
|
| 444 |
+
|
| 445 |
+
containers = {
|
| 446 |
+
name: _FakeContainer(name, exec_side_effect=fake_exec)
|
| 447 |
+
for name in ["web", "db"]
|
| 448 |
+
}
|
| 449 |
+
client = _FakeDockerClient(containers)
|
| 450 |
+
env._docker_client = client
|
| 451 |
+
env._docker_available = True
|
| 452 |
+
|
| 453 |
+
import logging
|
| 454 |
+
with caplog.at_level(logging.INFO):
|
| 455 |
+
env._apply_snapshot(snapshot)
|
| 456 |
+
|
| 457 |
+
assert any("2/2 artifacts deployed" in r.message for r in caplog.records)
|
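The file-deployment tests above all exercise the same pattern: split a `container:/path` key, create the parent directory, then write the base64-encoded content. A minimal sketch of that pattern follows. The helper names `split_key` and `build_file_commands` are hypothetical and are not taken from `environment.py`; the use of `shlex.quote` is an assumption based on the commit message. Note that `shlex.quote` leaves paths made only of safe characters unquoted, whereas `test_deploys_single_file` expects a single-quoted directory, so the real implementation's quoting may differ.

```python
import base64
import posixpath
import shlex


def split_key(key: str):
    """Split 'container:/path' into (container, path); None if there is no ':'."""
    if ":" not in key:
        return None  # mirrors the "bad key format" case: caller should skip and warn
    container, path = key.split(":", 1)
    return container, path


def build_file_commands(path: str, content: str) -> list[list[str]]:
    """Return the two `sh -c` commands (mkdir, write) for one file. Hypothetical helper."""
    parent = posixpath.dirname(path) or "/"
    # Base64 output only contains [A-Za-z0-9+/=], so it is safe to embed unquoted.
    b64 = base64.b64encode(content.encode()).decode()
    mkdir_cmd = ["sh", "-c", f"mkdir -p {shlex.quote(parent)}"]
    write_cmd = ["sh", "-c", f"echo {b64} | base64 -d > {shlex.quote(path)}"]
    return [mkdir_cmd, write_cmd]
```

Encoding the content as base64 keeps arbitrary file bodies (quotes, `$`, newlines) out of the shell's parsing, so only the destination path needs quoting.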
tests/test_console.py
CHANGED
|
@@ -1,7 +1,11 @@
|
|
| 1 |
"""Tests for the operator debugging console (issue #28).
|
| 2 |
|
| 3 |
-
Uses Starlette's TestClient against the
|
| 4 |
No Docker dependency.
|
| 5 |
"""
|
| 6 |
|
| 7 |
from __future__ import annotations
|
|
@@ -10,17 +14,27 @@ import pytest
|
|
| 10 |
from starlette.testclient import TestClient
|
| 11 |
|
| 12 |
from open_range.server.app import create_app
|
| 13 |
|
| 14 |
|
| 15 |
@pytest.fixture()
|
| 16 |
-
def client(
|
| 17 |
-
"""Create a TestClient
|
| 18 |
-
# Force standalone path so we test our own endpoints and console integration
|
| 19 |
-
monkeypatch.setattr("open_range.server.app._try_openenv_app", lambda: None)
|
| 20 |
app = create_app()
|
| 21 |
return TestClient(app)
|
| 22 |
|
| 23 |
|
| 24 |
# ===================================================================
|
| 25 |
# GET /console -- HTML page
|
| 26 |
# ===================================================================
|
|
@@ -59,8 +73,8 @@ class TestSnapshotAPI:
|
|
| 59 |
data = client.get("/console/api/snapshot").json()
|
| 60 |
assert data["id"] is None
|
| 61 |
|
| 62 |
-
def test_snapshot_after_reset(self, client: TestClient):
|
| 63 |
-
|
| 64 |
data = client.get("/console/api/snapshot").json()
|
| 65 |
assert data["id"] == "snap_test_1"
|
| 66 |
assert "hosts" in data
|
|
@@ -68,9 +82,9 @@ class TestSnapshotAPI:
|
|
| 68 |
assert "vuln_count" in data
|
| 69 |
assert "tier" in data
|
| 70 |
|
| 71 |
-
def test_snapshot_no_truth_graph_or_flags(self, client: TestClient):
|
| 72 |
"""Snapshot API must NOT leak truth_graph or flag values."""
|
| 73 |
-
|
| 74 |
data = client.get("/console/api/snapshot").json()
|
| 75 |
assert "truth_graph" not in data
|
| 76 |
assert "flags" not in data
|
|
@@ -89,20 +103,22 @@ class TestEpisodeAPI:
|
|
| 89 |
data = resp.json()
|
| 90 |
assert isinstance(data, dict)
|
| 91 |
|
| 92 |
-
def test_episode_fields(self, client: TestClient):
|
| 93 |
-
|
| 94 |
data = client.get("/console/api/episode").json()
|
| 95 |
assert "step_count" in data
|
| 96 |
assert "flags_found" in data
|
| 97 |
assert "mode" in data
|
| 98 |
assert "services_status" in data
|
| 99 |
|
| 100 |
-
def test_episode_step_count_updates(self, client: TestClient):
|
| 101 |
-
|
| 102 |
data = client.get("/console/api/episode").json()
|
| 103 |
assert data["step_count"] == 0
|
| 104 |
|
| 105 |
-
|
| 106 |
data = client.get("/console/api/episode").json()
|
| 107 |
assert data["step_count"] == 1
|
| 108 |
|
|
@@ -120,15 +136,14 @@ class TestHistoryAPI:
|
|
| 120 |
assert isinstance(data, list)
|
| 121 |
|
| 122 |
def test_history_empty_initially(self, client: TestClient):
|
| 123 |
-
# Reset clears history
|
| 124 |
-
client.post("/reset", json={})
|
| 125 |
data = client.get("/console/api/history").json()
|
| 126 |
assert data == []
|
| 127 |
|
| 128 |
def test_history_records_actions(self, client: TestClient):
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
|
|
|
| 132 |
data = client.get("/console/api/history").json()
|
| 133 |
assert len(data) == 2
|
| 134 |
# Newest first
|
|
@@ -136,8 +151,9 @@ class TestHistoryAPI:
|
|
| 136 |
assert data[1]["mode"] == "red"
|
| 137 |
|
| 138 |
def test_history_has_timestamps(self, client: TestClient):
|
| 139 |
-
|
| 140 |
-
|
|
|
|
| 141 |
data = client.get("/console/api/history").json()
|
| 142 |
assert len(data) == 1
|
| 143 |
assert "time" in data[0]
|
|
@@ -145,11 +161,9 @@ class TestHistoryAPI:
|
|
| 145 |
|
| 146 |
def test_history_max_20(self, client: TestClient):
|
| 147 |
"""History API should return at most 20 entries."""
|
| 148 |
-
|
|
|
|
| 149 |
for i in range(25):
|
| 150 |
-
|
| 151 |
-
"/step",
|
| 152 |
-
json={"command": f"cmd_{i}", "mode": "red"},
|
| 153 |
-
)
|
| 154 |
data = client.get("/console/api/history").json()
|
| 155 |
assert len(data) == 20
|
|
|
|
| 1 |
"""Tests for the operator debugging console (issue #28).
|
| 2 |
|
| 3 |
+
Uses Starlette's TestClient against the OpenEnv app with the console router.
|
| 4 |
No Docker dependency.
|
| 5 |
+
|
| 6 |
+
Note: OpenEnv HTTP endpoints are stateless (each creates a new env instance).
|
| 7 |
+
Console API uses a fallback env stored on app.state. History is recorded
|
| 8 |
+
via the module-level record_action() / clear_history() helpers.
|
| 9 |
"""
|
| 10 |
|
| 11 |
from __future__ import annotations
|
|
|
|
| 14 |
from starlette.testclient import TestClient
|
| 15 |
|
| 16 |
from open_range.server.app import create_app
|
| 17 |
+
from open_range.server.console import clear_history, record_action
|
| 18 |
+
from open_range.server.environment import RangeEnvironment
|
| 19 |
|
| 20 |
|
| 21 |
@pytest.fixture()
|
| 22 |
+
def client():
|
| 23 |
+
"""Create a TestClient with a shared env on app.state for console API."""
|
| 24 |
app = create_app()
|
| 25 |
+
# Store a shared env so console API endpoints can access state
|
| 26 |
+
env = RangeEnvironment(docker_available=False)
|
| 27 |
+
app.state.env = env
|
| 28 |
+
clear_history()
|
| 29 |
return TestClient(app)
|
| 30 |
|
| 31 |
|
| 32 |
+
@pytest.fixture()
|
| 33 |
+
def env(client: TestClient) -> RangeEnvironment:
|
| 34 |
+
"""Return the shared env stored on app.state."""
|
| 35 |
+
return client.app.state.env
|
| 36 |
+
|
| 37 |
+
|
| 38 |
# ===================================================================
|
| 39 |
# GET /console -- HTML page
|
| 40 |
# ===================================================================
|
|
|
|
| 73 |
data = client.get("/console/api/snapshot").json()
|
| 74 |
assert data["id"] is None
|
| 75 |
|
| 76 |
+
def test_snapshot_after_reset(self, client: TestClient, env: RangeEnvironment):
|
| 77 |
+
env.reset(episode_id="snap_test_1")
|
| 78 |
data = client.get("/console/api/snapshot").json()
|
| 79 |
assert data["id"] == "snap_test_1"
|
| 80 |
assert "hosts" in data
|
|
|
|
| 82 |
assert "vuln_count" in data
|
| 83 |
assert "tier" in data
|
| 84 |
|
| 85 |
+
def test_snapshot_no_truth_graph_or_flags(self, client: TestClient, env: RangeEnvironment):
|
| 86 |
"""Snapshot API must NOT leak truth_graph or flag values."""
|
| 87 |
+
env.reset()
|
| 88 |
data = client.get("/console/api/snapshot").json()
|
| 89 |
assert "truth_graph" not in data
|
| 90 |
assert "flags" not in data
|
|
|
|
| 103 |
data = resp.json()
|
| 104 |
assert isinstance(data, dict)
|
| 105 |
|
| 106 |
+
def test_episode_fields(self, client: TestClient, env: RangeEnvironment):
|
| 107 |
+
env.reset()
|
| 108 |
data = client.get("/console/api/episode").json()
|
| 109 |
assert "step_count" in data
|
| 110 |
assert "flags_found" in data
|
| 111 |
assert "mode" in data
|
| 112 |
assert "services_status" in data
|
| 113 |
|
| 114 |
+
def test_episode_step_count_updates(self, client: TestClient, env: RangeEnvironment):
|
| 115 |
+
from open_range.server.models import RangeAction
|
| 116 |
+
|
| 117 |
+
env.reset()
|
| 118 |
data = client.get("/console/api/episode").json()
|
| 119 |
assert data["step_count"] == 0
|
| 120 |
|
| 121 |
+
env.step(RangeAction(command="nmap web", mode="red"))
|
| 122 |
data = client.get("/console/api/episode").json()
|
| 123 |
assert data["step_count"] == 1
|
| 124 |
|
|
|
|
| 136 |
assert isinstance(data, list)
|
| 137 |
|
| 138 |
def test_history_empty_initially(self, client: TestClient):
|
| 139 |
data = client.get("/console/api/history").json()
|
| 140 |
assert data == []
|
| 141 |
|
| 142 |
def test_history_records_actions(self, client: TestClient):
|
| 143 |
+
import time
|
| 144 |
+
|
| 145 |
+
record_action({"step": 1, "command": "nmap -sV web", "mode": "red", "time": time.time()})
|
| 146 |
+
record_action({"step": 2, "command": "tail -f /var/log/syslog", "mode": "blue", "time": time.time()})
|
| 147 |
data = client.get("/console/api/history").json()
|
| 148 |
assert len(data) == 2
|
| 149 |
# Newest first
|
|
|
|
| 151 |
assert data[1]["mode"] == "red"
|
| 152 |
|
| 153 |
def test_history_has_timestamps(self, client: TestClient):
|
| 154 |
+
import time
|
| 155 |
+
|
| 156 |
+
record_action({"step": 1, "command": "nmap web", "mode": "red", "time": time.time()})
|
| 157 |
data = client.get("/console/api/history").json()
|
| 158 |
assert len(data) == 1
|
| 159 |
assert "time" in data[0]
|
|
|
|
| 161 |
|
| 162 |
def test_history_max_20(self, client: TestClient):
|
| 163 |
"""History API should return at most 20 entries."""
|
| 164 |
+
import time
|
| 165 |
+
|
| 166 |
for i in range(25):
|
| 167 |
+
record_action({"step": i, "command": f"cmd_{i}", "mode": "red", "time": time.time()})
|
| 168 |
data = client.get("/console/api/history").json()
|
| 169 |
assert len(data) == 20
|
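The history tests above rely on three behaviours: entries are recorded through a module-level helper, the API returns the newest entry first, and at most 20 entries come back. One way to get all three is a bounded deque, sketched below. This is an illustration only, not the code in `open_range.server.console`; the names `record_action_sketch`, `clear_history_sketch`, and `latest_sketch` are hypothetical, and the cap of 20 is taken from `test_history_max_20`.

```python
from collections import deque

MAX_ENTRIES = 20  # assumed cap, matching what test_history_max_20 expects

# A deque with maxlen drops the oldest entry automatically once it is full.
_history: deque = deque(maxlen=MAX_ENTRIES)


def record_action_sketch(entry: dict) -> None:
    """Append one action record (e.g. step, command, mode, time)."""
    _history.append(entry)


def clear_history_sketch() -> None:
    """Drop all recorded actions, as a test fixture would between tests."""
    _history.clear()


def latest_sketch() -> list[dict]:
    """Return recorded actions, newest first."""
    return list(reversed(_history))
```

Because the state is module-level, a fixture has to clear it before each test, which is what the `client` fixture does with `clear_history()`.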
tests/test_parse_llm_response.py
ADDED
|
@@ -0,0 +1,1075 @@
|
| 1 |
+
"""Tests for _parse_llm_response() — the critical LLM JSON -> SnapshotSpec mapper.
|
| 2 |
+
|
| 3 |
+
Covers field name aliases, evidence spec formats, NPC persona parsing,
|
| 4 |
+
files dict extraction, missing/minimal/malformed input, and a real LLM
|
| 5 |
+
output fixture from snapshots/llm_tier1_test.json.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import json
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
|
| 11 |
+
import pytest
|
| 12 |
+
|
| 13 |
+
from open_range.builder.builder import _parse_llm_response
|
| 14 |
+
from open_range.protocols import (
|
| 15 |
+
EvidenceItem,
|
| 16 |
+
ExploitStep,
|
| 17 |
+
FlagSpec,
|
| 18 |
+
GoldenPathStep,
|
| 19 |
+
NPCPersona,
|
| 20 |
+
SnapshotSpec,
|
| 21 |
+
Vulnerability,
|
| 22 |
+
)
|
| 23 |
+
|
| 24 |
+
ROOT = Path(__file__).parent.parent
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
# ---------------------------------------------------------------------------
|
| 28 |
+
# Helpers
|
| 29 |
+
# ---------------------------------------------------------------------------
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
def _minimal_json(**overrides) -> str:
|
| 33 |
+
"""Return a minimal valid JSON string for _parse_llm_response.
|
| 34 |
+
|
| 35 |
+
All top-level keys are present, with empty/default values unless overridden.
|
| 36 |
+
"""
|
| 37 |
+
base: dict = {
|
| 38 |
+
"topology": {},
|
| 39 |
+
"truth_graph": {"vulns": [], "exploit_chain": []},
|
| 40 |
+
"golden_path": [],
|
| 41 |
+
"flags": [],
|
| 42 |
+
"evidence_spec": {},
|
| 43 |
+
"npc_personas": [],
|
| 44 |
+
"npc_traffic": {},
|
| 45 |
+
"task": {},
|
| 46 |
+
}
|
| 47 |
+
base.update(overrides)
|
| 48 |
+
return json.dumps(base)
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
# ---------------------------------------------------------------------------
|
| 52 |
+
# 1. Happy path with real LLM output
|
| 53 |
+
# ---------------------------------------------------------------------------
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
class TestRealLLMOutput:
|
| 57 |
+
"""Parse the actual LLM-generated JSON from snapshots/llm_tier1_test.json."""
|
| 58 |
+
|
| 59 |
+
@pytest.fixture
|
| 60 |
+
def llm_json(self):
|
| 61 |
+
path = ROOT / "snapshots" / "llm_tier1_test.json"
|
| 62 |
+
return path.read_text()
|
| 63 |
+
|
| 64 |
+
def test_parses_to_snapshot_spec(self, llm_json):
|
| 65 |
+
spec = _parse_llm_response(llm_json)
|
| 66 |
+
assert isinstance(spec, SnapshotSpec)
|
| 67 |
+
|
| 68 |
+
def test_topology_hosts(self, llm_json):
|
| 69 |
+
        spec = _parse_llm_response(llm_json)
        hosts = spec.topology["hosts"]
        assert "attacker" in hosts
        assert "web" in hosts
        assert "siem" in hosts
        assert len(hosts) == 8

    def test_topology_zones(self, llm_json):
        spec = _parse_llm_response(llm_json)
        zones = spec.topology["zones"]
        assert "dmz" in zones
        assert "web" in zones["dmz"]

    def test_truth_graph_vulns(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert len(spec.truth_graph.vulns) == 2
        vuln_ids = {v.id for v in spec.truth_graph.vulns}
        assert "VULN-SQLI-LOOKUP" in vuln_ids
        assert "VULN-MISSING-AUTHZ-REPORT" in vuln_ids

    def test_truth_graph_exploit_chain(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert len(spec.truth_graph.exploit_chain) == 2
        assert spec.truth_graph.exploit_chain[0].vuln_id == "VULN-SQLI-LOOKUP"

    def test_flags(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert len(spec.flags) == 2
        values = {f.value for f in spec.flags}
        assert "FLAG{9f3a2b4c5d6e7f80}" in values
        assert "FLAG{1a2b3c4d5e6f7788}" in values

    def test_golden_path(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert len(spec.golden_path) == 8
        # The real LLM output uses "cmd" field name
        assert spec.golden_path[0].command == "nmap -p 80 10.0.1.10"
        assert spec.golden_path[0].expect_in_stdout == "80/tcp open"

    def test_task_briefings(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert "Meridian" in spec.task.red_briefing
        assert spec.task.blue_briefing != ""

    def test_npc_personas(self, llm_json):
        spec = _parse_llm_response(llm_json)
        assert len(spec.npc_personas) == 8
        names = {p.name for p in spec.npc_personas}
        assert "Derek Thompson" in names
        assert "Karen Williams" in names

    def test_npc_persona_security_awareness(self, llm_json):
        spec = _parse_llm_response(llm_json)
        by_name = {p.name: p for p in spec.npc_personas}
        assert by_name["Derek Thompson"].security_awareness == 0.85
        assert by_name["Karen Williams"].security_awareness == 0.25

    def test_files_dict(self, llm_json):
        spec = _parse_llm_response(llm_json)
        # Real LLM output has explicit files + vulnerable_code dicts
        assert len(spec.files) > 0
        assert "web:/var/www/portal/lookup.php" in spec.files
        assert "web:/var/www/portal/admin/compliance_report.php" in spec.files

    def test_vulnerable_code_as_dict_extracted_to_files(self, llm_json):
        spec = _parse_llm_response(llm_json)
        # The VULN-SQLI-LOOKUP has vulnerable_code as dict with key
        # /var/www/portal/lookup.php. It should be extracted to files
        # as "web:/var/www/portal/lookup.php".
        # But the explicit files dict already has this key, so the
        # explicit one takes precedence (container_key not in files check).
        assert "web:/var/www/portal/lookup.php" in spec.files


# ---------------------------------------------------------------------------
# 2. Field name mappings (ExploitStep aliases)
# ---------------------------------------------------------------------------


class TestExploitStepFieldMappings:
    """LLM uses vuln/action/yields; Pydantic expects vuln_id/command/description."""

    def test_vuln_maps_to_vuln_id(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [],
                "exploit_chain": [
                    {"vuln": "V1", "action": "run exploit", "yields": "root shell"}
                ],
            }
        )
        spec = _parse_llm_response(raw)
        assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"

    def test_action_maps_to_command(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [],
                "exploit_chain": [
                    {"vuln": "V1", "action": "sqlmap -u http://...", "yields": "db dump"}
                ],
            }
        )
        spec = _parse_llm_response(raw)
        assert spec.truth_graph.exploit_chain[0].command == "sqlmap -u http://..."

    def test_yields_maps_to_description(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [],
                "exploit_chain": [
                    {"vuln": "V1", "action": "cmd", "yields": "got credentials"}
                ],
            }
        )
        spec = _parse_llm_response(raw)
        assert spec.truth_graph.exploit_chain[0].description == "got credentials"

    def test_canonical_names_also_work(self):
        """vuln_id/command/description should pass through without aliasing."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [],
                "exploit_chain": [
                    {
                        "vuln_id": "V2",
                        "command": "nmap -sV ...",
                        "description": "port scan",
                    }
                ],
            }
        )
        spec = _parse_llm_response(raw)
        ec = spec.truth_graph.exploit_chain[0]
        assert ec.vuln_id == "V2"
        assert ec.command == "nmap -sV ..."
        assert ec.description == "port scan"

    def test_canonical_names_take_precedence(self):
        """When both canonical and alias are present, canonical wins (via get order)."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [],
                "exploit_chain": [
                    {
                        "vuln_id": "canonical",
                        "vuln": "alias",
                        "command": "canonical_cmd",
                        "action": "alias_cmd",
                        "description": "canonical_desc",
                        "yields": "alias_desc",
                    }
                ],
            }
        )
        spec = _parse_llm_response(raw)
        ec = spec.truth_graph.exploit_chain[0]
        assert ec.vuln_id == "canonical"
        assert ec.command == "canonical_cmd"
        assert ec.description == "canonical_desc"


# ---------------------------------------------------------------------------
# 3. GoldenPathStep field mappings
# ---------------------------------------------------------------------------


class TestGoldenPathFieldMappings:
    """LLM uses cmd/expect_stdout; Pydantic expects command/expect_in_stdout."""

    def test_cmd_maps_to_command(self):
        raw = _minimal_json(
            golden_path=[
                {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "open"}
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"

    def test_expect_stdout_maps_to_expect_in_stdout(self):
        raw = _minimal_json(
            golden_path=[
                {"step": 1, "cmd": "whoami", "expect_stdout": "root"}
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].expect_in_stdout == "root"

    def test_canonical_command_field(self):
        raw = _minimal_json(
            golden_path=[
                {"step": 1, "command": "ls -la", "expect_in_stdout": "total"}
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].command == "ls -la"
        assert spec.golden_path[0].expect_in_stdout == "total"

    def test_mixed_field_names_across_steps(self):
        """Some steps use cmd, others use command; both should parse."""
        raw = _minimal_json(
            golden_path=[
                {"step": 1, "cmd": "nmap scan", "expect_stdout": "80/tcp"},
                {"step": 2, "command": "curl http://web", "expect_in_stdout": "Welcome"},
                {"step": 3, "cmd": "sqlmap", "expect_in_stdout": "FLAG"},
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.golden_path) == 3
        assert spec.golden_path[0].command == "nmap scan"
        assert spec.golden_path[0].expect_in_stdout == "80/tcp"
        assert spec.golden_path[1].command == "curl http://web"
        assert spec.golden_path[1].expect_in_stdout == "Welcome"
        assert spec.golden_path[2].command == "sqlmap"
        assert spec.golden_path[2].expect_in_stdout == "FLAG"

    def test_step_number_preserved(self):
        raw = _minimal_json(
            golden_path=[
                {"step": 5, "cmd": "echo hi", "expect_stdout": "hi"}
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].step == 5

    def test_description_field_preserved(self):
        raw = _minimal_json(
            golden_path=[
                {
                    "step": 1,
                    "cmd": "nmap",
                    "expect_stdout": "open",
                    "description": "Port scan the DMZ",
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].description == "Port scan the DMZ"

    def test_cmd_takes_precedence_over_command(self):
        """When both cmd and command are present, cmd wins (it's checked first)."""
        raw = _minimal_json(
            golden_path=[
                {
                    "step": 1,
                    "cmd": "cmd_value",
                    "command": "command_value",
                    "expect_stdout": "x",
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.golden_path[0].command == "cmd_value"


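The two field-mapping test classes above pin down alias precedence without showing the parser itself. The block below is a minimal sketch of one way to get that behavior, not the actual `_parse_llm_response` code: `first_present`, `normalize_exploit_step`, and `normalize_golden_step` are hypothetical names introduced here for illustration, and the lookup order for the `expect_stdout`/`expect_in_stdout` pair is an assumption, since the tests never supply both.

```python
from typing import Any, Mapping


def first_present(raw: Mapping[str, Any], *keys: str, default: Any = "") -> Any:
    """Return the value of the first key in `keys` that exists in `raw`."""
    for key in keys:
        if key in raw:
            return raw[key]
    return default


def normalize_exploit_step(raw: Mapping[str, Any]) -> dict:
    # Canonical names are listed first, so they win over the LLM aliases.
    return {
        "vuln_id": first_present(raw, "vuln_id", "vuln"),
        "command": first_present(raw, "command", "action"),
        "description": first_present(raw, "description", "yields"),
    }


def normalize_golden_step(raw: Mapping[str, Any]) -> dict:
    # Here the alias "cmd" is checked first, matching
    # test_cmd_takes_precedence_over_command.
    return {
        "step": raw.get("step", 0),
        "command": first_present(raw, "cmd", "command"),
        # Order of this pair is an assumption; no test provides both keys.
        "expect_in_stdout": first_present(raw, "expect_stdout", "expect_in_stdout"),
        "description": raw.get("description", ""),
    }
```

Note the asymmetry the tests encode: exploit steps prefer the canonical key, while golden-path steps prefer the `cmd` alias. A shared helper with an explicit key order keeps that difference visible at the call site.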
# ---------------------------------------------------------------------------
# 4. Evidence spec parsing
# ---------------------------------------------------------------------------


class TestEvidenceSpecParsing:
    """LLM returns dict, protocol expects list[EvidenceItem]."""

    def test_dict_with_string_values(self):
        raw = _minimal_json(
            evidence_spec={
                "web_access_log": "SQL injection pattern",
                "siem_alerts": "Unauthorized access",
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.evidence_spec) == 2
        locations = {e.location for e in spec.evidence_spec}
        assert "web_access_log" in locations
        assert "siem_alerts" in locations
        # String values become log_entry type
        for e in spec.evidence_spec:
            if e.location == "web_access_log":
                assert e.type == "log_entry"
                assert e.pattern == "SQL injection pattern"

    def test_dict_with_list_values(self):
        raw = _minimal_json(
            evidence_spec={
                "siem_alerts": ["UNION SELECT detected", "admin endpoint accessed"],
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.evidence_spec) == 2
        # List values become alert type
        for e in spec.evidence_spec:
            assert e.type == "alert"
            assert e.location == "siem_alerts"
        patterns = {e.pattern for e in spec.evidence_spec}
        assert "UNION SELECT detected" in patterns
        assert "admin endpoint accessed" in patterns

    def test_dict_with_mixed_values(self):
        raw = _minimal_json(
            evidence_spec={
                "web_log": "GET /search?q=",
                "alerts": ["sqli_detected", "auth_bypass"],
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.evidence_spec) == 3  # 1 string + 2 list items

    def test_list_format_passthrough(self):
        """When evidence_spec is already a list of dicts, parse directly."""
        raw = _minimal_json(
            evidence_spec=[
                {"type": "alert", "location": "siem", "pattern": "SQLi"},
                {"type": "log_entry", "location": "web_log", "pattern": "GET /admin"},
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.evidence_spec) == 2
        assert spec.evidence_spec[0].type == "alert"
        assert spec.evidence_spec[1].location == "web_log"

    def test_empty_dict(self):
        raw = _minimal_json(evidence_spec={})
        spec = _parse_llm_response(raw)
        assert spec.evidence_spec == []

    def test_empty_list(self):
        raw = _minimal_json(evidence_spec=[])
        spec = _parse_llm_response(raw)
        assert spec.evidence_spec == []


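The evidence tests above describe a normalization from the dict shape an LLM tends to emit into a flat list of items. The following is a rough sketch of that rule under stated assumptions: `EvidenceSketch` and `normalize_evidence` are hypothetical stand-ins, not the project's `EvidenceItem` model or parser, and only the three fields the tests touch are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class EvidenceSketch:
    """Hypothetical stand-in for the real evidence item model."""
    type: str
    location: str
    pattern: str


def normalize_evidence(raw: Any) -> list[EvidenceSketch]:
    # Already a list of dicts: pass each entry through field by field.
    if isinstance(raw, list):
        return [
            EvidenceSketch(
                type=item.get("type", ""),
                location=item.get("location", ""),
                pattern=item.get("pattern", ""),
            )
            for item in raw
        ]
    items: list[EvidenceSketch] = []
    for location, value in (raw or {}).items():
        if isinstance(value, str):
            # A single string is treated as one log entry.
            items.append(EvidenceSketch("log_entry", location, value))
        elif isinstance(value, list):
            # Each list element becomes its own alert at the same location.
            items.extend(EvidenceSketch("alert", location, str(p)) for p in value)
    return items
```

With this rule, a dict holding one string and one two-element list yields three items, which is the count `test_dict_with_mixed_values` asserts.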
# ---------------------------------------------------------------------------
# 5. NPC persona parsing
# ---------------------------------------------------------------------------


class TestNPCPersonaParsing:
    def test_basic_persona(self):
        raw = _minimal_json(
            npc_personas=[
                {
                    "name": "Alice",
                    "role": "Admin",
                    "department": "IT",
                    "security_awareness": 0.9,
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.npc_personas) == 1
        p = spec.npc_personas[0]
        assert p.name == "Alice"
        assert p.role == "Admin"
        assert p.department == "IT"
        assert p.security_awareness == 0.9

    def test_accounts_with_string_values(self):
        raw = _minimal_json(
            npc_personas=[
                {
                    "name": "Bob",
                    "accounts": {
                        "email": "bob@corp.local",
                        "ldap_dn": "cn=bob,dc=corp,dc=local",
                    },
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.npc_personas[0].accounts["email"] == "bob@corp.local"

    def test_default_security_awareness(self):
        """Missing security_awareness defaults to 0.5."""
        raw = _minimal_json(npc_personas=[{"name": "Charlie"}])
        spec = _parse_llm_response(raw)
        assert spec.npc_personas[0].security_awareness == 0.5

    def test_susceptibility_dict(self):
        raw = _minimal_json(
            npc_personas=[
                {
                    "name": "Diana",
                    "susceptibility": {"phishing": 0.8, "pretexting": 0.6},
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.npc_personas[0].susceptibility["phishing"] == 0.8

    def test_routine_dict(self):
        raw = _minimal_json(
            npc_personas=[
                {
                    "name": "Eve",
                    "routine": {
                        "morning": "check email",
                        "afternoon": "process reports",
                    },
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert spec.npc_personas[0].routine["morning"] == "check email"

    def test_multiple_personas(self):
        raw = _minimal_json(
            npc_personas=[
                {"name": "P1", "security_awareness": 0.1},
                {"name": "P2", "security_awareness": 0.5},
                {"name": "P3", "security_awareness": 0.9},
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.npc_personas) == 3
        names = [p.name for p in spec.npc_personas]
        assert names == ["P1", "P2", "P3"]

    def test_missing_optional_fields_default(self):
        """All optional fields should default gracefully."""
        raw = _minimal_json(npc_personas=[{"name": "Minimal"}])
        spec = _parse_llm_response(raw)
        p = spec.npc_personas[0]
        assert p.name == "Minimal"
        assert p.role == ""
        assert p.department == ""
        assert p.reports_to == ""
        assert p.communication_style == ""
        assert p.susceptibility == {}
        assert p.routine == {}
        assert p.accounts == {}


# ---------------------------------------------------------------------------
# 6. Files dict extraction
# ---------------------------------------------------------------------------


class TestFilesDictExtraction:
    def test_explicit_files_field(self):
        raw = _minimal_json(
            files={
                "web:/var/www/index.php": "<?php echo 'hello'; ?>",
                "db:/opt/init.sql": "CREATE TABLE t(id INT);",
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.files) == 2
        assert spec.files["web:/var/www/index.php"] == "<?php echo 'hello'; ?>"

    def test_vulnerable_code_dict_extracted(self):
        """vulnerable_code as {file_path: code} should be extracted to files."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {
                        "id": "v1",
                        "type": "sqli",
                        "host": "web",
                        "service": "php",
                        "injection_point": "/search",
                        "vulnerable_code": {
                            "/var/www/search.php": "<?php $q=$_GET['q']; ?>"
                        },
                    }
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        assert "web:/var/www/search.php" in spec.files
        assert spec.files["web:/var/www/search.php"] == "<?php $q=$_GET['q']; ?>"

    def test_vulnerable_code_string_on_web_host(self):
        """String vulnerable_code on web host with / injection_point goes to web:/var/www/portal{ip}."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {
                        "id": "v1",
                        "type": "sqli",
                        "host": "web",
                        "service": "php",
                        "injection_point": "/search.php",
                        "vulnerable_code": "<?php echo 'vuln'; ?>",
                    }
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        assert "web:/var/www/portal/search.php" in spec.files

    def test_vulnerable_code_string_non_web_host_skipped(self):
        """String vulnerable_code on non-web host without / prefix is not extracted."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {
                        "id": "v1",
                        "type": "weak_creds",
                        "host": "db",
                        "service": "mysql",
                        "injection_point": "mysql -u root -proot",
                        "vulnerable_code": "",
                    }
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.files) == 0

    def test_explicit_files_not_overwritten_by_vulnerable_code(self):
        """If explicit files has a key, vulnerable_code should not overwrite it."""
        raw = _minimal_json(
            files={"web:/var/www/search.php": "explicit content"},
            truth_graph={
                "vulns": [
                    {
                        "id": "v1",
                        "type": "sqli",
                        "host": "web",
                        "service": "php",
                        "injection_point": "/search",
                        "vulnerable_code": {
                            "/var/www/search.php": "vulnerable content"
                        },
                    }
                ],
                "exploit_chain": [],
            },
        )
        spec = _parse_llm_response(raw)
        assert spec.files["web:/var/www/search.php"] == "explicit content"

    def test_no_files_field_produces_empty_dict(self):
        raw = _minimal_json()
        spec = _parse_llm_response(raw)
        assert spec.files == {}

    def test_files_field_non_string_values_skipped(self):
        """Non-string values in files dict are silently skipped."""
        raw = _minimal_json(
            files={
                "web:/good.php": "<?php ?>",
                "web:/bad.php": 12345,
                "web:/also_bad.php": ["not", "a", "string"],
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.files) == 1
        assert "web:/good.php" in spec.files


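The file-extraction tests above imply a merge rule: explicit `files` entries win, `vulnerable_code` dicts are keyed as `host:path`, and a string `vulnerable_code` on the `web` host lands under `/var/www/portal`. The sketch below is one possible reading of those tests, with `merge_vulnerable_code` as a hypothetical helper name; how the real parser treats string code on other hosts is not shown in the tests, so this version simply skips it.

```python
from typing import Any, Dict, List


def merge_vulnerable_code(files: Dict[str, Any], vulns: List[Dict[str, Any]]) -> Dict[str, str]:
    # Keep only string-valued explicit entries; other types are dropped.
    merged = {path: body for path, body in files.items() if isinstance(body, str)}
    for vuln in vulns:
        host = vuln.get("host", "")
        code = vuln.get("vulnerable_code", "")
        if isinstance(code, dict):
            for path, body in code.items():
                # setdefault leaves an explicit entry untouched.
                merged.setdefault(f"{host}:{path}", body)
        elif code and host == "web":
            point = vuln.get("injection_point", "")
            if point.startswith("/"):
                # Assumed layout, taken from the test docstring above.
                merged.setdefault(f"web:/var/www/portal{point}", code)
    return merged
```

Using `setdefault` rather than plain assignment is what gives explicit files precedence, which is the property `test_explicit_files_not_overwritten_by_vulnerable_code` checks.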
# ---------------------------------------------------------------------------
# 7. Missing optional fields
# ---------------------------------------------------------------------------


class TestMissingOptionalFields:
    def test_missing_evidence_spec(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "golden_path": [],
            "flags": [],
            "npc_personas": [],
            "npc_traffic": {},
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.evidence_spec == []

    def test_missing_npc_personas(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "golden_path": [],
            "flags": [],
            "evidence_spec": {},
            "npc_traffic": {},
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.npc_personas == []

    def test_missing_npc_traffic(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "golden_path": [],
            "flags": [],
            "evidence_spec": {},
            "npc_personas": [],
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        # npc_traffic gets default NPCTrafficSpec values
        assert spec.npc_traffic.level == 0

    def test_missing_task(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "golden_path": [],
            "flags": [],
            "evidence_spec": {},
            "npc_personas": [],
            "npc_traffic": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.task.red_briefing == ""
        assert spec.task.blue_briefing == ""

    def test_missing_truth_graph(self):
        data = {
            "topology": {"hosts": ["web"]},
            "golden_path": [],
            "flags": [],
            "evidence_spec": {},
            "npc_personas": [],
            "npc_traffic": {},
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.truth_graph.vulns == []
        assert spec.truth_graph.exploit_chain == []

    def test_missing_golden_path(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "flags": [],
            "evidence_spec": {},
            "npc_personas": [],
            "npc_traffic": {},
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.golden_path == []

    def test_missing_flags(self):
        data = {
            "topology": {},
            "truth_graph": {"vulns": [], "exploit_chain": []},
            "golden_path": [],
            "evidence_spec": {},
            "npc_personas": [],
            "npc_traffic": {},
            "task": {},
        }
        spec = _parse_llm_response(json.dumps(data))
        assert spec.flags == []

    def test_vuln_with_minimal_fields(self):
        """A vulnerability with only id, type, host should parse fine."""
        raw = _minimal_json(
            truth_graph={
                "vulns": [{"id": "v1", "type": "sqli", "host": "web"}],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        v = spec.truth_graph.vulns[0]
        assert v.id == "v1"
        assert v.service == ""
        assert v.injection_point == ""
        assert v.vulnerable_code == ""
        assert v.root_cause == ""


# ---------------------------------------------------------------------------
# 8. Empty/minimal input
# ---------------------------------------------------------------------------


class TestMinimalInput:
    def test_completely_empty_json_object(self):
        """An empty JSON object should produce a valid SnapshotSpec with defaults."""
        spec = _parse_llm_response("{}")
        assert isinstance(spec, SnapshotSpec)
        assert spec.topology == {}
        assert spec.truth_graph.vulns == []
        assert spec.golden_path == []
        assert spec.flags == []
        assert spec.evidence_spec == []
        assert spec.npc_personas == []

    def test_minimal_valid_json(self):
        raw = _minimal_json()
        spec = _parse_llm_response(raw)
        assert isinstance(spec, SnapshotSpec)

    def test_topology_only(self):
        raw = json.dumps({"topology": {"hosts": ["web", "db"]}})
        spec = _parse_llm_response(raw)
        assert spec.topology["hosts"] == ["web", "db"]
        assert spec.golden_path == []


# ---------------------------------------------------------------------------
# 9. Malformed input
# ---------------------------------------------------------------------------


class TestMalformedInput:
    def test_invalid_json_raises(self):
        with pytest.raises(json.JSONDecodeError):
            _parse_llm_response("not valid json {{{")

    def test_json_array_not_object_raises(self):
        """Top-level must be an object, not an array."""
        with pytest.raises((TypeError, AttributeError)):
            _parse_llm_response("[1, 2, 3]")

    def test_json_string_not_object_raises(self):
        with pytest.raises((TypeError, AttributeError)):
            _parse_llm_response('"just a string"')

    def test_truth_graph_not_dict_handled(self):
        """If truth_graph is not a dict, the parser's .get() calls raise AttributeError."""
        # truth_graph as string
        raw = json.dumps({"truth_graph": "not a dict"})
        # The parser calls .get() on the string, which raises AttributeError
        with pytest.raises(AttributeError):
            _parse_llm_response(raw)

    def test_golden_path_not_list_handled(self):
        """If golden_path is a non-list iterable (e.g. string), .get() on items fails."""
        raw = json.dumps({"golden_path": "not a list"})
        with pytest.raises(AttributeError):
            _parse_llm_response(raw)

    def test_empty_string_raises(self):
        with pytest.raises(json.JSONDecodeError):
            _parse_llm_response("")

    def test_json_with_trailing_comma_raises(self):
        with pytest.raises(json.JSONDecodeError):
            _parse_llm_response('{"key": "value",}')


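The malformed-input tests above rely on two standard-library behaviors: `json.loads` raises `json.JSONDecodeError` for text that is not valid JSON, and calling `.get()` on a decoded list or string raises `AttributeError`. The short sketch below reproduces both without the parser; `classify_failure` is a hypothetical helper used only to make the distinction visible.

```python
import json


def classify_failure(text: str) -> str:
    """Report which exception a dict-oriented parser would hit for `text`."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Raised for empty input, stray tokens, trailing commas, etc.
        return "JSONDecodeError"
    try:
        # Valid JSON, but lists and strings have no .get() method.
        data.get("topology")
    except AttributeError:
        return "AttributeError"
    return "ok"


for sample in ["not valid json {{{", "", '{"key": "value",}', "[1, 2, 3]", '"just a string"', "{}"]:
    print(repr(sample), "->", classify_failure(sample))
```

This is why the top-level array and string cases surface as attribute errors rather than decode errors: the text is valid JSON, and the failure only appears once dict methods are used on the result.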
# ---------------------------------------------------------------------------
# 10. Vulnerability parsing details
# ---------------------------------------------------------------------------


class TestVulnerabilityParsing:
    def test_all_vuln_fields_parsed(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {
                        "id": "VULN-001",
                        "type": "sqli",
                        "host": "web",
                        "service": "nginx+php",
                        "injection_point": "/search?q=",
                        "vulnerable_code": "<?php $q=$_GET['q']; ?>",
                        "root_cause": "No input sanitization",
                        "blast_radius": "Full DB read",
                        "remediation": "Use prepared statements",
                    }
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        v = spec.truth_graph.vulns[0]
        assert v.id == "VULN-001"
        assert v.type == "sqli"
        assert v.host == "web"
        assert v.service == "nginx+php"
        assert v.injection_point == "/search?q="
        assert v.vulnerable_code == "<?php $q=$_GET['q']; ?>"
        assert v.root_cause == "No input sanitization"
        assert v.blast_radius == "Full DB read"
        assert v.remediation == "Use prepared statements"

    def test_vulnerable_code_as_dict(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {
                        "id": "V1",
                        "type": "sqli",
                        "host": "web",
                        "vulnerable_code": {
                            "/var/www/search.php": "<?php vuln code; ?>"
                        },
                    }
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        v = spec.truth_graph.vulns[0]
        assert isinstance(v.vulnerable_code, dict)
        assert v.vulnerable_code["/var/www/search.php"] == "<?php vuln code; ?>"

    def test_multiple_vulns(self):
        raw = _minimal_json(
            truth_graph={
                "vulns": [
                    {"id": "V1", "type": "sqli", "host": "web"},
                    {"id": "V2", "type": "xss", "host": "web"},
                    {"id": "V3", "type": "idor", "host": "web"},
                ],
                "exploit_chain": [],
            }
        )
        spec = _parse_llm_response(raw)
        assert len(spec.truth_graph.vulns) == 3
        types = {v.type for v in spec.truth_graph.vulns}
        assert types == {"sqli", "xss", "idor"}


# ---------------------------------------------------------------------------
# 11. Flag parsing
# ---------------------------------------------------------------------------


class TestFlagParsing:
    def test_single_flag(self):
        raw = _minimal_json(
            flags=[
                {
                    "id": "flag1",
                    "value": "FLAG{abc123}",
                    "path": "/var/flags/flag1.txt",
                    "host": "db",
                }
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.flags) == 1
        f = spec.flags[0]
        assert f.id == "flag1"
        assert f.value == "FLAG{abc123}"
        assert f.path == "/var/flags/flag1.txt"
        assert f.host == "db"

    def test_multiple_flags(self):
        raw = _minimal_json(
            flags=[
                {"id": "f1", "value": "FLAG{a}", "path": "/f1.txt", "host": "web"},
                {"id": "f2", "value": "FLAG{b}", "path": "/f2.txt", "host": "db"},
            ]
        )
        spec = _parse_llm_response(raw)
        assert len(spec.flags) == 2

    def test_missing_flag_fields_default_to_empty(self):
        raw = _minimal_json(flags=[{}])
        spec = _parse_llm_response(raw)
        f = spec.flags[0]
        assert f.id == ""
        assert f.value == ""
        assert f.path == ""
        assert f.host == ""


# ---------------------------------------------------------------------------
|
| 932 |
+
# 12. NPC traffic parsing
|
| 933 |
+
# ---------------------------------------------------------------------------
|
| 934 |
+
|
| 935 |
+
|
| 936 |
+
class TestNPCTrafficParsing:
|
| 937 |
+
def test_http_rate_maps_to_rate_lambda(self):
|
| 938 |
+
raw = _minimal_json(npc_traffic={"http_rate": 25})
|
| 939 |
+
spec = _parse_llm_response(raw)
|
| 940 |
+
assert spec.npc_traffic.rate_lambda == 25
|
| 941 |
+
|
| 942 |
+
def test_default_scripts(self):
|
| 943 |
+
raw = _minimal_json(npc_traffic={})
|
| 944 |
+
spec = _parse_llm_response(raw)
|
| 945 |
+
assert "http_traffic.sh" in spec.npc_traffic.scripts
|
| 946 |
+
|
| 947 |
+
def test_level_always_zero(self):
|
| 948 |
+
"""Current parser hardcodes level=0."""
|
| 949 |
+
raw = _minimal_json(npc_traffic={"http_rate": 50})
|
| 950 |
+
spec = _parse_llm_response(raw)
|
| 951 |
+
assert spec.npc_traffic.level == 0
|
| 952 |
+
|
| 953 |
+
def test_missing_http_rate_defaults_to_10(self):
|
| 954 |
+
raw = _minimal_json(npc_traffic={})
|
| 955 |
+
spec = _parse_llm_response(raw)
|
| 956 |
+
assert spec.npc_traffic.rate_lambda == 10
+
+
+# ---------------------------------------------------------------------------
+# 13. Task parsing
+# ---------------------------------------------------------------------------
+
+
+class TestTaskParsing:
+    def test_both_briefings(self):
+        raw = _minimal_json(
+            task={
+                "red_briefing": "Attack the network.",
+                "blue_briefing": "Defend the network.",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == "Attack the network."
+        assert spec.task.blue_briefing == "Defend the network."
+
+    def test_missing_briefings_default_empty(self):
+        raw = _minimal_json(task={})
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == ""
+        assert spec.task.blue_briefing == ""
+
+    def test_extra_task_fields_ignored(self):
+        """Extra fields in task should be silently ignored."""
+        raw = _minimal_json(
+            task={
+                "red_briefing": "Go",
+                "blue_briefing": "Watch",
+                "unknown_field": "whatever",
+            }
+        )
+        spec = _parse_llm_response(raw)
+        assert spec.task.red_briefing == "Go"
+
+
+# ---------------------------------------------------------------------------
+# 14. Roundtrip / integration
+# ---------------------------------------------------------------------------
+
+
+class TestRoundtrip:
+    def test_complex_snapshot_parses_completely(self):
+        """A complex snapshot with all sections populated should parse."""
+        data = {
+            "topology": {
+                "hosts": ["attacker", "web", "db", "siem"],
+                "zones": {"dmz": ["web"], "internal": ["db"], "mgmt": ["siem"]},
+                "users": [{"username": "admin", "password": "pass", "groups": ["admins"], "hosts": ["web"]}],
+            },
+            "truth_graph": {
+                "vulns": [
+                    {
+                        "id": "V1",
+                        "type": "sqli",
+                        "host": "web",
+                        "service": "php",
+                        "injection_point": "/search?q=",
+                        "vulnerable_code": {"search.php": "vuln code"},
+                        "root_cause": "no sanitization",
+                        "blast_radius": "db read",
+                        "remediation": "prepared stmts",
+                    }
+                ],
+                "exploit_chain": [
+                    {"vuln": "V1", "action": "sqlmap", "yields": "db dump"}
+                ],
+            },
+            "golden_path": [
+                {"step": 1, "cmd": "nmap -sV 10.0.1.0/24", "expect_stdout": "80/tcp"},
+                {"step": 2, "command": "curl http://web/search?q=test", "expect_in_stdout": "results"},
+            ],
+            "flags": [
+                {"id": "f1", "value": "FLAG{complex}", "path": "/flag.txt", "host": "db"}
+            ],
+            "evidence_spec": {
+                "web_log": "sqli pattern",
+                "alerts": ["sql_injection_detected"],
+            },
+            "npc_personas": [
+                {
+                    "name": "Alice",
+                    "role": "SysAdmin",
+                    "department": "IT",
+                    "reports_to": "CTO",
+                    "communication_style": "technical",
+                    "security_awareness": 0.9,
+                    "susceptibility": {"phishing": 0.1},
+                    "routine": {"morning": "check logs"},
+                    "accounts": {"email": "alice@corp.local"},
+                }
+            ],
+            "npc_traffic": {"http_rate": 20},
+            "task": {
+                "red_briefing": "Hack the network.",
+                "blue_briefing": "Monitor and defend.",
+            },
+            "files": {"web:/var/www/index.php": "<?php echo 'hi'; ?>"},
+        }
+        spec = _parse_llm_response(json.dumps(data))
+
+        # Verify all sections
+        assert spec.topology["hosts"] == ["attacker", "web", "db", "siem"]
+        assert len(spec.truth_graph.vulns) == 1
+        assert spec.truth_graph.exploit_chain[0].vuln_id == "V1"
+        assert spec.truth_graph.exploit_chain[0].command == "sqlmap"
+        assert len(spec.golden_path) == 2
+        assert spec.golden_path[0].command == "nmap -sV 10.0.1.0/24"
+        assert spec.golden_path[1].expect_in_stdout == "results"
+        assert spec.flags[0].value == "FLAG{complex}"
+        assert len(spec.evidence_spec) == 2  # 1 string + 1 list item
+        assert len(spec.npc_personas) == 1
+        assert spec.npc_traffic.rate_lambda == 20
+        assert spec.task.red_briefing == "Hack the network."
+        # files: explicit + vulnerable_code dict
+        assert "web:/var/www/index.php" in spec.files
+        assert "web:search.php" in spec.files  # from vulnerable_code dict
|
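Note on the key aliasing exercised by TestRoundtrip: the assertions imply that a golden-path step may arrive with either `cmd` or `command`, and with either `expect_stdout` or `expect_in_stdout`, and that a `vulnerable_code` dict lands in `spec.files` under a `host:filename` key. The sketch below is a minimal standalone illustration of that behaviour as the tests describe it. It is not the code in `open_range.builder.builder`; `GoldenStep`, `normalize_golden_step`, and `merge_vulnerable_code` are hypothetical names, and the precedence rules (canonical key wins, explicit files win) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GoldenStep:
    """Hypothetical stand-in for a parsed golden-path step."""

    step: int
    command: str
    expect_in_stdout: str


def normalize_golden_step(raw: dict) -> GoldenStep:
    """Accept both spellings seen in the tests for the command and expectation keys."""
    return GoldenStep(
        step=int(raw.get("step", 0)),
        # Assumption: the canonical key wins when both spellings are present.
        command=raw.get("command") or raw.get("cmd", ""),
        expect_in_stdout=raw.get("expect_in_stdout") or raw.get("expect_stdout", ""),
    )


def merge_vulnerable_code(files: dict[str, str], vulns: list[dict]) -> dict[str, str]:
    """Copy each vuln's vulnerable_code dict into files under 'host:filename' keys."""
    merged = dict(files)
    for vuln in vulns:
        code = vuln.get("vulnerable_code")
        if not isinstance(code, dict):
            continue  # string or missing snippets are out of scope for this sketch
        host = vuln.get("host", "")
        for filename, body in code.items():
            # Assumption: an explicitly listed file is never overwritten.
            merged.setdefault(f"{host}:{filename}", body)
    return merged
```

With the data from `test_complex_snapshot_parses_completely`, `merge_vulnerable_code` would add a `web:search.php` entry next to the explicit `web:/var/www/index.php` one, which is what the last two assertions of that test check.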
tests/test_renderer_integration.py
ADDED
|
@@ -0,0 +1,373 @@
+"""Integration tests for the full renderer pipeline.
+
+Loads real LLM output from snapshots/llm_tier1_test.json, parses it
+through _parse_llm_response(), renders through SnapshotRenderer.render(),
+and verifies all output files contain expected content.
+"""
+
+from __future__ import annotations
+
+import json
+import tempfile
+from pathlib import Path
+
+import pytest
+
+from open_range.builder.builder import _parse_llm_response
+from open_range.builder.renderer import SnapshotRenderer
+
+ROOT = Path(__file__).parent.parent
+SNAPSHOT_PATH = ROOT / "snapshots" / "llm_tier1_test.json"
+
+
+@pytest.fixture
+def llm_output() -> dict:
+    """Load the real LLM output JSON."""
+    return json.loads(SNAPSHOT_PATH.read_text())
+
+
+@pytest.fixture
+def parsed_spec(llm_output):
+    """Parse real LLM output through _parse_llm_response."""
+    return _parse_llm_response(json.dumps(llm_output))
+
+
+@pytest.fixture
+def rendered_dir(parsed_spec):
+    """Render the parsed spec and yield the output directory."""
+    renderer = SnapshotRenderer()
+    with tempfile.TemporaryDirectory() as tmpdir:
+        out = Path(tmpdir) / "integration_out"
+        renderer.render(parsed_spec, out)
+        yield out
+
+
+# ---------------------------------------------------------------------------
+# Pipeline: parse -> render round-trip
+# ---------------------------------------------------------------------------
+
+
+class TestParseLLMOutput:
+    """Verify _parse_llm_response correctly handles real LLM output."""
+
+    def test_parse_produces_snapshot_spec(self, parsed_spec):
+        from open_range.protocols import SnapshotSpec
+        assert isinstance(parsed_spec, SnapshotSpec)
+
+    def test_parse_has_topology(self, parsed_spec):
+        assert "hosts" in parsed_spec.topology
+        assert len(parsed_spec.topology["hosts"]) == 8
+
+    def test_parse_has_vulns(self, parsed_spec):
+        assert len(parsed_spec.truth_graph.vulns) >= 1
+        vuln_types = {v.type for v in parsed_spec.truth_graph.vulns}
+        assert "sqli" in vuln_types
+
+    def test_parse_has_flags(self, parsed_spec):
+        assert len(parsed_spec.flags) >= 2
+
+    def test_parse_has_golden_path(self, parsed_spec):
+        assert len(parsed_spec.golden_path) >= 1
+        # Golden path steps should have commands
+        for step in parsed_spec.golden_path:
+            assert step.command, f"Step {step.step} has empty command"
+
+    def test_parse_has_task_briefings(self, parsed_spec):
+        assert parsed_spec.task.red_briefing
+        assert parsed_spec.task.blue_briefing
+
+    def test_parse_has_files(self, parsed_spec):
+        assert len(parsed_spec.files) > 0
+        # Should include web files (db:sql is asserted in TestFilesPreserved)
+        web_files = [k for k in parsed_spec.files if k.startswith("web:")]
+        assert len(web_files) > 0
+
+    def test_parse_has_npc_personas(self, parsed_spec):
+        assert len(parsed_spec.npc_personas) >= 1
+
+    def test_golden_path_uses_command_field(self, parsed_spec):
+        """LLM output uses 'cmd', parser should map to 'command'."""
+        for step in parsed_spec.golden_path:
+            assert step.command  # Should be populated from 'cmd' key
+
+    def test_golden_path_uses_expect_in_stdout(self, parsed_spec):
+        """LLM output uses 'expect_stdout', parser maps to 'expect_in_stdout'."""
+        for step in parsed_spec.golden_path:
+            assert step.expect_in_stdout
+
+
+# ---------------------------------------------------------------------------
+# All output files exist
+# ---------------------------------------------------------------------------
+
+
+class TestRenderedFilesExist:
+    """Verify all 6 template outputs are created."""
+
+    EXPECTED_FILES = [
+        "docker-compose.yml",
+        "Dockerfile.web",
+        "Dockerfile.db",
+        "nginx.conf",
+        "init.sql",
+        "iptables.rules",
+    ]
+
+    def test_all_output_files_exist(self, rendered_dir):
+        for fname in self.EXPECTED_FILES:
+            path = rendered_dir / fname
+            assert path.exists(), f"Missing output file: {fname}"
+
+    def test_all_output_files_non_empty(self, rendered_dir):
+        for fname in self.EXPECTED_FILES:
+            content = (rendered_dir / fname).read_text()
+            assert len(content) > 0, f"Empty output file: {fname}"
+
+
+# ---------------------------------------------------------------------------
+# nginx.conf content verification
+# ---------------------------------------------------------------------------
+
+
+class TestNginxConf:
+    """Verify rendered nginx.conf has correct content."""
+
+    def test_references_php_fpm_socket(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "php8.1-fpm.sock" in nginx
+
+    def test_has_server_block(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "server {" in nginx
+        assert "listen 80" in nginx
+
+    def test_has_php_location(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "location ~ \\.php$" in nginx
+
+    def test_has_fastcgi_pass(self, rendered_dir):
+        nginx = (rendered_dir / "nginx.conf").read_text()
+        assert "fastcgi_pass unix:/run/php/php8.1-fpm.sock" in nginx
+
+
+# ---------------------------------------------------------------------------
+# docker-compose.yml content verification
+# ---------------------------------------------------------------------------
+
+
+class TestDockerCompose:
+    """Verify rendered docker-compose.yml has correct static IPs and structure."""
+
+    def test_has_services_section(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "services:" in compose
+
+    def test_has_all_core_services(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        for service in ["attacker:", "firewall:", "web:", "mail:", "db:", "siem:", "ldap:", "files:"]:
+            assert service in compose, f"Missing service: {service}"
+
+    def test_has_network_definitions(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "networks:" in compose
+        assert "external:" in compose
+        assert "dmz:" in compose
+        assert "internal:" in compose
+        assert "management:" in compose
+
+    def test_has_static_ips(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        # Key static IPs from the template
+        assert "10.0.0.10" in compose  # attacker
+        assert "10.0.0.2" in compose  # firewall external
+        assert "10.0.1.10" in compose  # web dmz
+        assert "10.0.2.20" in compose  # db internal
+        assert "10.0.3.20" in compose  # ldap management
+        assert "10.0.3.21" in compose  # siem management
+
+    def test_web_depends_on_db(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        # web service should have depends_on db (loose check: any depends_on key passes)
+        assert "depends_on:" in compose
+
+    def test_has_subnet_definitions(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "10.0.0.0/24" in compose  # external
+        assert "10.0.1.0/24" in compose  # dmz
+        assert "10.0.2.0/24" in compose  # internal
+        assert "10.0.3.0/24" in compose  # management
+
+    def test_has_healthchecks(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "healthcheck:" in compose
+
+    def test_attacker_has_net_admin(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "NET_ADMIN" in compose
+
+    def test_db_has_mysql_env_vars(self, rendered_dir):
+        compose = (rendered_dir / "docker-compose.yml").read_text()
+        assert "MYSQL_ROOT_PASSWORD" in compose
+        assert "MYSQL_DATABASE=referral_db" in compose
+        assert "MYSQL_USER=app_user" in compose
+
+
+# ---------------------------------------------------------------------------
+# init.sql content verification
+# ---------------------------------------------------------------------------
+
+
+class TestInitSQL:
+    """Verify rendered init.sql has referral_db and app_user."""
+
+    def test_creates_referral_db(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "referral_db" in sql
+
+    def test_creates_flags_db(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "flags" in sql
+
+    def test_creates_core_tables(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "CREATE TABLE" in sql
+        assert "users" in sql
+        assert "patients" in sql
+        assert "secrets" in sql
+
+    def test_creates_healthcare_tables(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "patient_referrals" in sql
+        assert "billing" in sql
+
+    def test_grants_app_user(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "app_user" in sql
+        assert "GRANT" in sql
+
+    def test_has_flush_privileges(self, rendered_dir):
+        sql = (rendered_dir / "init.sql").read_text()
+        assert "FLUSH PRIVILEGES" in sql
+
+
+# ---------------------------------------------------------------------------
+# Dockerfile.web content verification
+# ---------------------------------------------------------------------------
+
+
+class TestDockerfileWeb:
+    """Verify rendered Dockerfile.web creates users from topology."""
+
+    def test_creates_users_from_topology(self, rendered_dir, parsed_spec):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        # Should have useradd for users from topology
+        users = parsed_spec.topology.get("users", [])
+        assert len(users) > 0, "Parsed spec should have users"
+        for user in users:
+            username = user.get("username", "")
+            if username:
+                assert "useradd" in dockerfile
+
+    def test_has_php_fpm(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "php8.1-fpm" in dockerfile
+
+    def test_has_nginx(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "nginx" in dockerfile
+
+    def test_copies_nginx_conf(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "COPY nginx.conf" in dockerfile
+
+    def test_exposes_ports(self, rendered_dir):
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        assert "EXPOSE" in dockerfile
+        assert "80" in dockerfile
+
+    def test_plants_file_flags(self, rendered_dir, parsed_spec):
+        """Flags with file paths on web host should appear in Dockerfile."""
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        for flag in parsed_spec.flags:
+            if flag.host == "web" and "/" in flag.path:
+                assert flag.value in dockerfile, (
+                    f"Flag {flag.id} ({flag.value}) not in Dockerfile.web"
+                )
+
+    def test_db_flags_not_in_dockerfile(self, rendered_dir, parsed_spec):
+        """Flags with mysql: or db: paths should NOT appear in Dockerfile.web."""
+        dockerfile = (rendered_dir / "Dockerfile.web").read_text()
+        for flag in parsed_spec.flags:
+            if flag.path.startswith("mysql:") or flag.path.startswith("db:"):
+                assert flag.value not in dockerfile, (
+                    f"DB flag {flag.id} ({flag.value}) should not be in Dockerfile.web"
+                )
+
+
+# ---------------------------------------------------------------------------
+# iptables.rules content verification
+# ---------------------------------------------------------------------------
+
+
+class TestIptablesRules:
+    """Verify rendered iptables.rules has correct structure."""
+
+    def test_has_filter_table(self, rendered_dir):
+        rules = (rendered_dir / "iptables.rules").read_text()
+        assert "*filter" in rules
+        assert "COMMIT" in rules
+
+    def test_has_forward_chain(self, rendered_dir):
+        rules = (rendered_dir / "iptables.rules").read_text()
+        assert "FORWARD" in rules
+
+
+# ---------------------------------------------------------------------------
+# Full round-trip: files dict is preserved through parse
+# ---------------------------------------------------------------------------
+
+
+class TestFilesPreserved:
+    """Verify that files from LLM output survive the parse pipeline."""
+
+    def test_files_dict_has_web_files(self, parsed_spec):
+        web_files = {k: v for k, v in parsed_spec.files.items() if k.startswith("web:")}
+        assert len(web_files) > 0
+
+    def test_files_dict_has_sql(self, parsed_spec):
+        assert "db:sql" in parsed_spec.files
+
+    def test_index_php_content(self, parsed_spec):
+        key = "web:/var/www/portal/index.php"
+        assert key in parsed_spec.files
+        assert "Meridian Referral Portal" in parsed_spec.files[key]
+
+    def test_lookup_php_has_sqli(self, parsed_spec):
+        key = "web:/var/www/portal/lookup.php"
+        assert key in parsed_spec.files
+        content = parsed_spec.files[key]
+        # Should contain the vulnerable SQL query
+        assert "last_name LIKE" in content or "$last" in content
+
+    def test_compliance_report_has_flag(self, parsed_spec):
+        key = "web:/var/www/portal/reports/hipaa_audit.txt"
+        assert key in parsed_spec.files
+        assert "FLAG{1a2b3c4d5e6f7788}" in parsed_spec.files[key]
+
+    def test_sql_has_user_inserts(self, parsed_spec):
+        sql = parsed_spec.files.get("db:sql", "")
+        assert "dthompson" in sql
+        assert "kwilliams" in sql
+
+    def test_sql_has_flag_insert(self, parsed_spec):
+        sql = parsed_spec.files.get("db:sql", "")
+        assert "FLAG{9f3a2b4c5d6e7f80}" in sql
+
+    def test_files_samba_shares(self, parsed_spec):
+        files_entries = {k: v for k, v in parsed_spec.files.items() if k.startswith("files:")}
+        assert len(files_entries) > 0
+
+    def test_db_backup_script(self, parsed_spec):
+        key = "db:/opt/scripts/db_backup.sh"
+        assert key in parsed_spec.files
+        assert "mysqldump" in parsed_spec.files[key]
|
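Note on `test_creates_users_from_topology`: as written, the loop only asserts that the string "useradd" occurs somewhere in Dockerfile.web, so it passes even if some topology users are never created. A stricter check could look roughly like the sketch below. This is an illustration only: `users_missing_useradd` is a hypothetical helper, and it assumes each user is created on a line that contains `useradd` with the username as a whitespace-separated token. The actual template layout is not shown in this diff and may differ.

```python
from __future__ import annotations


def users_missing_useradd(dockerfile_text: str, users: list[dict]) -> list[str]:
    """Return usernames that never appear as a token on a line containing 'useradd'."""
    useradd_lines = [line for line in dockerfile_text.splitlines() if "useradd" in line]
    missing = []
    for user in users:
        username = user.get("username", "")
        if not username:
            continue  # mirror the existing test: entries without a username are skipped
        # Token match avoids counting "bob" as present because "bobby" is.
        if not any(username in line.split() for line in useradd_lines):
            missing.append(username)
    return missing
```

Inside the test this would be used as `assert users_missing_useradd(dockerfile, users) == []`, so a failure message lists exactly which users were not rendered.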
uv.lock
CHANGED
|
@@ -1862,52 +1862,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" },
 ]
 
-[[package]]
-name = "open-range"
-version = "0.1.0"
-source = { editable = "." }
-dependencies = [
-    { name = "docker" },
-    { name = "fastapi" },
-    { name = "jinja2" },
-    { name = "openenv-core", extra = ["core"] },
-    { name = "pydantic" },
-    { name = "pyyaml" },
-    { name = "uvicorn" },
-]
-
-[package.optional-dependencies]
-builder = [
-    { name = "litellm" },
-]
-dev = [
-    { name = "httpx" },
-    { name = "pytest" },
-    { name = "pytest-asyncio" },
-]
-training = [
-    { name = "trl" },
-    { name = "unsloth" },
-]
-
-[package.metadata]
-requires-dist = [
-    { name = "docker", specifier = ">=7.0" },
-    { name = "fastapi", specifier = ">=0.115" },
-    { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
-    { name = "jinja2", specifier = ">=3.1" },
-    { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
-    { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
-    { name = "pydantic", specifier = ">=2.0" },
-    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
-    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
-    { name = "pyyaml", specifier = ">=6.0" },
-    { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
-    { name = "unsloth", marker = "extra == 'training'" },
-    { name = "uvicorn", specifier = ">=0.27" },
-]
-provides-extras = ["dev", "training", "builder"]
-
 [[package]]
 name = "openai"
 version = "2.26.0"
@@ -1972,6 +1926,54 @@ core = [
     { name = "websockets" },
 ]
 
+[[package]]
+name = "openenv-open-range"
+version = "0.1.0"
+source = { editable = "." }
+dependencies = [
+    { name = "click" },
+    { name = "docker" },
+    { name = "fastapi" },
+    { name = "jinja2" },
+    { name = "openenv-core", extra = ["core"] },
+    { name = "pydantic" },
+    { name = "pyyaml" },
+    { name = "uvicorn" },
+]
+
+[package.optional-dependencies]
+builder = [
+    { name = "litellm" },
+]
+dev = [
+    { name = "httpx" },
+    { name = "pytest" },
+    { name = "pytest-asyncio" },
+]
+training = [
+    { name = "trl" },
+    { name = "unsloth" },
+]
+
+[package.metadata]
+requires-dist = [
+    { name = "click", specifier = ">=8.1" },
+    { name = "docker", specifier = ">=7.0" },
+    { name = "fastapi", specifier = ">=0.115.0" },
+    { name = "httpx", marker = "extra == 'dev'", specifier = ">=0.27" },
+    { name = "jinja2", specifier = ">=3.1" },
+    { name = "litellm", marker = "extra == 'builder'", specifier = ">=1.30" },
+    { name = "openenv-core", extras = ["core"], specifier = ">=0.2.1" },
+    { name = "pydantic", specifier = ">=2.0.0" },
+    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
+    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.23" },
+    { name = "pyyaml", specifier = ">=6.0" },
+    { name = "trl", marker = "extra == 'training'", specifier = ">=0.8" },
+    { name = "unsloth", marker = "extra == 'training'" },
+    { name = "uvicorn", specifier = ">=0.24.0" },
+]
+provides-extras = ["dev", "training", "builder"]
+
 [[package]]
 name = "opentelemetry-api"
 version = "1.40.0"