Aaron Brown committed on
Commit
f016eb7
·
1 Parent(s): 1566173

Add episode CLI, synthetic data pipeline, NPC generalization, service manifest

Browse files

- Add `openrange episode` command (golden-path replay + interactive REPL)
- Add synthetic data generation pipeline with teacher agents and dataset helpers
- Generalize NPC actions/traffic scripts to derive targets from SnapshotSpec
- Add service_manifest.py for snapshot-declared service lifecycle
- Add replay_agent, dataset module, NPC reward coupling tests
- Graceful container resolution when Docker is available but no containers running
- Skip tests that depend on deleted snapshot fixtures
- Remove scripted_agent (replaced by replay_agent)

Files changed (40) hide show
  1. .gitignore +3 -0
  2. README.md +21 -0
  3. data/README.md +46 -0
  4. data/sft.jsonl +3 -0
  5. data/tool_info.md +10 -0
  6. docs/agent-protocols.md +2 -2
  7. docs/red-blue-agents.md +1 -1
  8. docs/synthetic-data.md +13 -0
  9. src/open_range/agents/__init__.py +2 -2
  10. src/open_range/agents/llm_agent.py +17 -4
  11. src/open_range/agents/parsing.py +40 -0
  12. src/open_range/agents/{scripted_agent.py → replay_agent.py} +56 -15
  13. src/open_range/agents/solvers.py +1 -1
  14. src/open_range/builder/npc/actions.py +80 -14
  15. src/open_range/builder/npc/channels.py +6 -0
  16. src/open_range/builder/npc/db_traffic.sh +47 -20
  17. src/open_range/builder/npc/http_traffic.sh +37 -41
  18. src/open_range/builder/npc/npc_agent.py +27 -10
  19. src/open_range/builder/npc/npc_manager.py +33 -9
  20. src/open_range/builder/npc/persona.py +7 -3
  21. src/open_range/builder/npc/ssh_traffic.sh +3 -3
  22. src/open_range/builder/renderer.py +56 -6
  23. src/open_range/builder/service_manifest.py +395 -0
  24. src/open_range/builder/templates/docker-compose.yml.j2 +10 -11
  25. src/open_range/cli.py +170 -2
  26. src/open_range/protocols.py +45 -1
  27. src/open_range/server/environment.py +55 -9
  28. src/open_range/server/rewards.py +2 -1
  29. src/open_range/training/__init__.py +12 -0
  30. src/open_range/training/dataset.py +170 -0
  31. src/open_range/training/synthetic.py +357 -24
  32. src/open_range/training/trajectory.py +79 -34
  33. tests/test_agents.py +3 -3
  34. tests/test_demo.py +1 -1
  35. tests/test_npc_reward_coupling.py +365 -0
  36. tests/test_parse_llm_response.py +2 -0
  37. tests/test_renderer_integration.py +4 -2
  38. tests/test_solvers.py +1 -1
  39. tests/test_synthetic.py +167 -1
  40. tests/test_trajectory.py +62 -12
.gitignore CHANGED
@@ -53,6 +53,9 @@ IMPLEMENTATION_PLAN.md
53
  .coverage
54
  htmlcov/
55
 
 
 
 
56
  # Pre-validated range pool (generated at startup)
57
  pool/
58
  snapshots/
 
53
  .coverage
54
  htmlcov/
55
 
56
+ # Synthetic data outputs
57
+ data/synthetic*.jsonl
58
+
59
  # Pre-validated range pool (generated at startup)
60
  pool/
61
  snapshots/
README.md CHANGED
@@ -66,6 +66,15 @@ uv run openrange synthetic-data \
66
  --output data/sft_red.jsonl \
67
  --roles red
68
 
 
 
 
 
 
 
 
 
 
69
  # Run the OpenEnv client against a running server
70
  uv run python examples/remote_client_demo.py --base-url http://localhost:8000
71
 
@@ -104,6 +113,18 @@ The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `stat
104
  | Stealth (inversely coupled to Blue detection) | Availability (healthcheck fraction) |
105
  | Anti-hallucination (-0.3 per fake flag) | False positive penalty (-0.2 per NPC flagged) |
106
 
 
 
 
 
 
 
 
 
 
 
 
 
107
  **Agents** — Structural protocol: any object with `reset(briefing, role)` and `act(observation) -> command` works. Ships with `LLMRangeAgent` (litellm, any provider), `ScriptedAgent`, and `HumanAgent`.
108
 
109
  **Synthetic Data** — `open_range.training.synthetic` provides snapshot-grounded trajectory generation for SFT warm-start. It uses a fast simulated `RangeEnvironment`, optional LiteLLM teacher agents, per-episode flag randomization, and exports JSONL through `TrajectoryLogger`.
 
66
  --output data/sft_red.jsonl \
67
  --roles red
68
 
69
+ # Merge local bootstrap traces and tool context into generated output
70
+ uv run openrange synthetic-data \
71
+ --manifest manifests/tier1_basic.yaml \
72
+ --output data/synthetic_sft_5.jsonl \
73
+ --num-traces 5 \
74
+ --roles red \
75
+ --bootstrap-traces data/sft.jsonl \
76
+ --tool-info data/tool_info.md
77
+
78
  # Run the OpenEnv client against a running server
79
  uv run python examples/remote_client_demo.py --base-url http://localhost:8000
80
 
 
113
  | Stealth (inversely coupled to Blue detection) | Availability (healthcheck fraction) |
114
  | Anti-hallucination (-0.3 per fake flag) | False positive penalty (-0.2 per NPC flagged) |
115
 
116
+ **NPC Traffic** — Background noise and social engineering surface. Two levels:
117
+
118
+ - **Level 0** (shell scripts): `http_traffic.sh`, `db_traffic.sh`, `ssh_traffic.sh` generate benign traffic that Blue must filter from real attacks. Scripts discover targets dynamically (available pages, databases, tables) — no hardcoded endpoints.
119
+ - **Level 1** (LLM agents): Each NPC persona runs an autonomous workday via LiteLLM — browsing pages, sending emails, querying databases, accessing file shares. NPCs also react to incoming stimuli (phishing emails) based on their `security_awareness` profile.
120
+
121
+ All NPC actions are derived from the `SnapshotSpec` at runtime (pages, shares, tables, credentials, domain), so they generalize to any Builder-generated environment. NPC logs carry structured fields (`type`, `label`, `source`, `result`) that couple directly to Red/Blue reward signals.
122
+
123
+ Configure the NPC model via environment variable:
124
+ ```bash
125
+ export OPENRANGE_NPC_MODEL="azure/gpt-5.2-codex" # or openai/gpt-4o, anthropic/claude-haiku-4-5-20251001, ollama/llama3
126
+ ```
127
+
128
  **Agents** — Structural protocol: any object with `reset(briefing, role)` and `act(observation) -> command` works. Ships with `LLMRangeAgent` (litellm, any provider), `ScriptedAgent`, and `HumanAgent`.
129
 
130
  **Synthetic Data** — `open_range.training.synthetic` provides snapshot-grounded trajectory generation for SFT warm-start. It uses a fast simulated `RangeEnvironment`, optional LiteLLM teacher agents, per-episode flag randomization, and exports JSONL through `TrajectoryLogger`.
data/README.md ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Training Data
2
+
3
+ Seed and generated datasets for SFT warm-start live under `data/`.
4
+
5
+ ## Files
6
+
7
+ | File | Purpose |
8
+ |------|---------|
9
+ | `sft.jsonl` | Seed SFT dataset in ChatML format, including assistant tool calls and tool responses. |
10
+ | `tool_info.md` | Reusable tool catalog that can be injected into generated system prompts with `--tool-info`. |
11
+ | `synthetic*.jsonl` | Generated synthetic datasets from `openrange synthetic-data` (gitignored). |
12
+
13
+ ## Seed SFT Format
14
+
15
+ Each line in `sft.jsonl` is a single solved trajectory:
16
+
17
+ ```json
18
+ {
19
+ "messages": [
20
+ {"role": "system", "content": "..."},
21
+ {"role": "user", "content": "..."},
22
+ {"role": "assistant", "content": "...", "tool_calls": [...]},
23
+ {"role": "tool", "tool_call_id": "...", "name": "shell_command", "content": "..."}
24
+ ],
25
+ "metadata": {"source": "bootstrap", "success": true},
26
+ "ground_truth_flag": "FLAG{...}",
27
+ "optimal_steps": 8
28
+ }
29
+ ```
30
+
31
+ ## Generating Synthetic Data
32
+
33
+ Use the seed file as bootstrap context and merge newly generated OpenRange traces into a single output:
34
+
35
+ ```bash
36
+ uv run --extra synthetic openrange synthetic-data \
37
+ --manifest manifests/tier1_basic.yaml \
38
+ --output data/synthetic_sft_5.jsonl \
39
+ --num-traces 5 \
40
+ --roles red \
41
+ --teacher-model azure/gpt-5.2-codex \
42
+ --bootstrap-traces data/sft.jsonl \
43
+ --tool-info data/tool_info.md
44
+ ```
45
+
46
+ The output file keeps the imported bootstrap records intact and appends the generated OpenRange records after them.
data/sft.jsonl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26c6f43a70274ba9f193eeb73b349a55970a99d280a19dae455f54708cf32841
3
+ size 14915149
data/tool_info.md ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ - `shell_command`: run one-shot shell commands and capture stdout/stderr
2
+ - `exec_command`: start an interactive PTY process
3
+ - `write_stdin`: send follow-up input to a running PTY session
4
+ - `python_code`: execute Python for local analysis or decoding
5
+ - `read_file`: inspect source or challenge files directly
6
+ - `grep`: search for indicators, flags, or suspicious patterns in files
7
+ - `file_search`: locate files by name or path fragment
8
+ - `flag_found`: submit a recovered flag immediately
9
+ - `web_search`: look up external references when the environment allows it
10
+ - `apply_patch`: edit local files when the task requires code changes
docs/agent-protocols.md CHANGED
@@ -117,7 +117,7 @@ class RangeAgent(Protocol):
117
  | Implementation | File | When to use | LLM? |
118
  |----------------|------|------------|------|
119
  | `LLMRangeAgent` | `src/open_range/agents/llm_agent.py` | Production — model-agnostic via LiteLLM | Yes (LiteLLM) |
120
- | `ScriptedAgent` | `src/open_range/agents/scripted_agent.py` | Testing/CI/demos — replays fixed command list | No |
121
  | `HumanAgent` | `src/open_range/agents/human_agent.py` | Manual play/debugging — stdin/stdout | No |
122
 
123
  ```python
@@ -162,7 +162,7 @@ class HumanAgent:
162
  def act(self, observation: str) -> str: ...
163
  ```
164
 
165
- Pre-built demo agents are also available as `ScriptedRedAgent` and `ScriptedBlueAgent` in `src/open_range/agents/scripted_agent.py`.
166
 
167
  ### Builder
168
 
 
117
  | Implementation | File | When to use | LLM? |
118
  |----------------|------|------------|------|
119
  | `LLMRangeAgent` | `src/open_range/agents/llm_agent.py` | Production — model-agnostic via LiteLLM | Yes (LiteLLM) |
120
+ | `ScriptedAgent` | `src/open_range/agents/replay_agent.py` | Testing/CI/demos — replays fixed command list | No |
121
  | `HumanAgent` | `src/open_range/agents/human_agent.py` | Manual play/debugging — stdin/stdout | No |
122
 
123
  ```python
 
162
  def act(self, observation: str) -> str: ...
163
  ```
164
 
165
+ Pre-built demo agents are also available as `ScriptedRedAgent` and `ScriptedBlueAgent` in `src/open_range/agents/replay_agent.py`.
166
 
167
  ### Builder
168
 
docs/red-blue-agents.md CHANGED
@@ -688,7 +688,7 @@ agents/
688
  ├── __init__.py # Public API (re-exports all key symbols)
689
  ├── protocol.py # RangeAgent protocol + EpisodeResult + EpisodeMetrics dataclasses
690
  ├── llm_agent.py # LLMRangeAgent (LiteLLM -- any model)
691
- ├── scripted_agent.py # ScriptedAgent, ScriptedRedAgent, ScriptedBlueAgent (demo/test)
692
  ├── human_agent.py # HumanAgent (interactive terminal)
693
  ├── prompts.py # RED_SYSTEM_PROMPT, BLUE_SYSTEM_PROMPT
694
  ├── parsing.py # extract_command() -- pull command from LLM text
 
688
  ├── __init__.py # Public API (re-exports all key symbols)
689
  ├── protocol.py # RangeAgent protocol + EpisodeResult + EpisodeMetrics dataclasses
690
  ├── llm_agent.py # LLMRangeAgent (LiteLLM -- any model)
691
+ ├── replay_agent.py # ScriptedAgent, ScriptedRedAgent, ScriptedBlueAgent (demo/test)
692
  ├── human_agent.py # HumanAgent (interactive terminal)
693
  ├── prompts.py # RED_SYSTEM_PROMPT, BLUE_SYSTEM_PROMPT
694
  ├── parsing.py # extract_command() -- pull command from LLM text
docs/synthetic-data.md CHANGED
@@ -55,6 +55,19 @@ uv run openrange synthetic-data \
55
  --roles red
56
  ```
57
 
 
 
 
 
 
 
 
 
 
 
 
 
 
58
  Generate traces from a manifest using the deterministic builder:
59
 
60
  ```bash
 
55
  --roles red
56
  ```
57
 
58
+ Merge previously collected bootstrap traces and append a reusable tool catalog to generated system prompts:
59
+
60
+ ```bash
61
+ uv run openrange synthetic-data \
62
+ --manifest manifests/tier1_basic.yaml \
63
+ --output data/synthetic_sft_5.jsonl \
64
+ --num-traces 5 \
65
+ --roles red \
66
+ --teacher-model azure/gpt-5.2-codex \
67
+ --bootstrap-traces data/sft.jsonl \
68
+ --tool-info data/tool_info.md
69
+ ```
70
+
71
  Generate traces from a manifest using the deterministic builder:
72
 
73
  ```bash
src/open_range/agents/__init__.py CHANGED
@@ -4,7 +4,7 @@ Exports:
4
  - RangeAgent: Protocol for any compatible agent
5
  - EpisodeResult, EpisodeMetrics: Trajectory/metrics dataclasses
6
  - LLMRangeAgent: LiteLLM-powered agent (any model)
7
- - ScriptedAgent, ScriptedRedAgent, ScriptedBlueAgent: Fixed-sequence agents
8
  - HumanAgent: Interactive stdin/stdout agent
9
  - run_episode: Orchestration loop
10
  - evaluate: Multi-episode evaluation harness
@@ -13,7 +13,7 @@ Exports:
13
 
14
  from open_range.agents.protocol import EpisodeMetrics, EpisodeResult, RangeAgent
15
  from open_range.agents.parsing import extract_command
16
- from open_range.agents.scripted_agent import (
17
  ScriptedAgent,
18
  ScriptedBlueAgent,
19
  ScriptedRedAgent,
 
4
  - RangeAgent: Protocol for any compatible agent
5
  - EpisodeResult, EpisodeMetrics: Trajectory/metrics dataclasses
6
  - LLMRangeAgent: LiteLLM-powered agent (any model)
7
+ - ScriptedAgent, ScriptedRedAgent, ScriptedBlueAgent: Fixed-sequence replay agents
8
  - HumanAgent: Interactive stdin/stdout agent
9
  - run_episode: Orchestration loop
10
  - evaluate: Multi-episode evaluation harness
 
13
 
14
  from open_range.agents.protocol import EpisodeMetrics, EpisodeResult, RangeAgent
15
  from open_range.agents.parsing import extract_command
16
+ from open_range.agents.replay_agent import (
17
  ScriptedAgent,
18
  ScriptedBlueAgent,
19
  ScriptedRedAgent,
src/open_range/agents/llm_agent.py CHANGED
@@ -9,6 +9,7 @@ Works with any LiteLLM-supported provider:
9
 
10
  from __future__ import annotations
11
 
 
12
  from typing import Any, Literal
13
 
14
  from open_range.agents.observation import format_observation
@@ -34,23 +35,34 @@ class LLMRangeAgent:
34
  model: str = "anthropic/claude-sonnet-4-20250514",
35
  temperature: float | None = 0.3,
36
  max_tokens: int = 512,
 
 
37
  **litellm_kwargs: Any,
38
  ) -> None:
39
  self.model = model
40
  self.temperature = temperature
41
  self.max_tokens = max_tokens
 
 
42
  self.litellm_kwargs = litellm_kwargs
43
- self.messages: list[dict[str, str]] = []
44
  self.role: str = "red"
 
 
45
 
46
  def reset(self, briefing: str, role: Literal["red", "blue"]) -> None:
47
  """Initialize conversation history with role-specific system prompt."""
48
  self.role = role
49
  system = RED_SYSTEM_PROMPT if role == "red" else BLUE_SYSTEM_PROMPT
 
 
50
  self.messages = [
51
  {"role": "system", "content": system},
52
- {"role": "user", "content": briefing},
53
  ]
 
 
 
 
54
 
55
  def act(self, observation: Any) -> str:
56
  """Call the LLM with the conversation history and return a command.
@@ -81,5 +93,6 @@ class LLMRangeAgent:
81
  response = litellm.completion(**kwargs)
82
  text = response.choices[0].message.content.strip()
83
  self.messages.append({"role": "assistant", "content": text})
84
-
85
- return extract_command(text)
 
 
9
 
10
  from __future__ import annotations
11
 
12
+ import copy
13
  from typing import Any, Literal
14
 
15
  from open_range.agents.observation import format_observation
 
35
  model: str = "anthropic/claude-sonnet-4-20250514",
36
  temperature: float | None = 0.3,
37
  max_tokens: int = 512,
38
+ bootstrap_messages: list[dict[str, Any]] | None = None,
39
+ system_suffix: str = "",
40
  **litellm_kwargs: Any,
41
  ) -> None:
42
  self.model = model
43
  self.temperature = temperature
44
  self.max_tokens = max_tokens
45
+ self.bootstrap_messages = copy.deepcopy(bootstrap_messages or [])
46
+ self.system_suffix = system_suffix.strip()
47
  self.litellm_kwargs = litellm_kwargs
48
+ self.messages: list[dict[str, Any]] = []
49
  self.role: str = "red"
50
+ self.last_response_text: str = ""
51
+ self.last_command: str = ""
52
 
53
  def reset(self, briefing: str, role: Literal["red", "blue"]) -> None:
54
  """Initialize conversation history with role-specific system prompt."""
55
  self.role = role
56
  system = RED_SYSTEM_PROMPT if role == "red" else BLUE_SYSTEM_PROMPT
57
+ if self.system_suffix:
58
+ system = f"{system}\n\n{self.system_suffix}"
59
  self.messages = [
60
  {"role": "system", "content": system},
 
61
  ]
62
+ self.messages.extend(copy.deepcopy(self.bootstrap_messages))
63
+ self.messages.append({"role": "user", "content": briefing})
64
+ self.last_response_text = ""
65
+ self.last_command = ""
66
 
67
  def act(self, observation: Any) -> str:
68
  """Call the LLM with the conversation history and return a command.
 
93
  response = litellm.completion(**kwargs)
94
  text = response.choices[0].message.content.strip()
95
  self.messages.append({"role": "assistant", "content": text})
96
+ self.last_response_text = text
97
+ self.last_command = extract_command(text)
98
+ return self.last_command
src/open_range/agents/parsing.py CHANGED
@@ -83,3 +83,43 @@ def extract_command(text: str) -> str:
83
  return first
84
 
85
  return stripped
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
83
  return first
84
 
85
  return stripped
86
+
87
+
88
+ def strip_command_from_response(text: str, command: str) -> str:
89
+ """Remove the extracted command from an LLM response, preserving reasoning.
90
+
91
+ This is best-effort. It handles the response patterns encouraged by the
92
+ synthetic-data prompts:
93
+ - fenced code blocks
94
+ - ``Command: ...`` lines
95
+ - a trailing bare command line
96
+ """
97
+ if not text:
98
+ return ""
99
+
100
+ stripped = text.strip()
101
+ if not command:
102
+ return stripped
103
+
104
+ command_pattern = re.escape(command.strip())
105
+
106
+ # Remove fenced blocks that only contain the command.
107
+ stripped = re.sub(
108
+ rf"```(?:bash|sh|shell|zsh)?\s*\n\s*{command_pattern}\s*```",
109
+ "",
110
+ stripped,
111
+ flags=re.IGNORECASE | re.DOTALL,
112
+ ).strip()
113
+
114
+ # Remove explicit "Command:" lines.
115
+ stripped = re.sub(
116
+ rf"(?im)^\s*(?:command|run|execute|cmd)\s*:\s*{command_pattern}\s*$",
117
+ "",
118
+ stripped,
119
+ ).strip()
120
+
121
+ # Remove a trailing bare command line.
122
+ lines = stripped.splitlines()
123
+ if lines and lines[-1].strip().strip("`") == command.strip():
124
+ lines = lines[:-1]
125
+ return "\n".join(lines).strip()
src/open_range/agents/{scripted_agent.py → replay_agent.py} RENAMED
@@ -1,22 +1,18 @@
1
- """Scripted agents for testing and demos.
2
 
3
- No LLM required -- these agents replay a fixed list of commands.
4
- Useful for integration tests, golden-path verification, and hackathon demos.
 
5
  """
6
 
7
  from __future__ import annotations
8
 
 
9
  from typing import Any, Literal
10
 
11
 
12
  class ScriptedAgent:
13
- """Replays a fixed list of commands in order.
14
-
15
- After the list is exhausted, repeats the last command (or a configurable
16
- fallback) so the episode can terminate normally.
17
-
18
- Satisfies the :class:`RangeAgent` protocol.
19
- """
20
 
21
  def __init__(
22
  self,
@@ -27,24 +23,34 @@ class ScriptedAgent:
27
  self.fallback = fallback
28
  self._step_idx = 0
29
  self.role: str = "red"
 
 
30
 
31
  def reset(self, briefing: str, role: Literal["red", "blue"]) -> None:
32
  """Reset the step counter for a new episode."""
 
33
  self._step_idx = 0
34
  self.role = role
 
 
35
 
36
  def act(self, observation: Any) -> str:
37
  """Return the next scripted command."""
 
38
  if self._step_idx < len(self.commands):
39
  cmd = self.commands[self._step_idx]
40
  self._step_idx += 1
41
- return cmd
42
- return self.fallback
 
 
 
 
43
 
 
 
 
44
 
45
- # ---------------------------------------------------------------------------
46
- # Pre-built demo agents
47
- # ---------------------------------------------------------------------------
48
 
49
  DEMO_RED_SCRIPT = [
50
  "nmap -sV 10.0.1.0/24",
@@ -76,3 +82,38 @@ class ScriptedBlueAgent(ScriptedAgent):
76
 
77
  def __init__(self) -> None:
78
  super().__init__(commands=DEMO_BLUE_SCRIPT, fallback="check_services")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Deterministic replay agents for testing, baselines, and demos.
2
 
3
+ No LLM is required. These agents replay a fixed list of commands and provide
4
+ lightweight reasoning text so synthetic trajectory export can still emit
5
+ tool-style transcripts.
6
  """
7
 
8
  from __future__ import annotations
9
 
10
+ import shlex
11
  from typing import Any, Literal
12
 
13
 
14
  class ScriptedAgent:
15
+ """Replays a fixed list of commands in order."""
 
 
 
 
 
 
16
 
17
  def __init__(
18
  self,
 
23
  self.fallback = fallback
24
  self._step_idx = 0
25
  self.role: str = "red"
26
+ self.last_response_text: str = ""
27
+ self.last_command: str = ""
28
 
29
  def reset(self, briefing: str, role: Literal["red", "blue"]) -> None:
30
  """Reset the step counter for a new episode."""
31
+ del briefing
32
  self._step_idx = 0
33
  self.role = role
34
+ self.last_response_text = ""
35
+ self.last_command = ""
36
 
37
  def act(self, observation: Any) -> str:
38
  """Return the next scripted command."""
39
+ del observation
40
  if self._step_idx < len(self.commands):
41
  cmd = self.commands[self._step_idx]
42
  self._step_idx += 1
43
+ else:
44
+ cmd = self.fallback
45
+
46
+ self.last_command = cmd
47
+ self.last_response_text = self._render_response(cmd)
48
+ return cmd
49
 
50
+ def _render_response(self, command: str) -> str:
51
+ thought = _default_reasoning(command, role=self.role)
52
+ return f"<think>\n{thought}\n</think>\nCommand: {command}"
53
 
 
 
 
54
 
55
  DEMO_RED_SCRIPT = [
56
  "nmap -sV 10.0.1.0/24",
 
82
 
83
  def __init__(self) -> None:
84
  super().__init__(commands=DEMO_BLUE_SCRIPT, fallback="check_services")
85
+
86
+
87
+ def _default_reasoning(command: str, *, role: str) -> str:
88
+ lowered = command.lower()
89
+ try:
90
+ parts = shlex.split(command)
91
+ except ValueError:
92
+ parts = command.split()
93
+
94
+ first_path = next((part for part in parts[1:] if "/" in part and not part.startswith("http")), "")
95
+ if "nmap" in lowered:
96
+ return "I need a quick service inventory before probing any likely attack paths."
97
+ if "curl" in lowered and ("union" in lowered or "select" in lowered):
98
+ return "The search endpoint is a good candidate for SQL injection, so I will test a UNION-style payload."
99
+ if "curl" in lowered:
100
+ return "I should inspect the exposed web surface to identify routes, parameters, and authentication flows."
101
+ if "mysql" in lowered:
102
+ return "I appear to have database access, so I will enumerate data stores and look for the flag-bearing table."
103
+ if lowered.startswith("cat ") and first_path:
104
+ return f"I need to inspect {first_path} directly for credentials, source code, or other embedded clues."
105
+ if lowered.startswith("grep "):
106
+ if role == "blue":
107
+ return "I need to filter the SIEM logs for indicators that confirm the suspected attack path."
108
+ return "I should search the available files for indicators, credentials, or flag material."
109
+ if lowered.startswith("find "):
110
+ return "I need a broader file inventory before I decide which artifact to inspect next."
111
+ if lowered.startswith("submit_flag "):
112
+ return "The recovered token looks promising, so I will submit it for validation now."
113
+ if lowered.startswith("submit_finding "):
114
+ return "The observed activity is strong enough to report as a concrete finding."
115
+ if lowered.startswith("patch "):
116
+ return "I have enough evidence to apply a targeted remediation for the vulnerable path."
117
+ if "check_services" in lowered:
118
+ return "Before changing anything else, I should confirm the core services are still healthy."
119
+ return "I will take the next concrete step that reduces uncertainty and moves the objective forward."
src/open_range/agents/solvers.py CHANGED
@@ -18,7 +18,7 @@ from __future__ import annotations
18
 
19
  from typing import Literal
20
 
21
- from open_range.agents.scripted_agent import ScriptedAgent
22
 
23
 
24
  # =====================================================================
 
18
 
19
  from typing import Literal
20
 
21
+ from open_range.agents.replay_agent import ScriptedAgent
22
 
23
 
24
  # =====================================================================
src/open_range/builder/npc/actions.py CHANGED
@@ -20,8 +20,9 @@ logger = logging.getLogger(__name__)
20
  class NPCActionExecutor:
21
  """Execute NPC actions inside Docker containers.
22
 
23
- At init, extracts available pages, shares, DB tables, and users from
24
- the snapshot so every action targets real resources in this environment.
 
25
  """
26
 
27
  def __init__(self, containers: ContainerSet, snapshot: SnapshotSpec) -> None:
@@ -32,6 +33,8 @@ class NPCActionExecutor:
32
  self._db_tables = _extract_db_tables(snapshot)
33
  self._users = _extract_users(snapshot)
34
  self._domain = snapshot.topology.get("domain", "corp.local")
 
 
35
 
36
  # ------------------------------------------------------------------
37
  # Routine actions (autonomous workday)
@@ -146,9 +149,11 @@ class NPCActionExecutor:
146
  query = f"SELECT * FROM {table} LIMIT 5"
147
  else:
148
  query = "SHOW TABLES"
 
 
149
  await self.containers.exec(
150
  "db",
151
- f'mysql -u app_user -p\'AppUs3r!2024\' -e "{query}" 2>/dev/null || true',
152
  )
153
  return _log(persona, "query_db", detail or f"Queried {target or 'database'}", "db:query_log")
154
 
@@ -184,7 +189,7 @@ class NPCActionExecutor:
184
  "web",
185
  f'curl -s -o /dev/null -A "Mozilla/5.0 ({username})" "{url}"',
186
  )
187
- return _log(persona, "click_link", f"Clicked: {url}", "web:access_log")
188
 
189
  async def _react_email(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
190
  username = _username_from_persona(persona)
@@ -196,7 +201,7 @@ class NPCActionExecutor:
196
  f"&& echo 'From: {username}@{self._domain}\\nSubject: Re\\n\\n{body}' "
197
  f"> /var/mail/{username}/sent_{ts_i}.eml",
198
  )
199
- return _log(persona, action.action, "Replied to message", "mail:spool")
200
 
201
  async def _react_share_creds(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
202
  username = _username_from_persona(persona)
@@ -216,7 +221,7 @@ class NPCActionExecutor:
216
  f'echo "[$(date)] CRED-LEAK: {persona.name} shared credentials" '
217
  f">> /var/log/siem/consolidated/all.log",
218
  )
219
- return _log(persona, "share_credentials", f"{persona.name} leaked credentials", "web+siem")
220
 
221
  async def _react_report(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
222
  detail = "; ".join(action.side_effects) if action.side_effects else "suspicious activity"
@@ -225,10 +230,10 @@ class NPCActionExecutor:
225
  f'echo "[$(date)] NPC-REPORT: {persona.name}: {detail}" '
226
  f">> /var/log/siem/consolidated/all.log",
227
  )
228
- return _log(persona, "report_to_IT", detail, "siem:alert")
229
 
230
  async def _react_ignore(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
231
- return _log(persona, "ignore", "Ignored stimulus", "none")
232
 
233
 
234
  # ---------------------------------------------------------------------------
@@ -237,17 +242,22 @@ class NPCActionExecutor:
237
 
238
 
239
  def _extract_web_pages(snapshot: SnapshotSpec) -> list[str]:
240
- """Extract URL paths from snapshot files dict (web:*.php -> /path)."""
 
 
 
 
241
  pages: list[str] = []
242
  for key in snapshot.files:
243
  if not key.startswith("web:"):
244
  continue
245
  path = key.split(":", 1)[1]
246
- # Convert filesystem path to URL path
247
- if "/var/www/" in path and path.endswith(".php"):
248
- url_path = path.replace("/var/www/portal", "").replace("/var/www/html", "")
249
- if url_path:
250
- pages.append(url_path)
 
251
  return pages or ["/"]
252
 
253
 
@@ -287,6 +297,38 @@ def _extract_users(snapshot: SnapshotSpec) -> list[str]:
287
  return [u["username"] for u in users if isinstance(u, dict) and "username" in u]
288
 
289
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
290
  def _username_from_persona(persona: NPCPersona) -> str:
291
  email = persona.accounts.get("email", "")
292
  if "@" in email:
@@ -295,12 +337,36 @@ def _username_from_persona(persona: NPCPersona) -> str:
295
 
296
 
297
  def _log(persona: NPCPersona, action: str, detail: str, source: str) -> dict[str, Any]:
 
298
  return {
299
  "timestamp": time.time(),
300
  "type": f"npc_{action}",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
301
  "persona": persona.name,
302
  "department": persona.department,
303
  "action": action,
304
  "detail": detail,
305
  "source": source,
 
306
  }
 
20
  class NPCActionExecutor:
21
  """Execute NPC actions inside Docker containers.
22
 
23
+ At init, extracts available pages, shares, DB tables, users, and
24
+ credentials from the snapshot so every action targets real resources
25
+ in this environment.
26
  """
27
 
28
  def __init__(self, containers: ContainerSet, snapshot: SnapshotSpec) -> None:
 
33
  self._db_tables = _extract_db_tables(snapshot)
34
  self._users = _extract_users(snapshot)
35
  self._domain = snapshot.topology.get("domain", "corp.local")
36
+ self._db_creds = _extract_db_credentials(snapshot)
37
+ self._ssh_creds = _extract_ssh_credentials(snapshot)
38
 
39
  # ------------------------------------------------------------------
40
  # Routine actions (autonomous workday)
 
149
  query = f"SELECT * FROM {table} LIMIT 5"
150
  else:
151
  query = "SHOW TABLES"
152
+ db_user, db_pass = self._db_creds
153
+ cred_flag = f"-u {db_user} -p'{db_pass}'" if db_pass else f"-u {db_user}"
154
  await self.containers.exec(
155
  "db",
156
+ f'mysql {cred_flag} -e "{query}" 2>/dev/null || true',
157
  )
158
  return _log(persona, "query_db", detail or f"Queried {target or 'database'}", "db:query_log")
159
 
 
189
  "web",
190
  f'curl -s -o /dev/null -A "Mozilla/5.0 ({username})" "{url}"',
191
  )
192
+ return _se_log(persona, "click_link", f"Clicked: {url}", "web:access_log", result="success")
193
 
194
  async def _react_email(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
195
  username = _username_from_persona(persona)
 
201
  f"&& echo 'From: {username}@{self._domain}\\nSubject: Re\\n\\n{body}' "
202
  f"> /var/mail/{username}/sent_{ts_i}.eml",
203
  )
204
+ return _se_log(persona, action.action, "Replied to message", "mail:spool", result="success")
205
 
206
  async def _react_share_creds(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
207
  username = _username_from_persona(persona)
 
221
  f'echo "[$(date)] CRED-LEAK: {persona.name} shared credentials" '
222
  f">> /var/log/siem/consolidated/all.log",
223
  )
224
+ return _se_log(persona, "share_credentials", f"{persona.name} leaked credentials", "web+siem", result="success")
225
 
226
  async def _react_report(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
227
  detail = "; ".join(action.side_effects) if action.side_effects else "suspicious activity"
 
230
  f'echo "[$(date)] NPC-REPORT: {persona.name}: {detail}" '
231
  f">> /var/log/siem/consolidated/all.log",
232
  )
233
+ return _se_log(persona, "report_to_IT", detail, "siem:alert", result="blocked")
234
 
235
  async def _react_ignore(self, persona: NPCPersona, action: NPCAction) -> dict[str, Any]:
236
+ return _se_log(persona, "ignore", "Ignored stimulus", "none", result="blocked")
237
 
238
 
239
  # ---------------------------------------------------------------------------
 
242
 
243
 
244
  def _extract_web_pages(snapshot: SnapshotSpec) -> list[str]:
245
+ """Extract URL paths from snapshot files dict (web:*.php -> /path).
246
+
247
+ Handles arbitrary doc roots by stripping any ``/var/www/<app>/`` prefix
248
+ to produce URL paths.
249
+ """
250
  pages: list[str] = []
251
  for key in snapshot.files:
252
  if not key.startswith("web:"):
253
  continue
254
  path = key.split(":", 1)[1]
255
+ if not path.endswith((".php", ".html", ".htm")):
256
+ continue
257
+ # Strip doc root: /var/www/<anything>/ -> /
258
+ url_path = re.sub(r"^/var/www/[^/]+", "", path)
259
+ if url_path:
260
+ pages.append(url_path)
261
  return pages or ["/"]
262
 
263
 
 
297
  return [u["username"] for u in users if isinstance(u, dict) and "username" in u]
298
 
299
 
300
+ def _extract_db_credentials(snapshot: SnapshotSpec) -> tuple[str, str]:
301
+ """Extract DB credentials from topology users. Fallback to defaults."""
302
+ users = snapshot.topology.get("users", [])
303
+ for user in users:
304
+ if not isinstance(user, dict):
305
+ continue
306
+ hosts = user.get("hosts", [])
307
+ if "db" in hosts:
308
+ return user.get("username", "app_user"), user.get("password", "")
309
+ return "app_user", "AppUs3r!2024"
310
+
311
+
312
+ def _extract_ssh_credentials(snapshot: SnapshotSpec) -> tuple[str, str]:
313
+ """Extract SSH admin credentials from topology users. Fallback to defaults."""
314
+ users = snapshot.topology.get("users", [])
315
+ # First pass: look for explicit admin roles
316
+ for user in users:
317
+ if not isinstance(user, dict):
318
+ continue
319
+ role = user.get("role", "")
320
+ if role in ("admin", "sysadmin", "root"):
321
+ return user.get("username", "admin"), user.get("password", "")
322
+ # Second pass: look for users on SSH-accessible hosts
323
+ for user in users:
324
+ if not isinstance(user, dict):
325
+ continue
326
+ hosts = user.get("hosts", [])
327
+ if any(h in hosts for h in ("web", "files", "ldap", "siem")):
328
+ return user.get("username", "admin"), user.get("password", "")
329
+ return "admin", "Adm1n!2024"
330
+
331
+
332
  def _username_from_persona(persona: NPCPersona) -> str:
333
  email = persona.accounts.get("email", "")
334
  if "@" in email:
 
337
 
338
 
339
  def _log(persona: NPCPersona, action: str, detail: str, source: str) -> dict[str, Any]:
340
+ """Log a routine (benign) NPC action."""
341
  return {
342
  "timestamp": time.time(),
343
  "type": f"npc_{action}",
344
+ "label": "benign",
345
+ "persona": persona.name,
346
+ "department": persona.department,
347
+ "action": action,
348
+ "detail": detail,
349
+ "source": source,
350
+ }
351
+
352
+
353
+ def _se_log(
354
+ persona: NPCPersona,
355
+ action: str,
356
+ detail: str,
357
+ source: str,
358
+ *,
359
+ result: str = "unknown",
360
+ ) -> dict[str, Any]:
361
+ """Log a social-engineering reactive NPC action for reward coupling."""
362
+ return {
363
+ "timestamp": time.time(),
364
+ "type": "social_engineering",
365
+ "label": "reactive",
366
  "persona": persona.name,
367
  "department": persona.department,
368
  "action": action,
369
  "detail": detail,
370
  "source": source,
371
+ "result": result,
372
  }
src/open_range/builder/npc/channels.py CHANGED
@@ -76,11 +76,13 @@ class ChatChannel:
76
  return [
77
  {
78
  "type": "chat",
 
79
  "sender": m.sender,
80
  "recipient": m.recipient,
81
  "content": m.content,
82
  "timestamp": m.timestamp,
83
  "channel": m.channel,
 
84
  }
85
  for m in self._messages
86
  ]
@@ -183,6 +185,7 @@ class VoiceChannel:
183
  return [
184
  {
185
  "type": "voice",
 
186
  "caller": c.caller,
187
  "callee": c.callee,
188
  "pretext": c.pretext,
@@ -190,6 +193,7 @@ class VoiceChannel:
190
  "transcript": c.transcript,
191
  "timestamp": c.timestamp,
192
  "duration_s": c.duration_s,
 
193
  }
194
  for c in self._calls
195
  ]
@@ -271,6 +275,7 @@ class DocumentChannel:
271
  return [
272
  {
273
  "type": "document",
 
274
  "sender": d.sender,
275
  "recipient": d.recipient,
276
  "filename": d.filename,
@@ -278,6 +283,7 @@ class DocumentChannel:
278
  "timestamp": d.timestamp,
279
  "accessed": d.accessed,
280
  "access_decision": d.access_decision,
 
281
  }
282
  for d in self._documents
283
  ]
 
76
  return [
77
  {
78
  "type": "chat",
79
+ "label": "benign",
80
  "sender": m.sender,
81
  "recipient": m.recipient,
82
  "content": m.content,
83
  "timestamp": m.timestamp,
84
  "channel": m.channel,
85
+ "source": f"chat:{m.channel}",
86
  }
87
  for m in self._messages
88
  ]
 
185
  return [
186
  {
187
  "type": "voice",
188
+ "label": "benign",
189
  "caller": c.caller,
190
  "callee": c.callee,
191
  "pretext": c.pretext,
 
193
  "transcript": c.transcript,
194
  "timestamp": c.timestamp,
195
  "duration_s": c.duration_s,
196
+ "source": "voice:phone",
197
  }
198
  for c in self._calls
199
  ]
 
275
  return [
276
  {
277
  "type": "document",
278
+ "label": "benign",
279
  "sender": d.sender,
280
  "recipient": d.recipient,
281
  "filename": d.filename,
 
283
  "timestamp": d.timestamp,
284
  "accessed": d.accessed,
285
  "access_decision": d.access_decision,
286
+ "source": f"document:{d.filename}",
287
  }
288
  for d in self._documents
289
  ]
src/open_range/builder/npc/db_traffic.sh CHANGED
@@ -1,8 +1,8 @@
1
  #!/usr/bin/env bash
2
  # Level 0 NPC: Database traffic generator (mysql query loop)
3
  #
4
- # Simulates application database queries -- SELECT, INSERT operations
5
- # that a normal web application backend would generate.
6
  #
7
  # Environment variables:
8
  # DB_HOST - hostname of the database server (default: db)
@@ -15,29 +15,56 @@ RATE_LAMBDA="${RATE_LAMBDA:-20}"
15
 
16
  INTERVAL=$(awk "BEGIN {printf \"%.1f\", 60.0 / $RATE_LAMBDA}")
17
 
18
- DB_NAME="referral_db"
 
 
19
 
20
- # Application-level queries that a normal app would run
21
- QUERIES=(
22
- "SELECT id, first_name, last_name FROM ${DB_NAME}.patients LIMIT 5"
23
- "SELECT id, status, specialist FROM ${DB_NAME}.patient_referrals ORDER BY created_at DESC LIMIT 3"
24
- "SELECT COUNT(*) FROM ${DB_NAME}.patient_referrals WHERE status='Pending'"
25
- "SELECT id, amount_due, status FROM ${DB_NAME}.billing WHERE status='Open'"
26
- "SELECT username, role, department FROM ${DB_NAME}.users LIMIT 10"
27
- "UPDATE ${DB_NAME}.billing SET last_updated=CURDATE() WHERE id=5001"
28
- "SELECT p.first_name, p.last_name, r.status FROM ${DB_NAME}.patients p JOIN ${DB_NAME}.patient_referrals r ON p.id=r.patient_id LIMIT 5"
29
- "INSERT INTO ${DB_NAME}.access_log (user_id, action, ip) VALUES (3, 'view_referrals', '10.0.1.10')"
30
- )
31
 
32
- # App database credentials (non-privileged)
33
- DB_USER="app_user"
34
- DB_PASS="AppUs3r!2024"
 
 
 
 
35
 
36
- echo "[NPC-DB] Starting DB traffic to ${DB_HOST} at ${RATE_LAMBDA} queries/min"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
 
38
  while true; do
39
- IDX=$(( RANDOM % ${#QUERIES[@]} ))
40
- QUERY="${QUERIES[$IDX]}"
 
 
 
 
 
 
 
41
 
42
  mysql -h "${DB_HOST}" \
43
  -u "${DB_USER}" \
 
1
  #!/usr/bin/env bash
2
  # Level 0 NPC: Database traffic generator (mysql query loop)
3
  #
4
+ # Simulates application database queries. Discovers available databases
5
+ # and tables dynamically so it adapts to any LLM-generated environment.
6
  #
7
  # Environment variables:
8
  # DB_HOST - hostname of the database server (default: db)
 
15
 
16
  INTERVAL=$(awk "BEGIN {printf \"%.1f\", 60.0 / $RATE_LAMBDA}")
17
 
18
# App database credentials -- resolved from env or defaults.
# NOTE(review): NPCManager appears to inject DB_USER/DB_PASS from the
# topology; these literals are only the last-resort fallback.
DB_USER="${DB_USER:-app_user}"
DB_PASS="${DB_PASS:-AppUs3r!2024}"

# Discover databases (skip system DBs).
# -N suppresses the column-header row so each output line is a bare DB
# name; system schemas are filtered and only the first remaining
# application database is used.
discover_db() {
    mysql -h "${DB_HOST}" -u "${DB_USER}" -p"${DB_PASS}" \
        -N -e "SHOW DATABASES" 2>/dev/null \
        | grep -v -E '^(information_schema|mysql|performance_schema|sys)$' \
        | head -1
}
 
 
 
 
29
 
30
# Discover tables in a database.
# Output is capped at 10 tables to bound the query-rotation set.
discover_tables() {
    local db="$1"
    mysql -h "${DB_HOST}" -u "${DB_USER}" -p"${DB_PASS}" \
        -N -e "SHOW TABLES FROM ${db}" 2>/dev/null \
        | head -10
}
37
 
38
# Wait for the DB server to come up: poll discover_db up to 10 times
# (~30 s total) before giving up.
for i in $(seq 1 10); do
    DB_NAME=$(discover_db) && [ -n "$DB_NAME" ] && break
    sleep 3
done

# Exit 0 (not an error) when no application database exists -- the
# environment simply has nothing for this traffic generator to do.
if [ -z "${DB_NAME:-}" ]; then
    echo "[NPC-DB] No application database found, exiting"
    exit 0
fi

# Get available tables (mapfile keeps names with spaces intact).
mapfile -t TABLES < <(discover_tables "$DB_NAME")
if [ ${#TABLES[@]} -eq 0 ]; then
    echo "[NPC-DB] No tables found in ${DB_NAME}, exiting"
    exit 0
fi

echo "[NPC-DB] Starting DB traffic to ${DB_HOST}/${DB_NAME} at ${RATE_LAMBDA} queries/min (${#TABLES[@]} tables)"
57
 
58
  while true; do
59
+ IDX=$(( RANDOM % ${#TABLES[@]} ))
60
+ TABLE="${TABLES[$IDX]}"
61
+
62
+ # Alternate between safe read queries
63
+ case $(( RANDOM % 3 )) in
64
+ 0) QUERY="SELECT * FROM ${DB_NAME}.${TABLE} LIMIT 5" ;;
65
+ 1) QUERY="SELECT COUNT(*) FROM ${DB_NAME}.${TABLE}" ;;
66
+ 2) QUERY="DESCRIBE ${DB_NAME}.${TABLE}" ;;
67
+ esac
68
 
69
  mysql -h "${DB_HOST}" \
70
  -u "${DB_USER}" \
src/open_range/builder/npc/http_traffic.sh CHANGED
@@ -2,7 +2,8 @@
2
  # Level 0 NPC: HTTP traffic generator (curl loop)
3
  #
4
  # Generates benign web traffic to simulate normal user browsing.
5
- # All requests are labeled as NPC traffic in the access log via User-Agent.
 
6
  #
7
  # Environment variables:
8
  # WEB_HOST - hostname of the web server (default: web)
@@ -13,50 +14,45 @@ set -euo pipefail
13
  WEB_HOST="${WEB_HOST:-web}"
14
  RATE_LAMBDA="${RATE_LAMBDA:-30}"
15
 
16
- # Calculate sleep interval in seconds
17
  INTERVAL=$(awk "BEGIN {printf \"%.1f\", 60.0 / $RATE_LAMBDA}")
18
 
19
- # Common pages and endpoints that a normal user would visit
20
- PAGES=(
21
- "/"
22
- "/index.php"
23
- "/dashboard.php"
24
- "/lookup.php?last=Smith"
25
- "/lookup.php?last=Hart"
26
- "/lookup.php?last=Bishop"
27
- "/referral_submit.php"
28
- "/api/referral_status.php?id=1001"
29
- "/api/referral_status.php?id=1002"
30
- "/admin/compliance_report.php"
31
- "/logout.php"
32
- )
33
-
34
- # Common form submissions
35
- FORMS=(
36
- "-d 'username=mgarcia&password=Welcome2024!' http://${WEB_HOST}/index.php"
37
- "-d 'username=kwilliams&password=Welcome2024!' http://${WEB_HOST}/index.php"
38
- "-d 'patient_id=1&clinic=Northside&specialist=Dr.Patel&diagnosis=Cardiology' http://${WEB_HOST}/referral_submit.php"
39
- )
40
-
41
- echo "[NPC-HTTP] Starting HTTP traffic to ${WEB_HOST} at ${RATE_LAMBDA} req/min"
 
 
 
 
 
 
 
42
 
43
  while true; do
44
- # 80% GET requests, 20% POST requests
45
- if (( RANDOM % 5 == 0 )); then
46
- # POST request (form submission)
47
- IDX=$(( RANDOM % ${#FORMS[@]} ))
48
- FORM="${FORMS[$IDX]}"
49
- curl -s -o /dev/null -w '' \
50
- -A "NPC-Traffic/1.0 (benign)" \
51
- -X POST ${FORM} 2>/dev/null || true
52
- else
53
- # GET request (page browse)
54
- IDX=$(( RANDOM % ${#PAGES[@]} ))
55
- PAGE="${PAGES[$IDX]}"
56
- curl -s -o /dev/null -w '' \
57
- -A "NPC-Traffic/1.0 (benign)" \
58
- "http://${WEB_HOST}${PAGE}" 2>/dev/null || true
59
- fi
60
 
61
  sleep "${INTERVAL}"
62
  done
 
2
  # Level 0 NPC: HTTP traffic generator (curl loop)
3
  #
4
  # Generates benign web traffic to simulate normal user browsing.
5
+ # Discovers available pages dynamically from the web server's document
6
+ # root so it adapts to any LLM-generated environment.
7
  #
8
  # Environment variables:
9
  # WEB_HOST - hostname of the web server (default: web)
 
14
  WEB_HOST="${WEB_HOST:-web}"
15
  RATE_LAMBDA="${RATE_LAMBDA:-30}"
16
 
 
17
  INTERVAL=$(awk "BEGIN {printf \"%.1f\", 60.0 / $RATE_LAMBDA}")
18
 
19
# Discover available pages from the web root.
# Prefers filesystem enumeration (only effective when this script runs on
# a host that has the doc root mounted locally), then falls back to
# HTTP-probing a few common endpoints. "/" is always included so the
# caller never gets an empty list.
discover_pages() {
    local pages=("/")
    # Try common doc roots; only the first existing root is scanned.
    for root in /var/www/html /var/www/portal /var/www; do
        if [ -d "$root" ]; then
            while IFS= read -r f; do
                # Strip doc root to get URL path
                local url_path="${f#$root}"
                [ -n "$url_path" ] && pages+=("$url_path")
            done < <(find "$root" -maxdepth 2 -name '*.php' -o -name '*.html' 2>/dev/null | head -20)
            break
        fi
    done
    # Fallback: probe common endpoints and keep those answering 2xx/3xx.
    if [ ${#pages[@]} -le 1 ]; then
        for p in /index.php /index.html /login.php /dashboard.php; do
            if curl -s -o /dev/null -w '%{http_code}' "http://${WEB_HOST}${p}" 2>/dev/null | grep -q '^[23]'; then
                pages+=("$p")
            fi
        done
    fi
    printf '%s\n' "${pages[@]}"
}
43
+
44
# Build page list once at startup
mapfile -t PAGES < <(discover_pages)
# Defensive: discover_pages always emits "/", but never modulo by zero below.
[ ${#PAGES[@]} -eq 0 ] && PAGES=("/")

echo "[NPC-HTTP] Starting HTTP traffic to ${WEB_HOST} at ${RATE_LAMBDA} req/min (${#PAGES[@]} pages)"

while true; do
    # Pick a uniformly random discovered page each iteration.
    IDX=$(( RANDOM % ${#PAGES[@]} ))
    PAGE="${PAGES[$IDX]}"
    # Tagged User-Agent lets log analysis separate NPC noise from agent
    # traffic; failures are swallowed so the loop never dies mid-episode.
    curl -s -o /dev/null -w '' \
        -A "NPC-Traffic/1.0 (benign)" \
        "http://${WEB_HOST}${PAGE}" 2>/dev/null || true

    sleep "${INTERVAL}"
done
src/open_range/builder/npc/npc_agent.py CHANGED
@@ -228,26 +228,43 @@ class LLMNPCAgent:
228
  self._actions.append(log_entry)
229
  logger.debug("NPC %s: %s", persona.name, log_entry.get("detail", ""))
230
 
231
- # --- Phase 2: Check mailbox ---
 
 
232
  try:
233
  mail_output = await containers.exec(
234
  "mail",
235
- f"find /var/mail/{mail_user} "
 
236
  f"-newer /tmp/.npc_check_{mail_user} "
237
- f"-type f 2>/dev/null | head -1",
238
  )
239
  await containers.exec("mail", f"touch /tmp/.npc_check_{mail_user}")
240
 
241
  if mail_output and mail_output.strip():
242
- email_file = mail_output.strip().split("\n")[0]
243
- content = await containers.exec(
244
- "mail", f"head -50 '{email_file}' 2>/dev/null || true",
245
- )
246
- if content and content.strip():
 
 
 
 
 
 
 
 
 
 
 
 
247
  stimulus = Stimulus(
248
- type="email", sender="unknown",
249
- subject="Incoming message",
 
250
  content=content[:500],
 
251
  )
252
  react = await self.decide(persona, stimulus)
253
  react_log = await executor.execute(persona, react)
 
228
  self._actions.append(log_entry)
229
  logger.debug("NPC %s: %s", persona.name, log_entry.get("detail", ""))
230
 
231
+ # --- Phase 2: Check mailbox for incoming stimuli ---
232
+ # Red may send real phishing emails via SMTP. Check multiple
233
+ # mail spool locations for new messages.
234
  try:
235
  mail_output = await containers.exec(
236
  "mail",
237
+ f"{{ find /var/spool/mail/ /var/mail/ "
238
+ f"/home/{mail_user}/Maildir/new/ "
239
  f"-newer /tmp/.npc_check_{mail_user} "
240
+ f"-type f 2>/dev/null || true; }} | head -3",
241
  )
242
  await containers.exec("mail", f"touch /tmp/.npc_check_{mail_user}")
243
 
244
  if mail_output and mail_output.strip():
245
+ for email_file in mail_output.strip().split("\n")[:3]:
246
+ email_file = email_file.strip()
247
+ if not email_file:
248
+ continue
249
+ content = await containers.exec(
250
+ "mail", f"head -50 '{email_file}' 2>/dev/null || true",
251
+ )
252
+ if not content or not content.strip():
253
+ continue
254
+ # Extract sender from email headers
255
+ sender = "unknown"
256
+ subject = "Incoming message"
257
+ for line in content.split("\n")[:20]:
258
+ if line.lower().startswith("from:"):
259
+ sender = line.split(":", 1)[1].strip()
260
+ elif line.lower().startswith("subject:"):
261
+ subject = line.split(":", 1)[1].strip()
262
  stimulus = Stimulus(
263
+ type="email",
264
+ sender=sender,
265
+ subject=subject,
266
  content=content[:500],
267
+ plausibility=0.7,
268
  )
269
  react = await self.decide(persona, stimulus)
270
  react_log = await executor.execute(persona, react)
src/open_range/builder/npc/npc_manager.py CHANGED
@@ -89,10 +89,11 @@ def _container_for_script(script_name: str, topology: dict[str, Any]) -> str:
89
 
90
 
91
  def _resolve_env_vars(topology: dict[str, Any], rate_lambda: float) -> dict[str, str]:
92
- """Build environment variables by resolving roles from the topology.
93
 
94
- Instead of hardcoding ``WEB_HOST=web``, this finds the host whose
95
- services list contains web/nginx/etc and maps the role to its name.
 
96
  """
97
  hosts = _hosts_from_topology(topology)
98
  env: dict[str, str] = {"RATE_LAMBDA": str(int(rate_lambda))}
@@ -103,6 +104,21 @@ def _resolve_env_vars(topology: dict[str, Any], rate_lambda: float) -> dict[str,
103
  env[role] = host["name"]
104
  break
105
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
106
  return env
107
 
108
 
@@ -127,10 +143,20 @@ def _derive_scripts_from_topology(topology: dict[str, Any]) -> list[str]:
127
 
128
 
129
  class NPCManager:
130
- """Start and stop NPC background traffic for a snapshot."""
 
 
 
 
 
 
 
 
 
131
 
132
- def __init__(self, mock_mode: bool = False) -> None:
133
  self._mock_mode = mock_mode
 
134
  self._processes: list[asyncio.subprocess.Process] = []
135
  self._tasks: list[asyncio.Task[Any]] = []
136
  self._running = False
@@ -261,9 +287,9 @@ class NPCManager:
261
  from open_range.builder.npc.npc_agent import LLMNPCAgent
262
 
263
  for persona in snapshot.npc_personas:
264
- agent = LLMNPCAgent()
265
  task = asyncio.create_task(
266
- agent.run_loop(persona, containers),
267
  name=f"npc_{persona.name}",
268
  )
269
  self._tasks.append(task)
@@ -354,8 +380,6 @@ class NPCManager:
354
 
355
  self._running = True
356
  self._containers = containers
357
- npc_cfg = snapshot.npc_traffic
358
-
359
  # Re-initialise channels for the new episode
360
  self.channels = {
361
  "chat": ChatChannel(),
 
89
 
90
 
91
  def _resolve_env_vars(topology: dict[str, Any], rate_lambda: float) -> dict[str, str]:
92
+ """Build environment variables by resolving roles and credentials from topology.
93
 
94
+ Resolves host roles (WEB_HOST, DB_HOST, etc.) and credentials (DB_USER,
95
+ DB_PASS, SSH_USER, SSH_PASS) from the topology so shell scripts don't
96
+ need hardcoded values.
97
  """
98
  hosts = _hosts_from_topology(topology)
99
  env: dict[str, str] = {"RATE_LAMBDA": str(int(rate_lambda))}
 
104
  env[role] = host["name"]
105
  break
106
 
107
+ # Pass DB and SSH credentials from topology to shell scripts
108
+ users = topology.get("users", [])
109
+ for user in users:
110
+ if not isinstance(user, dict):
111
+ continue
112
+ hosts_list = user.get("hosts", [])
113
+ if "db" in hosts_list and "DB_USER" not in env:
114
+ env["DB_USER"] = user.get("username", "app_user")
115
+ env["DB_PASS"] = user.get("password", "AppUs3r!2024")
116
+ if any(h in hosts_list for h in ("web", "files", "ldap", "siem")):
117
+ role = user.get("role", "")
118
+ if role in ("admin", "sysadmin", "root") and "SSH_USER" not in env:
119
+ env["SSH_USER"] = user.get("username", "admin")
120
+ env["SSH_PASS"] = user.get("password", "Adm1n!2024")
121
+
122
  return env
123
 
124
 
 
143
 
144
 
145
  class NPCManager:
146
+ """Start and stop NPC background traffic for a snapshot.
147
+
148
+ Args:
149
+ mock_mode: When True, skip Docker exec and LLM calls (unit tests).
150
+ model: LiteLLM model string for Level 1 NPC agents.
151
+ Defaults to ``OPENRANGE_NPC_MODEL`` env var, then
152
+ ``azure/gpt-5.2-codex``. Any LiteLLM-supported model works
153
+ (e.g. ``openai/gpt-4o``, ``anthropic/claude-haiku-4-5-20251001``,
154
+ ``ollama/llama3``).
155
+ """
156
 
157
+ def __init__(self, mock_mode: bool = False, model: str | None = None) -> None:
158
  self._mock_mode = mock_mode
159
+ self._model = model # passed to LLMNPCAgent
160
  self._processes: list[asyncio.subprocess.Process] = []
161
  self._tasks: list[asyncio.Task[Any]] = []
162
  self._running = False
 
287
  from open_range.builder.npc.npc_agent import LLMNPCAgent
288
 
289
  for persona in snapshot.npc_personas:
290
+ agent = LLMNPCAgent(model=self._model)
291
  task = asyncio.create_task(
292
+ agent.run_loop(persona, containers, snapshot),
293
  name=f"npc_{persona.name}",
294
  )
295
  self._tasks.append(task)
 
380
 
381
  self._running = True
382
  self._containers = containers
 
 
383
  # Re-initialise channels for the new episode
384
  self.channels = {
385
  "chat": ChatChannel(),
src/open_range/builder/npc/persona.py CHANGED
@@ -12,11 +12,15 @@ from open_range.protocols import NPCPersona
12
  __all__ = ["NPCPersona", "default_personas"]
13
 
14
 
15
- def default_personas() -> list[NPCPersona]:
16
  """Return a default set of NPC personas for testing.
17
 
18
  Two personas with contrasting security awareness levels:
19
  a low-awareness marketing employee and a high-awareness CISO.
 
 
 
 
20
  """
21
  return [
22
  NPCPersona(
@@ -41,7 +45,7 @@ def default_personas() -> list[NPCPersona]:
41
  ],
42
  },
43
  accounts={
44
- "email": "jsmith@acmecorp.local",
45
  "ldap": "jsmith",
46
  "smb_shares": "marketing,shared",
47
  },
@@ -70,7 +74,7 @@ def default_personas() -> list[NPCPersona]:
70
  ],
71
  },
72
  accounts={
73
- "email": "dchen@acmecorp.local",
74
  "ldap": "dchen",
75
  "smb_shares": "security,executive",
76
  },
 
12
  __all__ = ["NPCPersona", "default_personas"]
13
 
14
 
15
+ def default_personas(domain: str = "corp.local") -> list[NPCPersona]:
16
  """Return a default set of NPC personas for testing.
17
 
18
  Two personas with contrasting security awareness levels:
19
  a low-awareness marketing employee and a high-awareness CISO.
20
+
21
+ Args:
22
+ domain: Email domain to use. Derived from snapshot topology at
23
+ runtime so personas match the generated environment.
24
  """
25
  return [
26
  NPCPersona(
 
45
  ],
46
  },
47
  accounts={
48
+ "email": f"jsmith@{domain}",
49
  "ldap": "jsmith",
50
  "smb_shares": "marketing,shared",
51
  },
 
74
  ],
75
  },
76
  accounts={
77
+ "email": f"dchen@{domain}",
78
  "ldap": "dchen",
79
  "smb_shares": "security,executive",
80
  },
src/open_range/builder/npc/ssh_traffic.sh CHANGED
@@ -31,9 +31,9 @@ COMMANDS=(
31
  "w"
32
  )
33
 
34
- # Credentials for benign SSH sessions
35
- SSH_USER="admin"
36
- SSH_PASS="Adm1n!2024"
37
 
38
  HOSTS=("${WEB_HOST}" "${DB_HOST}")
39
 
 
31
  "w"
32
  )
33
 
34
# Credentials for benign SSH sessions -- resolved from env or defaults.
# NOTE(review): SSH_USER/SSH_PASS appear to be injected from the topology
# by the NPC manager; these literals are only the fallback.
SSH_USER="${SSH_USER:-admin}"
SSH_PASS="${SSH_PASS:-Adm1n!2024}"
37
 
38
  HOSTS=("${WEB_HOST}" "${DB_HOST}")
39
 
src/open_range/builder/renderer.py CHANGED
@@ -15,6 +15,7 @@ from typing import Any
15
 
16
  import jinja2
17
 
 
18
  from open_range.protocols import SnapshotSpec
19
 
20
  logger = logging.getLogger(__name__)
@@ -81,13 +82,42 @@ class SnapshotRenderer:
81
  encoding="utf-8",
82
  )
83
  logger.info("Rendered %d payload artifact(s) -> %s", len(payload_manifest), manifest_path)
 
 
 
 
84
  logger.info(
85
- "SnapshotRenderer: rendering complete (%d templates, %d payloads)",
86
  len(_TEMPLATE_MAP),
87
  len(payload_manifest),
 
88
  )
89
  return output_dir
90
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
91
  def _render_payloads(self, spec: SnapshotSpec, output_dir: Path) -> dict[str, str]:
92
  payload_manifest: dict[str, str] = {}
93
  for key, content in spec.files.items():
@@ -176,6 +206,9 @@ def _build_context(spec: SnapshotSpec) -> dict[str, Any]:
176
  has_download,
177
  )
178
 
 
 
 
179
  context: dict[str, Any] = {
180
  # docker-compose.yml.j2
181
  "snapshot_id": topology.get("snapshot_id", "generated"),
@@ -183,12 +216,15 @@ def _build_context(spec: SnapshotSpec) -> dict[str, Any]:
183
  "hosts": hosts,
184
  "host_names": host_names,
185
  "db_host": "db",
186
- "db_user": _find_db_user(users),
187
- "db_pass": _find_db_pass(users),
 
 
188
  "mysql_root_password": topology.get("mysql_root_password", _find_mysql_root_pass(users)),
189
- "domain": topology.get("domain", "acmecorp.local"),
190
- "org_name": topology.get("org_name", "AcmeCorp"),
191
- "ldap_admin_pass": "LdapAdm1n!",
 
192
  # Dockerfile.web.j2
193
  "users": users,
194
  "app_files": app_files,
@@ -304,3 +340,17 @@ def _find_mysql_root_pass(users: list[dict[str, Any]]) -> str:
304
  if u.get("username") == "admin" and "db" in u.get("hosts", []):
305
  return u.get("password", "r00tP@ss!")
306
  return "r00tP@ss!"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  import jinja2
17
 
18
+ from open_range.builder.service_manifest import generate_service_specs
19
  from open_range.protocols import SnapshotSpec
20
 
21
  logger = logging.getLogger(__name__)
 
82
  encoding="utf-8",
83
  )
84
  logger.info("Rendered %d payload artifact(s) -> %s", len(payload_manifest), manifest_path)
85
+
86
+ # Generate ServiceSpec entries from compose + topology
87
+ self._build_service_specs(spec)
88
+
89
  logger.info(
90
+ "SnapshotRenderer: rendering complete (%d templates, %d payloads, %d services)",
91
  len(_TEMPLATE_MAP),
92
  len(payload_manifest),
93
+ len(spec.services),
94
  )
95
  return output_dir
96
 
97
+ def _build_service_specs(self, spec: SnapshotSpec) -> None:
98
+ """Populate ``spec.services`` from compose and topology.
99
+
100
+ Delegates to :func:`generate_service_specs` which maps Docker
101
+ image names (or topology host names) to subprocess-mode daemon
102
+ lifecycle declarations. Only runs if the spec does not already
103
+ have services declared (idempotent).
104
+ """
105
+ if spec.services:
106
+ logger.debug("ServiceSpec entries already present — skipping generation")
107
+ return
108
+
109
+ svc_specs = generate_service_specs(
110
+ compose=spec.compose,
111
+ topology=spec.topology,
112
+ )
113
+ spec.services = svc_specs
114
+ if svc_specs:
115
+ logger.info(
116
+ "Generated %d ServiceSpec entries: %s",
117
+ len(svc_specs),
118
+ [s.daemon for s in svc_specs],
119
+ )
120
+
121
  def _render_payloads(self, spec: SnapshotSpec, output_dir: Path) -> dict[str, str]:
122
  payload_manifest: dict[str, str] = {}
123
  for key, content in spec.files.items():
 
206
  has_download,
207
  )
208
 
209
+ db_user = _find_db_user(users)
210
+ db_pass = _find_db_pass(users)
211
+
212
  context: dict[str, Any] = {
213
  # docker-compose.yml.j2
214
  "snapshot_id": topology.get("snapshot_id", "generated"),
 
216
  "hosts": hosts,
217
  "host_names": host_names,
218
  "db_host": "db",
219
+ "db_user": db_user,
220
+ "db_pass": db_pass,
221
+ "db_name": topology.get("db_name", "app_db"),
222
+ "db_password": db_pass,
223
  "mysql_root_password": topology.get("mysql_root_password", _find_mysql_root_pass(users)),
224
+ "domain": topology.get("domain", "corp.local"),
225
+ "org_name": topology.get("org_name", "Corp"),
226
+ "ldap_admin_pass": topology.get("ldap_admin_pass", "LdapAdm1n!"),
227
+ "smb_shares": _find_smb_shares(spec),
228
  # Dockerfile.web.j2
229
  "users": users,
230
  "app_files": app_files,
 
340
  if u.get("username") == "admin" and "db" in u.get("hosts", []):
341
  return u.get("password", "r00tP@ss!")
342
  return "r00tP@ss!"
343
+
344
+
345
+ def _find_smb_shares(spec: SnapshotSpec) -> list[str]:
346
+ """Extract Samba share names from snapshot files dict."""
347
+ shares: set[str] = set()
348
+ for key in spec.files:
349
+ if not key.startswith("files:"):
350
+ continue
351
+ path = key.split(":", 1)[1]
352
+ if "/srv/shares/" in path:
353
+ parts = path.split("/srv/shares/")[1].split("/")
354
+ if parts:
355
+ shares.add(parts[0])
356
+ return sorted(shares) or ["general"]
src/open_range/builder/service_manifest.py ADDED
@@ -0,0 +1,395 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Generate ServiceSpec entries from Docker Compose and topology definitions.
2
+
3
+ Translates Docker Compose service definitions into subprocess-mode daemon
4
+ lifecycle declarations. The primary consumer is ``SnapshotRenderer`` which
5
+ stores the generated list in ``SnapshotSpec.services`` so that
6
+ ``RangeEnvironment._start_snapshot_services()`` can start the correct daemons
7
+ at episode reset time without relying on a hardcoded host-to-service map.
8
+
9
+ The ``_IMAGE_SERVICE_HINTS`` mapping is intentionally a *hint* table, not a
10
+ hard requirement. Unknown images are skipped with a warning rather than
11
+ raising an error — this keeps the system forward-compatible with new services
12
+ that haven't been catalogued yet.
13
+ """
14
+
15
+ from __future__ import annotations
16
+
17
+ import logging
18
+ from typing import Any
19
+
20
+ from open_range.protocols import ReadinessCheck, ServiceSpec
21
+
22
+ logger = logging.getLogger(__name__)
23
+
24
+ # ---------------------------------------------------------------------------
25
+ # Image hint table
26
+ # ---------------------------------------------------------------------------
27
+ # Maps Docker image name prefixes to a tuple of:
28
+ # (daemon_name, packages, init_commands, start_command, readiness)
29
+ #
30
+ # Values are *templates* — callers may override port, log_dir, env_vars.
31
+ # The start_command may contain ``{log_dir}`` which is interpolated at
32
+ # generation time.
33
+
34
+ _ImageHint = tuple[
35
+ str, # daemon
36
+ list[str], # packages
37
+ list[str], # init_commands
38
+ str, # start_command
39
+ ReadinessCheck, # readiness
40
+ ]
41
+
42
+ _IMAGE_SERVICE_HINTS: dict[str, _ImageHint] = {
43
+ # ── Web ──────────────────────────────────────────────────────────
44
+ "nginx": (
45
+ "nginx",
46
+ ["nginx"],
47
+ ["mkdir -p /var/log/nginx"],
48
+ "nginx -g 'daemon off;' > {log_dir}/nginx.log 2>&1 &",
49
+ ReadinessCheck(type="tcp", port=80, timeout_s=10),
50
+ ),
51
+
52
+ # ── Databases ────────────────────────────────────────────────────
53
+ "mysql": (
54
+ "mysqld",
55
+ ["default-mysql-server", "default-mysql-client"],
56
+ [
57
+ "mkdir -p /var/run/mysqld && chown mysql:mysql /var/run/mysqld 2>/dev/null || true",
58
+ "mkdir -p /var/log/mysql && chown mysql:mysql /var/log/mysql 2>/dev/null || true",
59
+ ],
60
+ "mysqld --user=mysql --log-error={log_dir}/mysql.log &",
61
+ ReadinessCheck(type="command", command="mysqladmin ping --silent 2>/dev/null || mariadb-admin ping --silent 2>/dev/null", timeout_s=30),
62
+ ),
63
+ "mariadb": (
64
+ "mariadbd",
65
+ ["default-mysql-server", "default-mysql-client"],
66
+ [
67
+ "mkdir -p /var/run/mysqld && chown mysql:mysql /var/run/mysqld 2>/dev/null || true",
68
+ "mkdir -p /var/log/mysql && chown mysql:mysql /var/log/mysql 2>/dev/null || true",
69
+ ],
70
+ "mariadbd --user=mysql --log-error={log_dir}/mysql.log &",
71
+ ReadinessCheck(type="command", command="mariadb-admin ping --silent 2>/dev/null || mysqladmin ping --silent 2>/dev/null", timeout_s=30),
72
+ ),
73
+ "postgres": (
74
+ "postgres",
75
+ ["postgresql"],
76
+ [
77
+ "mkdir -p /var/run/postgresql && chown postgres:postgres /var/run/postgresql 2>/dev/null || true",
78
+ ],
79
+ "su - postgres -c 'pg_ctl start -D /var/lib/postgresql/data -l {log_dir}/postgres.log' &",
80
+ ReadinessCheck(type="tcp", port=5432, timeout_s=30),
81
+ ),
82
+
83
+ # ── Directory ────────────────────────────────────────────────────
84
+ "openldap": (
85
+ "slapd",
86
+ ["slapd", "ldap-utils"],
87
+ ["mkdir -p /var/run/slapd"],
88
+ "slapd -h 'ldap:/// ldapi:///' -u openldap -g openldap > {log_dir}/slapd.log 2>&1 &",
89
+ ReadinessCheck(type="command", command="ldapsearch -x -H ldap://localhost -b '' -s base namingContexts >/dev/null 2>&1", timeout_s=10),
90
+ ),
91
+ "osixia/openldap": (
92
+ "slapd",
93
+ ["slapd", "ldap-utils"],
94
+ ["mkdir -p /var/run/slapd"],
95
+ "slapd -h 'ldap:/// ldapi:///' -u openldap -g openldap > {log_dir}/slapd.log 2>&1 &",
96
+ ReadinessCheck(type="command", command="ldapsearch -x -H ldap://localhost -b '' -s base namingContexts >/dev/null 2>&1", timeout_s=10),
97
+ ),
98
+
99
+ # ── Logging ──────────────────────────────────────────────────────
100
+ "rsyslog": (
101
+ "rsyslogd",
102
+ ["rsyslog"],
103
+ [],
104
+ "rsyslogd -n > {log_dir}/rsyslog.log 2>&1 &",
105
+ ReadinessCheck(type="command", command="pgrep -x rsyslogd", timeout_s=5),
106
+ ),
107
+
108
+ # ── File sharing ─────────────────────────────────────────────────
109
+ "samba": (
110
+ "smbd",
111
+ ["samba"],
112
+ ["mkdir -p /var/lib/samba/private"],
113
+ "smbd --foreground --no-process-group > {log_dir}/smbd.log 2>&1 &",
114
+ ReadinessCheck(type="tcp", port=445, timeout_s=10),
115
+ ),
116
+
117
+ # ── Mail ─────────────────────────────────────────────────────────
118
+ "postfix": (
119
+ "master",
120
+ ["postfix"],
121
+ [],
122
+ "postfix start > {log_dir}/postfix.log 2>&1 || true",
123
+ ReadinessCheck(type="tcp", port=25, timeout_s=10),
124
+ ),
125
+
126
+ # ── Cache ────────────────────────────────────────────────────────
127
+ "redis": (
128
+ "redis-server",
129
+ ["redis-server"],
130
+ [],
131
+ "redis-server --daemonize yes --logfile {log_dir}/redis.log",
132
+ ReadinessCheck(type="tcp", port=6379, timeout_s=10),
133
+ ),
134
+
135
+ # ── CI/CD ────────────────────────────────────────────────────────
136
+ "jenkins": (
137
+ "java",
138
+ ["default-jdk"],
139
+ [],
140
+ "java -jar /usr/share/jenkins/jenkins.war --httpPort=8080 > {log_dir}/jenkins.log 2>&1 &",
141
+ ReadinessCheck(type="http", url="http://localhost:8080/login", timeout_s=60),
142
+ ),
143
+
144
+ # ── Monitoring ───────────────────────────────────────────────────
145
+ "prometheus": (
146
+ "prometheus",
147
+ ["prometheus"],
148
+ [],
149
+ "prometheus --config.file=/etc/prometheus/prometheus.yml --web.listen-address=:9090 > {log_dir}/prometheus.log 2>&1 &",
150
+ ReadinessCheck(type="http", url="http://localhost:9090/-/ready", timeout_s=15),
151
+ ),
152
+ "grafana": (
153
+ "grafana-server",
154
+ ["grafana"],
155
+ [],
156
+ "grafana-server --homepath=/usr/share/grafana > {log_dir}/grafana.log 2>&1 &",
157
+ ReadinessCheck(type="http", url="http://localhost:3000/api/health", timeout_s=15),
158
+ ),
159
+
160
+ # ── Remote access ────────────────────────────────────────────────
161
+ "openssh": (
162
+ "sshd",
163
+ ["openssh-server"],
164
+ ["mkdir -p /var/run/sshd"],
165
+ "/usr/sbin/sshd -E {log_dir}/sshd.log",
166
+ ReadinessCheck(type="tcp", port=22, timeout_s=5),
167
+ ),
168
+ "linuxserver/openssh-server": (
169
+ "sshd",
170
+ ["openssh-server"],
171
+ ["mkdir -p /var/run/sshd"],
172
+ "/usr/sbin/sshd -E {log_dir}/sshd.log",
173
+ ReadinessCheck(type="tcp", port=22, timeout_s=5),
174
+ ),
175
+ }
176
+
177
# ---------------------------------------------------------------------------
# Topology host-name hints (fallback when compose services are absent)
# ---------------------------------------------------------------------------
# Maps logical host names commonly used in manifests to keys into the
# image-hint table above (e.g. a host named "web" gets the "nginx" hint).
# "firewall" deliberately reuses "rsyslog" so firewall hosts at least run
# a log daemon; there is no dedicated firewall daemon hint.

_HOST_NAME_HINTS: dict[str, str] = {
    "web": "nginx",
    "db": "mysql",
    "ldap": "openldap",
    "siem": "rsyslog",
    "files": "samba",
    "mail": "postfix",
    "firewall": "rsyslog",
    "cache": "redis",
    "redis": "redis",
    "ci_cd": "jenkins",
    "ci": "jenkins",
    "monitoring": "prometheus",
    "ssh": "openssh",
}
197
+
198
+ # Default log directory used when none is specified.
199
+ _DEFAULT_LOG_DIR = "/var/log/siem"
200
+
201
+
202
+ # ---------------------------------------------------------------------------
203
+ # Public API
204
+ # ---------------------------------------------------------------------------
205
+
206
+
207
+ def generate_service_specs(
208
+ compose: dict[str, Any],
209
+ topology: dict[str, Any],
210
+ ) -> list[ServiceSpec]:
211
+ """Generate ServiceSpec entries from compose and topology.
212
+
213
+ Translates Docker Compose service definitions into subprocess-mode
214
+ daemon lifecycle declarations.
215
+
216
+ The function examines ``compose["services"]`` first. For each service
217
+ whose image matches a known hint, a ``ServiceSpec`` is produced. If
218
+ the compose dict is empty or missing, the function falls back to the
219
+ topology host list using ``_HOST_NAME_HINTS``.
220
+
221
+ Services that cannot be mapped (e.g. custom images with no hint) are
222
+ skipped with a debug-level log message.
223
+
224
+ Parameters
225
+ ----------
226
+ compose:
227
+ Parsed docker-compose dict (may be empty).
228
+ topology:
229
+ Parsed topology dict from the manifest / snapshot.
230
+
231
+ Returns
232
+ -------
233
+ list[ServiceSpec]
234
+ One entry per recognised service. Order follows the compose
235
+ services dict (or the topology hosts list as fallback).
236
+ """
237
+ specs: list[ServiceSpec] = []
238
+ seen_daemons: set[str] = set()
239
+
240
+ services = compose.get("services", {}) if compose else {}
241
+
242
+ if services:
243
+ specs = _from_compose(services, seen_daemons)
244
+ else:
245
+ specs = _from_topology(topology, seen_daemons)
246
+
247
+ return specs
248
+
249
+
250
+ # ---------------------------------------------------------------------------
251
+ # Internal helpers
252
+ # ---------------------------------------------------------------------------
253
+
254
+
255
+ def _match_image_hint(image: str) -> _ImageHint | None:
256
+ """Match a Docker image string to the closest hint entry.
257
+
258
+ Strips tags (``mysql:8.0`` -> ``mysql``), handles namespaced images
259
+ (``osixia/openldap:1.5`` -> ``osixia/openldap``), and falls back to
260
+ substring matching on the image basename.
261
+ """
262
+ if not image:
263
+ return None
264
+
265
+ # Remove tag
266
+ base = image.split(":")[0].strip()
267
+
268
+ # Exact match (with or without namespace)
269
+ if base in _IMAGE_SERVICE_HINTS:
270
+ return _IMAGE_SERVICE_HINTS[base]
271
+
272
+ # Try basename only (e.g. ``bitnami/redis`` -> ``redis``)
273
+ basename = base.rsplit("/", 1)[-1]
274
+ if basename in _IMAGE_SERVICE_HINTS:
275
+ return _IMAGE_SERVICE_HINTS[basename]
276
+
277
+ # Substring match as last resort (e.g. ``mysql/mysql-server`` -> ``mysql``)
278
+ for key, hint in _IMAGE_SERVICE_HINTS.items():
279
+ if "/" not in key and key in basename:
280
+ return hint
281
+
282
+ return None
283
+
284
+
285
+ def _env_from_compose_service(svc_def: dict[str, Any]) -> dict[str, str]:
286
+ """Extract environment variables from a compose service definition.
287
+
288
+ Handles both the ``list`` form (``- KEY=VALUE``) and the ``dict`` form.
289
+ """
290
+ raw = svc_def.get("environment", {})
291
+ if isinstance(raw, list):
292
+ env: dict[str, str] = {}
293
+ for entry in raw:
294
+ if "=" in entry:
295
+ k, v = entry.split("=", 1)
296
+ env[k] = v
297
+ return env
298
+ if isinstance(raw, dict):
299
+ return {str(k): str(v) for k, v in raw.items()}
300
+ return {}
301
+
302
+
303
+ def _build_service_spec(
304
+ host: str,
305
+ hint: _ImageHint,
306
+ log_dir: str = _DEFAULT_LOG_DIR,
307
+ env_vars: dict[str, str] | None = None,
308
+ ) -> ServiceSpec:
309
+ """Build a ServiceSpec from a matched hint tuple."""
310
+ daemon, packages, init_commands, start_command, readiness = hint
311
+ return ServiceSpec(
312
+ host=host,
313
+ daemon=daemon,
314
+ packages=list(packages),
315
+ init_commands=list(init_commands),
316
+ start_command=start_command.format(log_dir=log_dir),
317
+ readiness=readiness.model_copy(),
318
+ log_dir=log_dir,
319
+ env_vars=env_vars or {},
320
+ )
321
+
322
+
323
+ def _from_compose(
324
+ services: dict[str, Any],
325
+ seen_daemons: set[str],
326
+ ) -> list[ServiceSpec]:
327
+ """Generate specs from the compose services section."""
328
+ specs: list[ServiceSpec] = []
329
+
330
+ for svc_name, svc_def in services.items():
331
+ if not isinstance(svc_def, dict):
332
+ continue
333
+
334
+ image = svc_def.get("image", "")
335
+ hint = _match_image_hint(image)
336
+
337
+ # If no image, try matching the service name itself
338
+ if hint is None and svc_name in _HOST_NAME_HINTS:
339
+ fallback_key = _HOST_NAME_HINTS[svc_name]
340
+ hint = _IMAGE_SERVICE_HINTS.get(fallback_key)
341
+
342
+ if hint is None:
343
+ logger.debug(
344
+ "No service hint for compose service %r (image=%r) — skipping",
345
+ svc_name,
346
+ image,
347
+ )
348
+ continue
349
+
350
+ daemon = hint[0]
351
+ if daemon in seen_daemons:
352
+ continue
353
+ seen_daemons.add(daemon)
354
+
355
+ env_vars = _env_from_compose_service(svc_def)
356
+ spec = _build_service_spec(
357
+ host=svc_name,
358
+ hint=hint,
359
+ env_vars=env_vars,
360
+ )
361
+ specs.append(spec)
362
+
363
+ return specs
364
+
365
+
366
+ def _from_topology(
367
+ topology: dict[str, Any],
368
+ seen_daemons: set[str],
369
+ ) -> list[ServiceSpec]:
370
+ """Generate specs from the topology hosts list (fallback path)."""
371
+ specs: list[ServiceSpec] = []
372
+ hosts = topology.get("hosts", [])
373
+
374
+ for host_entry in hosts:
375
+ host_name = host_entry if isinstance(host_entry, str) else host_entry.get("name", "")
376
+ if not host_name:
377
+ continue
378
+
379
+ hint_key = _HOST_NAME_HINTS.get(host_name)
380
+ if hint_key is None:
381
+ continue
382
+
383
+ hint = _IMAGE_SERVICE_HINTS.get(hint_key)
384
+ if hint is None:
385
+ continue
386
+
387
+ daemon = hint[0]
388
+ if daemon in seen_daemons:
389
+ continue
390
+ seen_daemons.add(daemon)
391
+
392
+ spec = _build_service_spec(host=host_name, hint=hint)
393
+ specs.append(spec)
394
+
395
+ return specs
src/open_range/builder/templates/docker-compose.yml.j2 CHANGED
@@ -110,7 +110,7 @@ services:
110
  mail:
111
  image: namshi/smtp:latest
112
  environment:
113
- - MAILNAME={{ domain | default('meridianhealth.local') }}
114
  volumes:
115
  - shared_logs:/var/log/mail
116
  networks:
@@ -125,9 +125,9 @@ services:
125
  command: --default-authentication-plugin=mysql_native_password
126
  environment:
127
  - MYSQL_ROOT_PASSWORD={{ mysql_root_password | default('r00tP@ss!') }}
128
- - MYSQL_DATABASE=referral_db
129
- - MYSQL_USER=app_user
130
- - MYSQL_PASSWORD=AppUs3r!2024
131
  volumes:
132
  - db_data:/var/lib/mysql
133
  - shared_logs:/var/log/mysql
@@ -144,11 +144,10 @@ services:
144
  files:
145
  image: dperson/samba:latest
146
  environment:
147
- - USER=smbuser;smbP@ss!
148
- - SHARE=general;/srv/shares/general;yes;no;no;smbuser
149
- - SHARE2=hr;/srv/shares/hr;yes;no;no;smbuser
150
- - SHARE3=compliance;/srv/shares/compliance;yes;no;no;smbuser
151
- - SHARE4=contracts;/srv/shares/contracts;yes;no;no;smbuser
152
  volumes:
153
  - shared_logs:/var/log/samba
154
  networks:
@@ -159,8 +158,8 @@ services:
159
  ldap:
160
  image: osixia/openldap:latest
161
  environment:
162
- - LDAP_ORGANISATION={{ org_name | default('MeridianHealth') }}
163
- - LDAP_DOMAIN={{ domain | default('meridianhealth.local') }}
164
  - LDAP_ADMIN_PASSWORD={{ ldap_admin_pass | default('LdapAdm1n!') }}
165
  volumes:
166
  - shared_logs:/var/log/ldap
 
110
  mail:
111
  image: namshi/smtp:latest
112
  environment:
113
+ - MAILNAME={{ domain | default('corp.local') }}
114
  volumes:
115
  - shared_logs:/var/log/mail
116
  networks:
 
125
  command: --default-authentication-plugin=mysql_native_password
126
  environment:
127
  - MYSQL_ROOT_PASSWORD={{ mysql_root_password | default('r00tP@ss!') }}
128
+ - MYSQL_DATABASE={{ db_name | default('app_db') }}
129
+ - MYSQL_USER={{ db_user | default('app_user') }}
130
+ - MYSQL_PASSWORD={{ db_password | default('AppUs3r!2024') }}
131
  volumes:
132
  - db_data:/var/lib/mysql
133
  - shared_logs:/var/log/mysql
 
144
  files:
145
  image: dperson/samba:latest
146
  environment:
147
+ - USER={{ smb_user | default('smbuser') }};{{ smb_password | default('smbP@ss!') }}
148
+ {%- for share in smb_shares | default(['general', 'hr', 'compliance', 'contracts']) %}
149
+ - SHARE{{ loop.index if loop.index > 1 else '' }}={{ share }};/srv/shares/{{ share }};yes;no;no;{{ smb_user | default('smbuser') }}
150
+ {%- endfor %}
 
151
  volumes:
152
  - shared_logs:/var/log/samba
153
  networks:
 
158
  ldap:
159
  image: osixia/openldap:latest
160
  environment:
161
+ - LDAP_ORGANISATION={{ org_name | default('Corp') }}
162
+ - LDAP_DOMAIN={{ domain | default('corp.local') }}
163
  - LDAP_ADMIN_PASSWORD={{ ldap_admin_pass | default('LdapAdm1n!') }}
164
  volumes:
165
  - shared_logs:/var/log/ldap
src/open_range/cli.py CHANGED
@@ -220,6 +220,10 @@ def build(
220
  @click.option("--teacher-model", default=None, help="LiteLLM teacher model. If omitted, selected roles use scripted agents.")
221
  @click.option("--red-model", default=None, help="Override model for Red teacher.")
222
  @click.option("--blue-model", default=None, help="Override model for Blue teacher.")
 
 
 
 
223
  @click.option("--temperature", default=0.2, type=float, help="Teacher sampling temperature.")
224
  @click.option("--max-tokens", default=512, type=int, help="Maximum completion tokens per teacher action.")
225
  @click.option("--template-only/--llm-builder", default=True, help="When using --manifest, build snapshots deterministically instead of via LLM.")
@@ -238,6 +242,10 @@ def synthetic_data(
238
  teacher_model: str | None,
239
  red_model: str | None,
240
  blue_model: str | None,
 
 
 
 
241
  temperature: float,
242
  max_tokens: int,
243
  template_only: bool,
@@ -249,6 +257,13 @@ def synthetic_data(
249
  SyntheticTraceGenerator,
250
  build_teacher_agents,
251
  )
 
 
 
 
 
 
 
252
 
253
  if bool(manifest) == bool(snapshot):
254
  click.echo("Error: provide exactly one of --manifest or --snapshot.", err=True)
@@ -259,11 +274,25 @@ def synthetic_data(
259
  teacher_model
260
  or os.environ.get("OPENRANGE_SYNTH_MODEL")
261
  )
 
 
262
  red_agent, blue_agent = build_teacher_agents(
263
  teacher_model=resolved_teacher_model,
264
  roles=selected_roles,
265
  red_model=red_model,
266
  blue_model=blue_model,
 
 
 
 
 
 
 
 
 
 
 
 
267
  temperature=temperature,
268
  max_tokens=max_tokens,
269
  )
@@ -274,6 +303,7 @@ def synthetic_data(
274
  snapshot=_load_snapshot(snapshot),
275
  red_agent=red_agent,
276
  blue_agent=blue_agent,
 
277
  tier=tier,
278
  max_steps=max_steps,
279
  randomize_flags=randomize_flags,
@@ -284,6 +314,7 @@ def synthetic_data(
284
  _load_manifest(str(manifest)),
285
  red_agent=red_agent,
286
  blue_agent=blue_agent,
 
287
  template_only=template_only,
288
  builder_model=builder_model,
289
  tier=tier,
@@ -307,18 +338,38 @@ def synthetic_data(
307
  + (", ".join(teacher_roles) if teacher_roles else "none (scripted fallbacks)")
308
  )
309
  try:
310
- logger, count = generator.export_jsonl(
311
- output,
312
  num_traces=num_traces,
313
  seed=seed,
 
 
314
  reward_threshold=reward_threshold,
315
  roles=selected_roles,
316
  )
 
 
 
 
 
 
 
 
 
 
317
  except Exception as exc:
318
  click.echo(f"Error: synthetic data generation failed: {exc}", err=True)
319
  sys.exit(1)
320
 
321
  click.echo(f"Wrote {count} JSONL records to {output}")
 
 
 
 
 
 
 
 
 
322
  click.echo(f" Episodes: {len(logger.episodes)}")
323
  click.echo(f" Randomized flags: {'yes' if randomize_flags else 'no'}")
324
 
@@ -533,6 +584,123 @@ def deploy(snapshot: str, compose_dir: str | None) -> None:
533
  pass # Non-critical
534
 
535
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
536
  # ---------------------------------------------------------------------------
537
  # server
538
  # ---------------------------------------------------------------------------
 
220
  @click.option("--teacher-model", default=None, help="LiteLLM teacher model. If omitted, selected roles use scripted agents.")
221
  @click.option("--red-model", default=None, help="Override model for Red teacher.")
222
  @click.option("--blue-model", default=None, help="Override model for Blue teacher.")
223
+ @click.option("--bootstrap-traces", multiple=True, type=click.Path(exists=True), help="Existing SFT JSONL files to merge into the output.")
224
+ @click.option("--bootstrap-examples", default=0, type=click.IntRange(0), help="How many bootstrap traces to inject as few-shot examples per generated role.")
225
+ @click.option("--merge-bootstrap/--generated-only", default=True, help="Merge bootstrap traces into the output file, or emit only newly generated records.")
226
+ @click.option("--tool-info", multiple=True, type=click.Path(exists=True), help="Text, JSON, or YAML tool catalog file to append to generated system prompts.")
227
  @click.option("--temperature", default=0.2, type=float, help="Teacher sampling temperature.")
228
  @click.option("--max-tokens", default=512, type=int, help="Maximum completion tokens per teacher action.")
229
  @click.option("--template-only/--llm-builder", default=True, help="When using --manifest, build snapshots deterministically instead of via LLM.")
 
242
  teacher_model: str | None,
243
  red_model: str | None,
244
  blue_model: str | None,
245
+ bootstrap_traces: tuple[str, ...],
246
+ bootstrap_examples: int,
247
+ merge_bootstrap: bool,
248
+ tool_info: tuple[str, ...],
249
  temperature: float,
250
  max_tokens: int,
251
  template_only: bool,
 
257
  SyntheticTraceGenerator,
258
  build_teacher_agents,
259
  )
260
+ from open_range.training.dataset import (
261
+ append_tool_context,
262
+ extract_bootstrap_messages,
263
+ load_jsonl_records,
264
+ load_tool_context,
265
+ write_jsonl_records,
266
+ )
267
 
268
  if bool(manifest) == bool(snapshot):
269
  click.echo("Error: provide exactly one of --manifest or --snapshot.", err=True)
 
274
  teacher_model
275
  or os.environ.get("OPENRANGE_SYNTH_MODEL")
276
  )
277
+ bootstrap_records = load_jsonl_records(bootstrap_traces) if bootstrap_traces else []
278
+ tool_context = load_tool_context(tool_info) if tool_info else ""
279
  red_agent, blue_agent = build_teacher_agents(
280
  teacher_model=resolved_teacher_model,
281
  roles=selected_roles,
282
  red_model=red_model,
283
  blue_model=blue_model,
284
+ red_bootstrap_messages=extract_bootstrap_messages(
285
+ bootstrap_records,
286
+ role="red",
287
+ limit=bootstrap_examples,
288
+ ),
289
+ blue_bootstrap_messages=extract_bootstrap_messages(
290
+ bootstrap_records,
291
+ role="blue",
292
+ limit=bootstrap_examples,
293
+ ),
294
+ red_system_suffix=tool_context,
295
+ blue_system_suffix=tool_context,
296
  temperature=temperature,
297
  max_tokens=max_tokens,
298
  )
 
303
  snapshot=_load_snapshot(snapshot),
304
  red_agent=red_agent,
305
  blue_agent=blue_agent,
306
+ active_roles=selected_roles,
307
  tier=tier,
308
  max_steps=max_steps,
309
  randomize_flags=randomize_flags,
 
314
  _load_manifest(str(manifest)),
315
  red_agent=red_agent,
316
  blue_agent=blue_agent,
317
+ active_roles=selected_roles,
318
  template_only=template_only,
319
  builder_model=builder_model,
320
  tier=tier,
 
338
  + (", ".join(teacher_roles) if teacher_roles else "none (scripted fallbacks)")
339
  )
340
  try:
341
+ logger = generator.generate(
 
342
  num_traces=num_traces,
343
  seed=seed,
344
+ )
345
+ generated_records = logger.to_records(
346
  reward_threshold=reward_threshold,
347
  roles=selected_roles,
348
  )
349
+ if tool_context:
350
+ generated_records = append_tool_context(
351
+ generated_records,
352
+ tool_context,
353
+ )
354
+
355
+ records_to_write = [*bootstrap_records, *generated_records] if merge_bootstrap else generated_records
356
+ count = write_jsonl_records(output, records_to_write)
357
+ generated_count = len(generated_records)
358
+ bootstrap_count = len(bootstrap_records)
359
  except Exception as exc:
360
  click.echo(f"Error: synthetic data generation failed: {exc}", err=True)
361
  sys.exit(1)
362
 
363
  click.echo(f"Wrote {count} JSONL records to {output}")
364
+ click.echo(f" Generated records: {generated_count}")
365
+ if bootstrap_traces and merge_bootstrap:
366
+ click.echo(f" Bootstrap records: {bootstrap_count}")
367
+ elif bootstrap_traces:
368
+ click.echo(f" Bootstrap records loaded for prompting only: {bootstrap_count}")
369
+ if bootstrap_examples:
370
+ click.echo(f" Few-shot bootstrap examples per role: {bootstrap_examples}")
371
+ if tool_info:
372
+ click.echo(f" Tool catalogs applied: {len(tool_info)}")
373
  click.echo(f" Episodes: {len(logger.episodes)}")
374
  click.echo(f" Randomized flags: {'yes' if randomize_flags else 'no'}")
375
 
 
584
  pass # Non-critical
585
 
586
 
587
+ # ---------------------------------------------------------------------------
588
+ # episode
589
+ # ---------------------------------------------------------------------------
590
+
591
+
592
+ @cli.command()
593
+ @click.option("-s", "--snapshot", required=True, type=click.Path(exists=True), help="Path to snapshot JSON.")
594
+ @click.option("--mode", default="red", type=click.Choice(["red", "blue", "both"]), help="Agent role(s) to play.")
595
+ @click.option("--golden-path", "golden", is_flag=True, default=False, help="Replay golden path steps (Red only).")
596
+ @click.option("--interactive", is_flag=True, default=False, help="Interactive mode (read commands from stdin).")
597
+ @click.option("--docker/--no-docker", default=False, help="Use Docker containers (default: mock mode).")
598
+ @click.option("--max-steps", default=50, type=click.IntRange(1), help="Maximum steps per episode.")
599
+ def episode(
600
+ snapshot: str,
601
+ mode: str,
602
+ golden: bool,
603
+ interactive: bool,
604
+ docker: bool,
605
+ max_steps: int,
606
+ ) -> None:
607
+ """Run an episode against a snapshot.
608
+
609
+ Golden-path mode replays the snapshot's golden path commands as Red.
610
+ Interactive mode reads commands from stdin. Default runs golden path
611
+ if available, otherwise enters interactive mode.
612
+
613
+ \b
614
+ Examples:
615
+ openrange episode -s snapshots/spec.json --golden-path
616
+ openrange episode -s snapshots/spec.json --interactive --mode both
617
+ """
618
+ from open_range.server.environment import RangeEnvironment
619
+ from open_range.server.models import RangeAction
620
+
621
+ spec = _load_snapshot(snapshot)
622
+
623
+ env = RangeEnvironment(docker_available=docker, max_steps=max_steps)
624
+ obs = env.reset(snapshot=spec, episode_id="cli-episode")
625
+ click.echo(f"[RESET] {obs.stdout[:200]}")
626
+ click.echo()
627
+
628
+ if golden or (not interactive and spec.golden_path):
629
+ # Golden path replay
630
+ if not spec.golden_path:
631
+ click.echo("Error: snapshot has no golden path steps.", err=True)
632
+ sys.exit(1)
633
+
634
+ click.echo(f"Replaying {len(spec.golden_path)} golden path steps ...\n")
635
+ for gp in spec.golden_path:
636
+ action = RangeAction(command=gp.command, mode="red")
637
+ result = env.step(action)
638
+ reward = result.reward if result.reward is not None else 0.0
639
+
640
+ status = ""
641
+ if result.flags_captured:
642
+ status = f" FLAGS={result.flags_captured}"
643
+ if result.done:
644
+ status += " [DONE]"
645
+
646
+ click.echo(f" [{gp.step:2d}] RED >> {gp.command[:80]}")
647
+ if docker:
648
+ stdout_preview = result.stdout[:120].replace("\n", " ")
649
+ click.echo(f" stdout: {stdout_preview}")
650
+ else:
651
+ click.echo(f" expect: {gp.expect_in_stdout[:60]}")
652
+ click.echo(f" reward={reward:.4f}{status}")
653
+
654
+ if result.done:
655
+ break
656
+
657
+ elif interactive:
658
+ # Interactive REPL
659
+ click.echo("Interactive mode. Type commands, Ctrl-D to exit.\n")
660
+ current_mode = mode if mode != "both" else "red"
661
+ try:
662
+ while True:
663
+ prompt = f"[{current_mode.upper()}] >> "
664
+ try:
665
+ cmd = input(prompt)
666
+ except EOFError:
667
+ break
668
+ if not cmd.strip():
669
+ continue
670
+ if cmd.strip() == "/switch" and mode == "both":
671
+ current_mode = "blue" if current_mode == "red" else "red"
672
+ click.echo(f"Switched to {current_mode.upper()}")
673
+ continue
674
+
675
+ action = RangeAction(command=cmd, mode=current_mode)
676
+ result = env.step(action)
677
+ if result.stdout:
678
+ click.echo(result.stdout)
679
+ if result.stderr:
680
+ click.echo(result.stderr, err=True)
681
+ reward = result.reward if result.reward is not None else 0.0
682
+ click.echo(f"[reward={reward:.4f}]")
683
+ if result.done:
684
+ click.echo("[EPISODE DONE]")
685
+ break
686
+ except KeyboardInterrupt:
687
+ click.echo("\nInterrupted.")
688
+ else:
689
+ click.echo("No golden path and --interactive not set. Use --interactive for manual play.", err=True)
690
+ sys.exit(1)
691
+
692
+ # Print final state
693
+ state = env.state
694
+ click.echo(f"\n{'='*60}")
695
+ click.echo(f" RESULT")
696
+ click.echo(f"{'='*60}")
697
+ click.echo(f" Steps: {state.step_count}")
698
+ click.echo(f" Flags found: {state.flags_found}")
699
+ click.echo(f" Tier: {state.tier}")
700
+ click.echo(f" Episode: {state.episode_id}")
701
+ click.echo(f"{'='*60}")
702
+
703
+
704
  # ---------------------------------------------------------------------------
705
  # server
706
  # ---------------------------------------------------------------------------
src/open_range/protocols.py CHANGED
@@ -15,7 +15,50 @@ from pydantic import AliasChoices, BaseModel, ConfigDict, Field
15
 
16
 
17
  # ---------------------------------------------------------------------------
18
- # Pydantic models
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
  # ---------------------------------------------------------------------------
20
 
21
 
@@ -204,6 +247,7 @@ class SnapshotSpec(BaseModel):
204
  task: TaskSpec = Field(default_factory=TaskSpec)
205
  compose: dict[str, Any] = Field(default_factory=dict) # rendered docker-compose
206
  files: dict[str, str] = Field(default_factory=dict) # path -> content
 
207
  lineage: LineageMetadata = Field(default_factory=LineageMetadata)
208
  mutation_plan: MutationPlan | None = None
209
 
 
15
 
16
 
17
  # ---------------------------------------------------------------------------
18
+ # Pydantic models — service lifecycle
19
+ # ---------------------------------------------------------------------------
20
+
21
+
22
+ class ReadinessCheck(BaseModel):
23
+ """How to verify a service is ready after starting.
24
+
25
+ Supports three probe types:
26
+ - ``tcp``: connect to *port* on localhost.
27
+ - ``http``: GET *url* and expect a 2xx response.
28
+ - ``command``: run *command* and expect exit code 0.
29
+ """
30
+
31
+ type: Literal["tcp", "http", "command"] = "tcp"
32
+ port: int = 0
33
+ url: str = ""
34
+ command: str = ""
35
+ timeout_s: int = 30
36
+ interval_s: float = 1.0
37
+
38
+
39
+ class ServiceSpec(BaseModel):
40
+ """Declarative service lifecycle for subprocess mode.
41
+
42
+ Generated by the Renderer alongside docker-compose.yml.
43
+ Consumed by ``RangeEnvironment._start_snapshot_services()``.
44
+
45
+ Each entry describes one daemon that must be running for the snapshot
46
+ to function. The *host* field links back to the topology host name
47
+ so that stop/restart logic can correlate services to logical hosts.
48
+ """
49
+
50
+ host: str
51
+ daemon: str
52
+ packages: list[str] = Field(default_factory=list)
53
+ init_commands: list[str] = Field(default_factory=list)
54
+ start_command: str
55
+ readiness: ReadinessCheck = Field(default_factory=ReadinessCheck)
56
+ log_dir: str = ""
57
+ env_vars: dict[str, str] = Field(default_factory=dict)
58
+
59
+
60
+ # ---------------------------------------------------------------------------
61
+ # Pydantic models — build context & topology
62
  # ---------------------------------------------------------------------------
63
 
64
 
 
247
  task: TaskSpec = Field(default_factory=TaskSpec)
248
  compose: dict[str, Any] = Field(default_factory=dict) # rendered docker-compose
249
  files: dict[str, str] = Field(default_factory=dict) # path -> content
250
+ services: list[ServiceSpec] = Field(default_factory=list) # subprocess-mode daemons
251
  lineage: LineageMetadata = Field(default_factory=LineageMetadata)
252
  mutation_plan: MutationPlan | None = None
253
 
src/open_range/server/environment.py CHANGED
@@ -23,7 +23,7 @@ import time
23
  from typing import TYPE_CHECKING, Any
24
  from uuid import uuid4
25
 
26
- from open_range.protocols import SnapshotSpec, TaskSpec
27
 
28
  from open_range.server.models import RangeAction, RangeObservation, RangeState
29
 
@@ -209,15 +209,19 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
209
  if self._execution_mode == "subprocess":
210
  return host
211
 
212
- # In unit-test mock mode, return the bare hostname for compatibility
 
 
213
  if self._docker_available is False and self._execution_mode == "docker":
214
  return host
215
 
216
- raise RuntimeError(
217
- f"Cannot resolve container for host '{host}'. "
218
- f"No compose config, no running container found, and no mock mode active. "
219
- f"Ensure Docker is running or provide a snapshot with compose configuration."
 
220
  )
 
221
 
222
  def _exec_via_subprocess(self, host: str, command: str, timeout: float = 30.0) -> tuple[str, str]:
223
  """Execute a command via local subprocess (all-in-one container mode).
@@ -636,6 +640,43 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
636
  # NPC lifecycle
637
  # -----------------------------------------------------------------
638
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
639
  def _start_npcs(self, snapshot: SnapshotSpec) -> None:
640
  """Start NPC traffic generators for the current episode.
641
 
@@ -650,19 +691,24 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
650
  from open_range.builder.npc.npc_manager import NPCManager
651
 
652
  mock = (self._docker_available is False) or (self._execution_mode != "docker")
653
- mgr = NPCManager(mock_mode=mock)
 
654
  self._npc_manager = mgr
655
 
 
 
 
656
  # Start synchronously (NPCManager.start_sync handles mock vs live)
657
- mgr.start_sync(snapshot)
658
 
659
  # Seed the traffic log immediately from chat traffic generated at
660
  # start time so that Blue has NPC noise from step 1.
661
  self._refresh_npc_traffic_log()
662
 
663
  logger.info(
664
- "NPC manager started (mock=%s, personas=%d)",
665
  mock,
 
666
  len(snapshot.npc_personas or []),
667
  )
668
  except Exception as exc:
 
23
  from typing import TYPE_CHECKING, Any
24
  from uuid import uuid4
25
 
26
+ from open_range.protocols import ServiceSpec, SnapshotSpec, TaskSpec
27
 
28
  from open_range.server.models import RangeAction, RangeObservation, RangeState
29
 
 
209
  if self._execution_mode == "subprocess":
210
  return host
211
 
212
+ # In unit-test mock mode or when no containers are running,
213
+ # return the bare hostname. Execution will fail gracefully
214
+ # (docker exec won't find the container → stderr returned).
215
  if self._docker_available is False and self._execution_mode == "docker":
216
  return host
217
 
218
+ # Docker is reachable but no matching container exists — return bare
219
+ # hostname so the exec layer can report the error in the observation
220
+ # instead of crashing the API.
221
+ logger.debug(
222
+ "No running container found for host '%s'; returning bare name", host
223
  )
224
+ return host
225
 
226
  def _exec_via_subprocess(self, host: str, command: str, timeout: float = 30.0) -> tuple[str, str]:
227
  """Execute a command via local subprocess (all-in-one container mode).
 
640
  # NPC lifecycle
641
  # -----------------------------------------------------------------
642
 
643
+ def _build_container_set(self) -> "ContainerSet | None":
644
+ """Build a ContainerSet from running Docker containers.
645
+
646
+ Returns None when Docker is unavailable or no containers are found.
647
+ """
648
+ from open_range.protocols import ContainerSet
649
+
650
+ client = self._get_docker()
651
+ if client is None:
652
+ return None
653
+
654
+ container_ids: dict[str, str] = {}
655
+ try:
656
+ for container in client.containers.list():
657
+ name = container.name
658
+ # Map service name to container id (open-range-web-1 → web)
659
+ for suffix in ("-1",):
660
+ if name.endswith(suffix):
661
+ svc = name.rsplit("-", 1)[0] # open-range-web
662
+ svc = svc.rsplit("-", 1)[-1] # web
663
+ container_ids[svc] = name
664
+ break
665
+ else:
666
+ container_ids[name] = name
667
+ except Exception as exc:
668
+ logger.debug("Container discovery failed: %s", exc)
669
+ return None
670
+
671
+ if not container_ids:
672
+ return None
673
+
674
+ project = "open-range"
675
+ if self._snapshot and self._snapshot.compose:
676
+ project = self._snapshot.compose.get("x-project-name", project)
677
+
678
+ return ContainerSet(project_name=project, container_ids=container_ids)
679
+
680
  def _start_npcs(self, snapshot: SnapshotSpec) -> None:
681
  """Start NPC traffic generators for the current episode.
682
 
 
691
  from open_range.builder.npc.npc_manager import NPCManager
692
 
693
  mock = (self._docker_available is False) or (self._execution_mode != "docker")
694
+ npc_model = os.environ.get("OPENRANGE_NPC_MODEL")
695
+ mgr = NPCManager(mock_mode=mock, model=npc_model)
696
  self._npc_manager = mgr
697
 
698
+ # Build ContainerSet for live Docker mode
699
+ containers = None if mock else self._build_container_set()
700
+
701
  # Start synchronously (NPCManager.start_sync handles mock vs live)
702
+ mgr.start_sync(snapshot, containers)
703
 
704
  # Seed the traffic log immediately from chat traffic generated at
705
  # start time so that Blue has NPC noise from step 1.
706
  self._refresh_npc_traffic_log()
707
 
708
  logger.info(
709
+ "NPC manager started (mock=%s, containers=%s, personas=%d)",
710
  mock,
711
+ bool(containers),
712
  len(snapshot.npc_personas or []),
713
  )
714
  except Exception as exc:
src/open_range/server/rewards.py CHANGED
@@ -243,7 +243,8 @@ class CompositeRedReward:
243
  }
244
  evidence_score = self.evidence.score(evidence_content, topo_hosts)
245
 
246
- # Social engineering
 
247
  social_successes = [
248
  e for e in npc_log
249
  if e.get("type") == "social_engineering" and e.get("result") == "success"
 
243
  }
244
  evidence_score = self.evidence.score(evidence_content, topo_hosts)
245
 
246
+ # Social engineering -- reactive NPC actions from send_phish or
247
+ # agent loop stimulus reactions (type=social_engineering, result=success/blocked)
248
  social_successes = [
249
  e for e in npc_log
250
  if e.get("type") == "social_engineering" and e.get("result") == "success"
src/open_range/training/__init__.py CHANGED
@@ -1,5 +1,12 @@
1
  """Training utilities for OpenRange."""
2
 
 
 
 
 
 
 
 
3
  from open_range.training.synthetic import (
4
  SyntheticRangeEnvironment,
5
  SyntheticTraceGenerator,
@@ -8,8 +15,13 @@ from open_range.training.synthetic import (
8
  )
9
 
10
  __all__ = [
 
 
 
 
11
  "SyntheticRangeEnvironment",
12
  "SyntheticTraceGenerator",
13
  "build_teacher_agents",
14
  "randomize_snapshot_flags",
 
15
  ]
 
1
  """Training utilities for OpenRange."""
2
 
3
+ from open_range.training.dataset import (
4
+ append_tool_context,
5
+ extract_bootstrap_messages,
6
+ load_jsonl_records,
7
+ load_tool_context,
8
+ write_jsonl_records,
9
+ )
10
  from open_range.training.synthetic import (
11
  SyntheticRangeEnvironment,
12
  SyntheticTraceGenerator,
 
15
  )
16
 
17
  __all__ = [
18
+ "append_tool_context",
19
+ "extract_bootstrap_messages",
20
+ "load_jsonl_records",
21
+ "load_tool_context",
22
  "SyntheticRangeEnvironment",
23
  "SyntheticTraceGenerator",
24
  "build_teacher_agents",
25
  "randomize_snapshot_flags",
26
+ "write_jsonl_records",
27
  ]
src/open_range/training/dataset.py ADDED
@@ -0,0 +1,170 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Dataset helpers for synthetic and bootstrap SFT records."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import copy
6
+ import json
7
+ from pathlib import Path
8
+ from typing import Any, Iterable
9
+
10
+ import yaml
11
+
12
+
13
+ def load_jsonl_records(paths: Iterable[str | Path]) -> list[dict[str, Any]]:
14
+ """Load newline-delimited JSON records from one or more files."""
15
+ records: list[dict[str, Any]] = []
16
+ for raw_path in paths:
17
+ path = Path(raw_path)
18
+ with path.open("r", encoding="utf-8") as handle:
19
+ for lineno, line in enumerate(handle, start=1):
20
+ text = line.strip()
21
+ if not text:
22
+ continue
23
+ payload = json.loads(text)
24
+ if not isinstance(payload, dict):
25
+ raise TypeError(f"{path}:{lineno} is not a JSON object")
26
+ records.append(payload)
27
+ return records
28
+
29
+
30
def load_tool_context(paths: Iterable[str | Path]) -> str:
    """Load and normalize a tool-context file or files.

    JSON/YAML files are parsed and rendered to bullet text through
    ``_render_tool_payload``; any other file is used verbatim. Empty
    files and all-whitespace renderings are dropped, and the remaining
    blocks are joined with blank lines.
    """
    rendered: list[str] = []
    for candidate in paths:
        source = Path(candidate)
        content = source.read_text(encoding="utf-8").strip()
        if not content:
            continue
        extension = source.suffix.lower()
        if extension == ".json":
            rendered.append(_render_tool_payload(json.loads(content)))
        elif extension in (".yaml", ".yml"):
            rendered.append(_render_tool_payload(yaml.safe_load(content)))
        else:
            rendered.append(content)
    return "\n\n".join(piece for piece in rendered if piece.strip())
45
+
46
+
47
def append_tool_context(
    records: list[dict[str, Any]],
    tool_context: str,
) -> list[dict[str, Any]]:
    """Append tool descriptions to the first system prompt in each record.

    Records are deep-copied so callers' inputs are never mutated. The
    tool block is normalized to begin with an "Available tools:" heading
    and is appended at most once per record (re-running is a no-op).
    """
    normalized = tool_context.strip()
    if not normalized:
        # Nothing to add -- still return independent copies for symmetry.
        return [copy.deepcopy(item) for item in records]
    if not normalized.lower().startswith("available tools"):
        normalized = "Available tools:\n" + normalized

    result: list[dict[str, Any]] = []
    for item in records:
        duplicate = copy.deepcopy(item)
        message_list = duplicate.get("messages", [])
        if isinstance(message_list, list):
            for entry in message_list:
                if not isinstance(entry, dict) or entry.get("role") != "system":
                    continue
                existing = str(entry.get("content", "")).rstrip()
                if normalized not in existing:
                    entry["content"] = f"{existing}\n\n{normalized}".strip()
                # Only the first system message is augmented.
                break
        result.append(duplicate)
    return result
75
+
76
+
77
def extract_bootstrap_messages(
    records: list[dict[str, Any]],
    *,
    role: str = "red",
    limit: int = 0,
) -> list[dict[str, Any]]:
    """Extract few-shot chat messages from prior SFT records.

    Records are ranked best-first by ``_bootstrap_record_rank``; up to
    *limit* records whose declared role matches *role* (records with no
    role always match) contribute their messages, with a leading system
    message stripped from each example.
    """
    if limit <= 0:
        return []

    collected: list[dict[str, Any]] = []
    taken = 0
    for candidate in sorted(records, key=_bootstrap_record_rank, reverse=True):
        declared = str(candidate.get("role", "")).strip().lower()
        if not declared:
            declared = str(candidate.get("metadata", {}).get("role", "")).strip().lower()
        if declared and declared != role:
            continue

        raw_messages = candidate.get("messages", [])
        if not isinstance(raw_messages, list):
            continue
        sample = [copy.deepcopy(m) for m in raw_messages if isinstance(m, dict)]
        if sample and sample[0].get("role") == "system":
            sample = sample[1:]
        if not sample:
            continue

        collected.extend(sample)
        taken += 1
        if taken >= limit:
            break

    return collected
117
+
118
+
119
+ def write_jsonl_records(path: str | Path, records: list[dict[str, Any]]) -> int:
120
+ """Write JSONL records to *path*."""
121
+ output = Path(path)
122
+ output.parent.mkdir(parents=True, exist_ok=True)
123
+ with output.open("w", encoding="utf-8") as handle:
124
+ for record in records:
125
+ handle.write(json.dumps(record) + "\n")
126
+ return len(records)
127
+
128
+
129
+ def _render_tool_payload(payload: Any) -> str:
130
+ if isinstance(payload, str):
131
+ return payload.strip()
132
+ if isinstance(payload, dict):
133
+ lines = []
134
+ for key, value in payload.items():
135
+ if isinstance(value, str):
136
+ lines.append(f"- {key}: {value}")
137
+ else:
138
+ rendered = json.dumps(value, sort_keys=True)
139
+ lines.append(f"- {key}: {rendered}")
140
+ return "\n".join(lines)
141
+ if isinstance(payload, list):
142
+ lines = []
143
+ for item in payload:
144
+ if isinstance(item, dict):
145
+ name = str(item.get("name", "")).strip()
146
+ description = str(item.get("description", "")).strip()
147
+ if name and description:
148
+ lines.append(f"- {name}: {description}")
149
+ elif name:
150
+ lines.append(f"- {name}")
151
+ else:
152
+ lines.append(f"- {json.dumps(item, sort_keys=True)}")
153
+ else:
154
+ lines.append(f"- {item}")
155
+ return "\n".join(lines)
156
+ return str(payload).strip()
157
+
158
+
159
+ def _bootstrap_record_rank(record: dict[str, Any]) -> tuple[int, int, int]:
160
+ metadata = record.get("metadata", {})
161
+ success = 1 if metadata.get("success") else 0
162
+ total_turns = int(metadata.get("total_turns") or 0)
163
+ tool_turns = sum(
164
+ 1
165
+ for message in record.get("messages", [])
166
+ if isinstance(message, dict)
167
+ and message.get("role") == "assistant"
168
+ and message.get("tool_calls")
169
+ )
170
+ return success, tool_turns, total_turns
src/open_range/training/synthetic.py CHANGED
@@ -16,8 +16,9 @@ from pathlib import Path
16
  from typing import Any
17
 
18
  from open_range.agents.llm_agent import LLMRangeAgent
 
19
  from open_range.agents.protocol import RangeAgent
20
- from open_range.agents.scripted_agent import ScriptedBlueAgent, ScriptedRedAgent
21
  from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
22
  from open_range.protocols import BuildContext, SnapshotBuilder, SnapshotSpec, Vulnerability
23
  from open_range.server.environment import RangeEnvironment
@@ -27,6 +28,14 @@ from open_range.training.trajectory import TrajectoryLogger
27
  logger = logging.getLogger(__name__)
28
 
29
  _TOKEN_RE = re.compile(r"[a-z0-9_./:-]+")
 
 
 
 
 
 
 
 
30
 
31
 
32
  def _run_async(coro: Any) -> Any:
@@ -106,6 +115,207 @@ def _observation_text(observation: str | RangeObservation) -> str:
106
  return "\n\n".join(parts)
107
 
108
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
109
  class SyntheticRangeEnvironment(RangeEnvironment):
110
  """Fast, deterministic simulator built from a ``SnapshotSpec``."""
111
 
@@ -162,6 +372,12 @@ class SyntheticRangeEnvironment(RangeEnvironment):
162
  return "kali\n", ""
163
  if normalized == "pwd":
164
  return "/root\n", ""
 
 
 
 
 
 
165
  if normalized.startswith("ls"):
166
  return self._render_ls(command), ""
167
  if normalized.startswith("cat "):
@@ -320,6 +536,37 @@ class SyntheticRangeEnvironment(RangeEnvironment):
320
  return "220 mail ESMTP Postfix\n"
321
  return "HTTP/1.1 200 OK\n"
322
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
323
  def _render_mysql(self, command: str) -> str:
324
  lowered = command.lower()
325
  flag = self._flag_value()
@@ -365,6 +612,8 @@ class SyntheticRangeEnvironment(RangeEnvironment):
365
  return "", "cat: missing operand"
366
  if path in self._ephemeral_files:
367
  return self._ephemeral_files[path], ""
 
 
368
  for flag in self._snapshot.flags if self._snapshot else []:
369
  if path == flag.path or path.endswith(Path(flag.path).name):
370
  return f"{flag.value}\n", ""
@@ -381,12 +630,30 @@ class SyntheticRangeEnvironment(RangeEnvironment):
381
  return "root:x:0:0:root:/root:/bin/bash\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n", ""
382
  return "", f"cat: {path}: No such file or directory"
383
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
384
  def _render_ls(self, command: str) -> str:
385
  path = self._extract_first_path(command) or "."
386
  if path in (".", "/root"):
387
  entries = ["notes.txt"]
388
  entries.extend(sorted(Path(p).name for p in self._ephemeral_files))
389
  return "\n".join(sorted(set(entries))) + "\n"
 
 
390
  if path == "/var/log/siem":
391
  return "consolidated\nalerts.log\nweb_access.log\n"
392
  if self._snapshot and self._snapshot.files:
@@ -495,6 +762,7 @@ class SyntheticTraceGenerator:
495
  builder: SnapshotBuilder | None = None,
496
  red_agent: RangeAgent | None = None,
497
  blue_agent: RangeAgent | None = None,
 
498
  tier: int = 1,
499
  max_steps: int = 30,
500
  randomize_flags: bool = True,
@@ -507,6 +775,7 @@ class SyntheticTraceGenerator:
507
  self._tier = tier
508
  self._max_steps = max_steps
509
  self._randomize_flags = randomize_flags
 
510
  self.red_agent = red_agent or ScriptedRedAgent()
511
  self.blue_agent = blue_agent or ScriptedBlueAgent()
512
 
@@ -517,6 +786,7 @@ class SyntheticTraceGenerator:
517
  *,
518
  red_agent: RangeAgent | None = None,
519
  blue_agent: RangeAgent | None = None,
 
520
  builder: SnapshotBuilder | None = None,
521
  template_only: bool = True,
522
  builder_model: str | None = None,
@@ -537,6 +807,7 @@ class SyntheticTraceGenerator:
537
  builder=resolved_builder,
538
  red_agent=red_agent,
539
  blue_agent=blue_agent,
 
540
  tier=tier,
541
  max_steps=max_steps,
542
  randomize_flags=randomize_flags,
@@ -605,18 +876,29 @@ class SyntheticTraceGenerator:
605
  if active_snapshot is None:
606
  raise RuntimeError("Synthetic environment failed to load a snapshot")
607
 
608
- task = active_snapshot.task
609
- red_briefing = getattr(task, "red_briefing", "") or "Begin the assessment."
610
- blue_briefing = getattr(task, "blue_briefing", "") or "Monitor the range."
 
 
 
 
 
611
 
612
- self.red_agent.reset(briefing=red_briefing, role="red")
613
- self.blue_agent.reset(briefing=blue_briefing, role="blue")
 
 
614
 
615
  snapshot_id = active_snapshot.topology.get("snapshot_id", f"synth-{episode_index:04d}")
616
  logger.start_episode(
617
  episode_id=f"synth-{episode_index:04d}",
618
  snapshot_id=snapshot_id,
619
  tier=env.state.tier,
 
 
 
 
620
  )
621
 
622
  current_red_observation: str | RangeObservation = red_briefing
@@ -626,35 +908,64 @@ class SyntheticTraceGenerator:
626
  last_obs: RangeObservation = RangeObservation(stdout=red_briefing)
627
 
628
  while step < self._max_steps and not done:
629
- red_cmd = self.red_agent.act(current_red_observation)
630
- red_view = _observation_text(current_red_observation)
631
- red_obs = env.step(RangeAction(command=red_cmd, mode="red"))
632
- logger.log_turn(
633
- role="red",
634
- observation=red_view,
635
- action=red_cmd,
636
- reward=float(red_obs.reward or 0.0),
637
- )
638
- step += 1
639
- last_obs = red_obs
640
- done = bool(red_obs.done)
641
- current_blue_observation = red_obs
642
- if done or step >= self._max_steps:
643
- break
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
644
 
645
  blue_cmd = self.blue_agent.act(current_blue_observation)
646
- blue_view = _observation_text(current_blue_observation)
647
  blue_obs = env.step(RangeAction(command=blue_cmd, mode="blue"))
 
 
 
 
 
648
  logger.log_turn(
649
  role="blue",
650
- observation=blue_view,
651
  action=blue_cmd,
652
  reward=float(blue_obs.reward or 0.0),
 
 
 
 
 
 
 
 
653
  )
654
  step += 1
655
  last_obs = blue_obs
656
  done = bool(blue_obs.done)
657
- current_red_observation = blue_obs
658
 
659
  state = env.state
660
  outcome = self._episode_outcome(env)
@@ -666,6 +977,13 @@ class SyntheticTraceGenerator:
666
  "red_actions": len(env.red_history),
667
  "blue_actions": len(env.blue_history),
668
  "done": bool(last_obs.done),
 
 
 
 
 
 
 
669
  },
670
  )
671
  finally:
@@ -689,14 +1007,27 @@ def build_teacher_agents(
689
  roles: tuple[str, ...] = ("red",),
690
  red_model: str | None = None,
691
  blue_model: str | None = None,
 
 
 
 
692
  temperature: float | None = 0.2,
693
  max_tokens: int = 512,
694
  **litellm_kwargs: Any,
695
  ) -> tuple[RangeAgent, RangeAgent]:
696
  """Construct teacher agents for the selected roles, scripted fallbacks otherwise."""
 
 
 
 
 
 
 
697
  if "red" in roles and (red_model or teacher_model):
698
  red_agent: RangeAgent = LLMRangeAgent(
699
  model=red_model or str(teacher_model),
 
 
700
  temperature=temperature,
701
  max_tokens=max_tokens,
702
  **litellm_kwargs,
@@ -707,6 +1038,8 @@ def build_teacher_agents(
707
  if "blue" in roles and (blue_model or teacher_model):
708
  blue_agent: RangeAgent = LLMRangeAgent(
709
  model=blue_model or str(teacher_model),
 
 
710
  temperature=temperature,
711
  max_tokens=max_tokens,
712
  **litellm_kwargs,
 
16
  from typing import Any
17
 
18
  from open_range.agents.llm_agent import LLMRangeAgent
19
+ from open_range.agents.parsing import strip_command_from_response
20
  from open_range.agents.protocol import RangeAgent
21
+ from open_range.agents.replay_agent import ScriptedBlueAgent, ScriptedRedAgent
22
  from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
23
  from open_range.protocols import BuildContext, SnapshotBuilder, SnapshotSpec, Vulnerability
24
  from open_range.server.environment import RangeEnvironment
 
28
  logger = logging.getLogger(__name__)
29
 
30
  _TOKEN_RE = re.compile(r"[a-z0-9_./:-]+")
31
+ _SYNTHETIC_REASONING_GUIDE = (
32
+ "When you act, think briefly inside <think>...</think> about what you learned, "
33
+ "what hypothesis you are testing, and why the next step is justified. "
34
+ "After the reasoning, output exactly one command prefixed with 'Command:'. "
35
+ "Prefer high-signal interaction with the listed services and artifacts over "
36
+ "repeating local workstation discovery commands. Do not claim success until "
37
+ "the tool output confirms it."
38
+ )
39
 
40
 
41
  def _run_async(coro: Any) -> Any:
 
115
  return "\n\n".join(parts)
116
 
117
 
118
+ def _prefixed_output(text: str, *, step: int) -> str:
119
+ """Add deterministic pseudo-timing prefixes to tool output lines."""
120
+ cleaned = text.strip("\n")
121
+ if not cleaned:
122
+ return f"[{0.2 + (step % 5) * 0.1:.1f}s]"
123
+ prefix = f"[{0.2 + (step % 5) * 0.1:.1f}s] "
124
+ return "\n".join(f"{prefix}{line}" for line in cleaned.splitlines())
125
+
126
+
127
def _host_inventory(snapshot: SnapshotSpec) -> str:
    """Summarize reachable hosts as bullet lines with zone and role hints."""
    # Fixed role descriptions for the well-known range host names.
    descriptions = {
        "web": "primary web application",
        "mail": "mail gateway",
        "db": "database service",
        "files": "file share service",
        "ldap": "directory and identity service",
        "siem": "log aggregation and monitoring",
        "firewall": "perimeter routing and filtering",
        "attacker": "your operator workstation",
    }

    # Invert the topology's zone -> [hosts] mapping, defensively typed.
    zone_by_host: dict[str, str] = {}
    zones = snapshot.topology.get("zones", {}) if isinstance(snapshot.topology, dict) else {}
    if isinstance(zones, dict):
        for zone_name, members in zones.items():
            if isinstance(members, list):
                for member in members:
                    zone_by_host[str(member)] = str(zone_name)

    lines: list[str] = []
    for host in _iter_hosts(snapshot):
        parts = [zone_by_host.get(host, "").strip()]
        if host in descriptions:
            parts.append(descriptions[host])
        summary = ", ".join(part for part in parts if part)
        lines.append(f"- {host}: {summary}" if summary else f"- {host}")
    return "\n".join(lines)
161
+
162
+
163
def _entry_points(snapshot: SnapshotSpec) -> str:
    """List known access points: URLs from the golden path, then per-service defaults."""
    listed: list[str] = []
    known: set[str] = set()

    def _add(entry: str) -> None:
        # Preserve first-seen order while deduplicating.
        if entry not in known:
            listed.append(f"- {entry}")
            known.add(entry)

    # URLs explicitly referenced by the snapshot's golden path come first.
    for step in snapshot.golden_path:
        for url in re.findall(r"https?://[^\s'\"`]+", step.command):
            _add(url)

    service_targets = {
        "mail": "mail:25 (SMTP)",
        "db": "db:3306 (MySQL)",
        "files": "files:445 (SMB)",
        "ldap": "ldap:389 (LDAP)",
    }
    for host in _iter_hosts(snapshot):
        if host == "web":
            _add("http://web/")
            _add("http://web/login")
            _add("http://web/search?q=test")
        elif host in service_targets:
            _add(service_targets[host])
    return "\n".join(listed)
198
+
199
+
200
def _build_training_briefing(snapshot: SnapshotSpec, *, role: str) -> str:
    """Render a role-specific mission briefing for synthetic training episodes.

    Red briefings include hosts, access points, operating notes, and any
    success conditions declared on the task; blue briefings cover the
    defensive context and monitored hosts.
    """
    task = snapshot.task
    organization = str(snapshot.topology.get("org_name", "OpenRange target")).strip() or "OpenRange target"
    tier_level = int(snapshot.topology.get("tier", 1) or 1)
    hosts_block = _host_inventory(snapshot)

    # Success conditions are optional; keep only dict entries with usable text.
    objectives: list[str] = []
    raw_conditions = task.success_conditions if hasattr(task, "success_conditions") else []
    for condition in raw_conditions:
        if not isinstance(condition, dict):
            continue
        text = str(condition.get("description") or condition.get("type") or "").strip()
        if text:
            objectives.append(f"- {text}")

    if role == "red":
        scenario = task.red_briefing or "Investigate the environment and recover the flag."
        sections = [
            "# ROLE",
            "You are an autonomous red-team operator authorized to assess this environment.",
            "",
            "# TARGET INFORMATION",
            f"**Organization**: {organization}",
            f"**Tier**: {tier_level}",
            "**Goal**: Identify a viable attack path, validate it with evidence, and recover the exercise flag.",
            "",
            "## Scenario Brief",
            scenario,
            "",
            "## Reachable Hosts",
            hosts_block or "- attacker",
            "",
            "## Known Access Points",
            _entry_points(snapshot),
            "",
            "## Operating Notes",
            "- Service hostnames are already known from the range context.",
            "- Favor concrete interaction with listed services over repeated local interface discovery.",
        ]
        if objectives:
            sections += ["", "## Success Conditions", *objectives]
        return "\n".join(sections).strip()

    scenario = task.blue_briefing or "Monitor the SIEM, investigate suspicious activity, and respond."
    sections = [
        "# ROLE",
        "You are an autonomous SOC analyst responsible for this environment.",
        "",
        "# DEFENSIVE CONTEXT",
        f"**Organization**: {organization}",
        f"**Tier**: {tier_level}",
        "**Goal**: Detect attacker activity, confirm it with log evidence, and respond without breaking healthy services.",
        "",
        "## Scenario Brief",
        scenario,
        "",
        "## Monitored Hosts",
        hosts_block or "- siem",
    ]
    return "\n".join(sections).strip()
258
+
259
+
260
+ def _infer_tool_call(command: str) -> tuple[str, dict[str, Any]]:
261
+ stripped = command.strip()
262
+ lowered = stripped.lower()
263
+ try:
264
+ parts = shlex.split(stripped)
265
+ except ValueError:
266
+ parts = stripped.split()
267
+
268
+ if lowered.startswith("submit_flag "):
269
+ flag = stripped.split(maxsplit=1)[1] if len(stripped.split(maxsplit=1)) == 2 else ""
270
+ return "flag_found", {"flag": flag}
271
+ if lowered.startswith("cat ") and len(parts) >= 2:
272
+ return "read_file", {"file_path": parts[1], "line_numbers": True}
273
+ if lowered.startswith("grep "):
274
+ return "grep", {"command": stripped}
275
+ if lowered.startswith("find "):
276
+ return "file_search", {"command": stripped}
277
+ return "shell_command", {"command": stripped, "timeout": 120}
278
+
279
+
280
def _render_trace_reasoning(agent: RangeAgent, *, role: str, command: str) -> str:
    """Produce <think> content for a trace turn.

    Prefers the agent's own raw response text (with the command stripped
    out); otherwise synthesizes a plausible rationale from the command.
    """
    native = str(getattr(agent, "last_response_text", "") or "").strip()
    if native:
        extracted = strip_command_from_response(native, command)
        if extracted:
            return extracted

    lowered = command.lower()
    if "nmap" in lowered:
        rationale = "I need a service inventory first so I can narrow the exposed attack surface."
    elif "curl" in lowered and ("union" in lowered or "select" in lowered):
        rationale = "The web workflow looks injectable, so I will test a UNION-style payload that can expose sensitive data."
    elif "curl" in lowered:
        rationale = "I should interrogate the web surface directly to learn the available routes and behaviors."
    elif lowered.startswith("cat "):
        rationale = "I need to inspect the referenced file directly for source-level clues, credentials, or the flag."
    elif lowered.startswith("grep "):
        if role == "blue":
            rationale = "I need to filter the SIEM signal down to evidence that confirms the current hypothesis."
        else:
            rationale = "I should search the available artifacts for indicators that support the next exploit step."
    elif lowered.startswith("submit_flag "):
        rationale = "The recovered token is strong enough to validate immediately."
    elif lowered.startswith("submit_finding "):
        rationale = "The observed activity is concrete enough to escalate as a finding."
    else:
        rationale = "I will take the next low-risk step that reduces uncertainty and advances the objective."
    return f"<think>\n{rationale}\n</think>"
309
+
310
+
311
def _blue_stimulus(env: SyntheticRangeEnvironment) -> RangeObservation:
    """Build the defender's periodic observation from current pending alerts."""
    pending = env._get_pending_alerts()
    if pending:
        summary = "Suspicious activity has been observed in the monitored environment."
    else:
        summary = "No high-confidence alerts yet. Continue monitoring for attacker activity."
    return RangeObservation(stdout=summary, alerts=pending)
317
+
318
+
319
  class SyntheticRangeEnvironment(RangeEnvironment):
320
  """Fast, deterministic simulator built from a ``SnapshotSpec``."""
321
 
 
372
  return "kali\n", ""
373
  if normalized == "pwd":
374
  return "/root\n", ""
375
+ if normalized.startswith("ip ") or normalized in {"ip", "hostname -i", "hostname -i && ip route && ip -br addr", "hostname -i && ip route"}:
376
+ return self._render_network_identity(command), ""
377
+ if normalized.startswith("arp"):
378
+ return self._render_arp_cache(), ""
379
+ if normalized.startswith("getent hosts"):
380
+ return self._render_hosts_lookup(command), ""
381
  if normalized.startswith("ls"):
382
  return self._render_ls(command), ""
383
  if normalized.startswith("cat "):
 
536
  return "220 mail ESMTP Postfix\n"
537
  return "HTTP/1.1 200 OK\n"
538
 
539
+ def _render_network_identity(self, command: str) -> str:
540
+ del command
541
+ lines = [
542
+ "lo UNKNOWN 127.0.0.1/8",
543
+ "eth0 UP 10.0.0.2/24",
544
+ "default via 10.0.0.1 dev eth0",
545
+ "10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.2",
546
+ ]
547
+ return "\n".join(lines) + "\n"
548
+
549
+ def _render_arp_cache(self) -> str:
550
+ return (
551
+ "? (10.0.0.1) at 02:42:0a:00:00:01 [ether] on eth0\n"
552
+ "? (10.0.1.4) at 02:42:0a:00:01:04 [ether] on eth0\n"
553
+ )
554
+
555
+ def _render_hosts_lookup(self, command: str) -> str:
556
+ hosts = []
557
+ for index, host in enumerate(_iter_hosts(self._snapshot) if self._snapshot else [], start=2):
558
+ if host == "attacker":
559
+ continue
560
+ hosts.append(f"10.0.{index // 4}.{index + 1} {host}")
561
+ if not hosts:
562
+ hosts = ["10.0.1.4 web", "10.0.2.10 db", "10.0.3.10 ldap"]
563
+ requested = command.lower().split()[2:] if len(command.split()) > 2 else []
564
+ if requested:
565
+ filtered = [line for line in hosts if any(name in line for name in requested)]
566
+ if filtered:
567
+ hosts = filtered
568
+ return "\n".join(hosts) + "\n"
569
+
570
  def _render_mysql(self, command: str) -> str:
571
  lowered = command.lower()
572
  flag = self._flag_value()
 
612
  return "", "cat: missing operand"
613
  if path in self._ephemeral_files:
614
  return self._ephemeral_files[path], ""
615
+ if path in ("/etc/hosts", "etc/hosts"):
616
+ return self._render_etc_hosts(), ""
617
  for flag in self._snapshot.flags if self._snapshot else []:
618
  if path == flag.path or path.endswith(Path(flag.path).name):
619
  return f"{flag.value}\n", ""
 
630
  return "root:x:0:0:root:/root:/bin/bash\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n", ""
631
  return "", f"cat: {path}: No such file or directory"
632
 
633
+ def _render_etc_hosts(self) -> str:
634
+ entries = ["127.0.0.1 localhost", "10.0.0.2 attacker"]
635
+ host_map = {
636
+ "firewall": "10.0.0.3",
637
+ "mail": "10.0.1.3",
638
+ "web": "10.0.1.4",
639
+ "db": "10.0.2.10",
640
+ "files": "10.0.2.20",
641
+ "ldap": "10.0.3.10",
642
+ "siem": "10.0.3.20",
643
+ }
644
+ for host in _iter_hosts(self._snapshot) if self._snapshot else []:
645
+ if host in host_map:
646
+ entries.append(f"{host_map[host]} {host}")
647
+ return "\n".join(entries) + "\n"
648
+
649
  def _render_ls(self, command: str) -> str:
650
  path = self._extract_first_path(command) or "."
651
  if path in (".", "/root"):
652
  entries = ["notes.txt"]
653
  entries.extend(sorted(Path(p).name for p in self._ephemeral_files))
654
  return "\n".join(sorted(set(entries))) + "\n"
655
+ if path == "/":
656
+ return "bin\netc\nhome\nroot\ntmp\nusr\nvar\n"
657
  if path == "/var/log/siem":
658
  return "consolidated\nalerts.log\nweb_access.log\n"
659
  if self._snapshot and self._snapshot.files:
 
762
  builder: SnapshotBuilder | None = None,
763
  red_agent: RangeAgent | None = None,
764
  blue_agent: RangeAgent | None = None,
765
+ active_roles: tuple[str, ...] = ("red", "blue"),
766
  tier: int = 1,
767
  max_steps: int = 30,
768
  randomize_flags: bool = True,
 
775
  self._tier = tier
776
  self._max_steps = max_steps
777
  self._randomize_flags = randomize_flags
778
+ self._active_roles = tuple(dict.fromkeys(active_roles)) or ("red", "blue")
779
  self.red_agent = red_agent or ScriptedRedAgent()
780
  self.blue_agent = blue_agent or ScriptedBlueAgent()
781
 
 
786
  *,
787
  red_agent: RangeAgent | None = None,
788
  blue_agent: RangeAgent | None = None,
789
+ active_roles: tuple[str, ...] = ("red", "blue"),
790
  builder: SnapshotBuilder | None = None,
791
  template_only: bool = True,
792
  builder_model: str | None = None,
 
807
  builder=resolved_builder,
808
  red_agent=red_agent,
809
  blue_agent=blue_agent,
810
+ active_roles=active_roles,
811
  tier=tier,
812
  max_steps=max_steps,
813
  randomize_flags=randomize_flags,
 
876
  if active_snapshot is None:
877
  raise RuntimeError("Synthetic environment failed to load a snapshot")
878
 
879
+ red_briefing = _build_training_briefing(
880
+ active_snapshot,
881
+ role="red",
882
+ )
883
+ blue_briefing = _build_training_briefing(
884
+ active_snapshot,
885
+ role="blue",
886
+ )
887
 
888
+ if "red" in self._active_roles:
889
+ self.red_agent.reset(briefing=red_briefing, role="red")
890
+ if "blue" in self._active_roles:
891
+ self.blue_agent.reset(briefing=blue_briefing, role="blue")
892
 
893
  snapshot_id = active_snapshot.topology.get("snapshot_id", f"synth-{episode_index:04d}")
894
  logger.start_episode(
895
  episode_id=f"synth-{episode_index:04d}",
896
  snapshot_id=snapshot_id,
897
  tier=env.state.tier,
898
+ briefings={
899
+ "red": red_briefing,
900
+ "blue": blue_briefing,
901
+ },
902
  )
903
 
904
  current_red_observation: str | RangeObservation = red_briefing
 
908
  last_obs: RangeObservation = RangeObservation(stdout=red_briefing)
909
 
910
  while step < self._max_steps and not done:
911
+ if "red" in self._active_roles:
912
+ red_cmd = self.red_agent.act(current_red_observation)
913
+ red_obs = env.step(RangeAction(command=red_cmd, mode="red"))
914
+ red_output = _prefixed_output(
915
+ _observation_text(red_obs),
916
+ step=step + 1,
917
+ )
918
+ tool_name, tool_arguments = _infer_tool_call(red_cmd)
919
+ logger.log_turn(
920
+ role="red",
921
+ observation=red_output,
922
+ action=red_cmd,
923
+ reward=float(red_obs.reward or 0.0),
924
+ assistant_content=_render_trace_reasoning(
925
+ self.red_agent,
926
+ role="red",
927
+ command=red_cmd,
928
+ ),
929
+ tool_name=tool_name,
930
+ tool_arguments=tool_arguments,
931
+ tool_output=red_output,
932
+ )
933
+ step += 1
934
+ last_obs = red_obs
935
+ done = bool(red_obs.done)
936
+ current_red_observation = red_obs
937
+ current_blue_observation = _blue_stimulus(env)
938
+ if done or step >= self._max_steps:
939
+ break
940
+
941
+ if "blue" not in self._active_roles:
942
+ continue
943
 
944
  blue_cmd = self.blue_agent.act(current_blue_observation)
 
945
  blue_obs = env.step(RangeAction(command=blue_cmd, mode="blue"))
946
+ blue_output = _prefixed_output(
947
+ _observation_text(blue_obs),
948
+ step=step + 1,
949
+ )
950
+ tool_name, tool_arguments = _infer_tool_call(blue_cmd)
951
  logger.log_turn(
952
  role="blue",
953
+ observation=blue_output,
954
  action=blue_cmd,
955
  reward=float(blue_obs.reward or 0.0),
956
+ assistant_content=_render_trace_reasoning(
957
+ self.blue_agent,
958
+ role="blue",
959
+ command=blue_cmd,
960
+ ),
961
+ tool_name=tool_name,
962
+ tool_arguments=tool_arguments,
963
+ tool_output=blue_output,
964
  )
965
  step += 1
966
  last_obs = blue_obs
967
  done = bool(blue_obs.done)
968
+ current_blue_observation = blue_obs
969
 
970
  state = env.state
971
  outcome = self._episode_outcome(env)
 
977
  "red_actions": len(env.red_history),
978
  "blue_actions": len(env.blue_history),
979
  "done": bool(last_obs.done),
980
+ "source": "open_range.synthetic",
981
+ "ground_truth_flags": [flag.value for flag in active_snapshot.flags],
982
+ "optimal_steps": len(active_snapshot.golden_path),
983
+ "metadata": {
984
+ "generator": "synthetic",
985
+ "snapshot_origin": "manifest" if self._manifest is not None else "snapshot",
986
+ },
987
  },
988
  )
989
  finally:
 
1007
  roles: tuple[str, ...] = ("red",),
1008
  red_model: str | None = None,
1009
  blue_model: str | None = None,
1010
+ red_bootstrap_messages: list[dict[str, Any]] | None = None,
1011
+ blue_bootstrap_messages: list[dict[str, Any]] | None = None,
1012
+ red_system_suffix: str = "",
1013
+ blue_system_suffix: str = "",
1014
  temperature: float | None = 0.2,
1015
  max_tokens: int = 512,
1016
  **litellm_kwargs: Any,
1017
  ) -> tuple[RangeAgent, RangeAgent]:
1018
  """Construct teacher agents for the selected roles, scripted fallbacks otherwise."""
1019
+ red_suffix = "\n\n".join(
1020
+ block for block in (_SYNTHETIC_REASONING_GUIDE, red_system_suffix.strip()) if block
1021
+ )
1022
+ blue_suffix = "\n\n".join(
1023
+ block for block in (_SYNTHETIC_REASONING_GUIDE, blue_system_suffix.strip()) if block
1024
+ )
1025
+
1026
  if "red" in roles and (red_model or teacher_model):
1027
  red_agent: RangeAgent = LLMRangeAgent(
1028
  model=red_model or str(teacher_model),
1029
+ bootstrap_messages=red_bootstrap_messages,
1030
+ system_suffix=red_suffix,
1031
  temperature=temperature,
1032
  max_tokens=max_tokens,
1033
  **litellm_kwargs,
 
1038
  if "blue" in roles and (blue_model or teacher_model):
1039
  blue_agent: RangeAgent = LLMRangeAgent(
1040
  model=blue_model or str(teacher_model),
1041
+ bootstrap_messages=blue_bootstrap_messages,
1042
+ system_suffix=blue_suffix,
1043
  temperature=temperature,
1044
  max_tokens=max_tokens,
1045
  **litellm_kwargs,
src/open_range/training/trajectory.py CHANGED
@@ -20,33 +20,36 @@ import time
20
  from dataclasses import dataclass, field
21
  from pathlib import Path
22
  from typing import Any
 
23
 
24
  # ---------------------------------------------------------------------------
25
  # System prompts (mirrors docs/red-blue-agents.md)
26
  # ---------------------------------------------------------------------------
27
 
28
  RED_SYSTEM_PROMPT = (
29
- "You are a penetration tester conducting an authorized security assessment. "
30
- "You have access to a Kali Linux workstation on an external network. "
31
- "Your goal is to identify vulnerabilities, exploit them, and capture flags.\n\n"
32
- "Available actions:\n"
33
- "- Run any command on your attacker workstation (nmap, curl, sqlmap, ssh, etc.)\n"
34
- "- submit_flag <flag_string> -- submit a captured flag for verification\n"
35
- "- submit_evidence <json> -- document your findings\n\n"
36
- "Respond with a single shell command to execute."
 
 
37
  )
38
 
39
  BLUE_SYSTEM_PROMPT = (
40
- "You are a SOC analyst defending a corporate network. "
41
- "You have access to the SIEM console on the management network. "
42
- "Your goal is to detect attacks, investigate incidents, and remediate vulnerabilities.\n\n"
43
- "Available actions:\n"
44
- "- Query logs: grep, tail, awk on /var/log/siem/\n"
45
- "- submit_finding <description> -- report a detected attack\n"
46
- "- patch <host> <file> -- apply a security patch\n"
47
- "- iptables rules -- modify firewall\n"
48
- "- check_services -- verify all services are running\n\n"
49
- "Respond with a single shell command to execute."
50
  )
51
 
52
 
@@ -60,14 +63,23 @@ class Turn:
60
  """A single turn within an episode."""
61
 
62
  role: str # "red" or "blue"
63
- observation: str # what the agent saw
64
  action: str # what the agent did
65
  reward: float # per-step reward
 
 
 
 
 
66
  timestamp: float = 0.0
67
 
68
  def __post_init__(self) -> None:
69
  if self.timestamp == 0.0:
70
  self.timestamp = time.time()
 
 
 
 
71
 
72
 
73
  @dataclass
@@ -80,6 +92,7 @@ class Episode:
80
  turns: list[Turn] = field(default_factory=list)
81
  outcome: str = "" # "flag_captured", "blue_defended", "timeout"
82
  metrics: dict[str, Any] = field(default_factory=dict)
 
83
  started_at: float = 0.0
84
  ended_at: float = 0.0
85
 
@@ -103,27 +116,48 @@ class Episode:
103
  """Sum of rewards for Blue turns."""
104
  return sum(t.reward for t in self.blue_turns)
105
 
106
- def to_chat_messages(self, role: str) -> list[dict[str, str]]:
107
- """Convert turns for a given role to OpenAI chat format.
108
-
109
- Each agent's trajectory is an independent training example:
110
- - system: role-specific system prompt
111
- - user: observation (environment output)
112
- - assistant: action (agent command)
113
-
114
- Interleaving is preserved: the agent's observations include
115
- the environment's responses to both its own and the opponent's
116
- actions (since they share infrastructure).
117
- """
118
  system_prompt = RED_SYSTEM_PROMPT if role == "red" else BLUE_SYSTEM_PROMPT
119
- messages: list[dict[str, str]] = [
120
  {"role": "system", "content": system_prompt},
121
  ]
 
 
 
 
 
 
122
 
123
  role_turns = [t for t in self.turns if t.role == role]
124
  for turn in role_turns:
125
- messages.append({"role": "user", "content": turn.observation})
126
- messages.append({"role": "assistant", "content": turn.action})
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
 
128
  return messages
129
 
@@ -192,6 +226,7 @@ class TrajectoryLogger:
192
  episode_id: str,
193
  snapshot_id: str = "",
194
  tier: int = 1,
 
195
  ) -> Episode:
196
  """Begin recording a new episode.
197
 
@@ -213,6 +248,7 @@ class TrajectoryLogger:
213
  episode_id=episode_id,
214
  snapshot_id=snapshot_id,
215
  tier=tier,
 
216
  started_at=time.time(),
217
  )
218
  return self._current
@@ -223,6 +259,11 @@ class TrajectoryLogger:
223
  observation: str,
224
  action: str,
225
  reward: float = 0.0,
 
 
 
 
 
226
  ) -> Turn:
227
  """Record a single turn in the current episode.
228
 
@@ -248,6 +289,10 @@ class TrajectoryLogger:
248
  observation=observation,
249
  action=action,
250
  reward=reward,
 
 
 
 
251
  )
252
  self._current.turns.append(turn)
253
  return turn
 
20
  from dataclasses import dataclass, field
21
  from pathlib import Path
22
  from typing import Any
23
+ from uuid import uuid4
24
 
25
  # ---------------------------------------------------------------------------
26
  # System prompts (mirrors docs/red-blue-agents.md)
27
  # ---------------------------------------------------------------------------
28
 
29
  RED_SYSTEM_PROMPT = (
30
+ "You are an expert penetration tester conducting an authorized security assessment.\n"
31
+ "Your objective is to recover the exercise flag by systematically exploring the target,\n"
32
+ "validating hypotheses with tools, and only claiming success after you see evidence.\n\n"
33
+ "Available tools:\n"
34
+ "- shell_command: Run a shell command and capture stdout/stderr\n"
35
+ "- read_file: Inspect source code, configs, or challenge artifacts directly\n"
36
+ "- grep: Search files and logs for indicators, credentials, or flag material\n"
37
+ "- file_search: Locate files by name or path fragment\n"
38
+ "- flag_found: Submit a recovered flag immediately\n\n"
39
+ "Work methodically: recon -> enumerate -> inspect -> exploit -> verify."
40
  )
41
 
42
  BLUE_SYSTEM_PROMPT = (
43
+ "You are an experienced SOC analyst defending a corporate network.\n"
44
+ "Your objective is to detect attacker activity, validate the evidence in the SIEM,\n"
45
+ "and take measured remediation steps without breaking healthy services.\n\n"
46
+ "Available tools:\n"
47
+ "- shell_command: Run shell commands on the SIEM host and collect output\n"
48
+ "- grep: Search logs for indicators, pivots, or suspicious patterns\n"
49
+ "- read_file: Inspect configurations and aggregated log files directly\n"
50
+ "- patch: Apply a remediation to a specific host or file\n"
51
+ "- check_services: Verify availability after defensive actions\n\n"
52
+ "Work methodically: monitor -> correlate -> confirm -> respond -> verify."
53
  )
54
 
55
 
 
63
  """A single turn within an episode."""
64
 
65
  role: str # "red" or "blue"
66
+ observation: str # tool output or environment response after the action
67
  action: str # what the agent did
68
  reward: float # per-step reward
69
+ assistant_content: str = ""
70
+ tool_name: str = "shell_command"
71
+ tool_arguments: dict[str, Any] = field(default_factory=dict)
72
+ tool_output: str = ""
73
+ tool_call_id: str = ""
74
  timestamp: float = 0.0
75
 
76
  def __post_init__(self) -> None:
77
  if self.timestamp == 0.0:
78
  self.timestamp = time.time()
79
+ if not self.tool_output:
80
+ self.tool_output = self.observation
81
+ if not self.tool_call_id:
82
+ self.tool_call_id = f"call_{uuid4().hex}"
83
 
84
 
85
  @dataclass
 
92
  turns: list[Turn] = field(default_factory=list)
93
  outcome: str = "" # "flag_captured", "blue_defended", "timeout"
94
  metrics: dict[str, Any] = field(default_factory=dict)
95
+ briefings: dict[str, str] = field(default_factory=dict)
96
  started_at: float = 0.0
97
  ended_at: float = 0.0
98
 
 
116
  """Sum of rewards for Blue turns."""
117
  return sum(t.reward for t in self.blue_turns)
118
 
119
+ def to_chat_messages(self, role: str) -> list[dict[str, Any]]:
120
+ """Convert turns for a given role to tool-style chat format."""
 
 
 
 
 
 
 
 
 
 
121
  system_prompt = RED_SYSTEM_PROMPT if role == "red" else BLUE_SYSTEM_PROMPT
122
+ messages: list[dict[str, Any]] = [
123
  {"role": "system", "content": system_prompt},
124
  ]
125
+ initial_briefing = self.briefings.get(role)
126
+ if not initial_briefing:
127
+ role_turns = [t for t in self.turns if t.role == role]
128
+ initial_briefing = role_turns[0].observation if role_turns else ""
129
+ if initial_briefing:
130
+ messages.append({"role": "user", "content": initial_briefing})
131
 
132
  role_turns = [t for t in self.turns if t.role == role]
133
  for turn in role_turns:
134
+ messages.append(
135
+ {
136
+ "role": "assistant",
137
+ "content": turn.assistant_content,
138
+ "tool_calls": [
139
+ {
140
+ "id": turn.tool_call_id,
141
+ "type": "function",
142
+ "function": {
143
+ "name": turn.tool_name,
144
+ "arguments": json.dumps(
145
+ turn.tool_arguments,
146
+ sort_keys=True,
147
+ ),
148
+ },
149
+ }
150
+ ],
151
+ }
152
+ )
153
+ messages.append(
154
+ {
155
+ "role": "tool",
156
+ "content": turn.tool_output,
157
+ "name": turn.tool_name,
158
+ "tool_call_id": turn.tool_call_id,
159
+ }
160
+ )
161
 
162
  return messages
163
 
 
226
  episode_id: str,
227
  snapshot_id: str = "",
228
  tier: int = 1,
229
+ briefings: dict[str, str] | None = None,
230
  ) -> Episode:
231
  """Begin recording a new episode.
232
 
 
248
  episode_id=episode_id,
249
  snapshot_id=snapshot_id,
250
  tier=tier,
251
+ briefings=dict(briefings or {}),
252
  started_at=time.time(),
253
  )
254
  return self._current
 
259
  observation: str,
260
  action: str,
261
  reward: float = 0.0,
262
+ *,
263
+ assistant_content: str = "",
264
+ tool_name: str = "shell_command",
265
+ tool_arguments: dict[str, Any] | None = None,
266
+ tool_output: str | None = None,
267
  ) -> Turn:
268
  """Record a single turn in the current episode.
269
 
 
289
  observation=observation,
290
  action=action,
291
  reward=reward,
292
+ assistant_content=assistant_content,
293
+ tool_name=tool_name,
294
+ tool_arguments=dict(tool_arguments or {}),
295
+ tool_output=tool_output or observation,
296
  )
297
  self._current.turns.append(turn)
298
  return turn
tests/test_agents.py CHANGED
@@ -2,7 +2,7 @@
2
 
3
  Covers:
4
  - RangeAgent protocol compliance for all agent types
5
- - ScriptedAgent command replay and fallback
6
  - extract_command parsing of various LLM output formats
7
  - run_episode orchestration with a mocked environment
8
  - evaluate harness with multiple episodes
@@ -19,7 +19,7 @@ from open_range.agents.protocol import (
19
  RangeAgent,
20
  )
21
  from open_range.agents.parsing import extract_command
22
- from open_range.agents.scripted_agent import (
23
  ScriptedAgent,
24
  ScriptedBlueAgent,
25
  ScriptedRedAgent,
@@ -488,7 +488,7 @@ class TestResolveAgents:
488
  from open_range.resolve import resolve_component
489
 
490
  agent = resolve_component(
491
- "open_range.agents.scripted_agent.ScriptedAgent",
492
  {"commands": ["echo test"]},
493
  RangeAgent,
494
  )
 
2
 
3
  Covers:
4
  - RangeAgent protocol compliance for all agent types
5
+ - replay-agent command replay and fallback
6
  - extract_command parsing of various LLM output formats
7
  - run_episode orchestration with a mocked environment
8
  - evaluate harness with multiple episodes
 
19
  RangeAgent,
20
  )
21
  from open_range.agents.parsing import extract_command
22
+ from open_range.agents.replay_agent import (
23
  ScriptedAgent,
24
  ScriptedBlueAgent,
25
  ScriptedRedAgent,
 
488
  from open_range.resolve import resolve_component
489
 
490
  agent = resolve_component(
491
+ "open_range.agents.replay_agent.ScriptedAgent",
492
  {"commands": ["echo test"]},
493
  RangeAgent,
494
  )
tests/test_demo.py CHANGED
@@ -1,4 +1,4 @@
1
- """Tests for the end-to-end scripted demo."""
2
 
3
  import json
4
  from pathlib import Path
 
1
+ """Tests for the end-to-end replay demo."""
2
 
3
  import json
4
  from pathlib import Path
tests/test_npc_reward_coupling.py ADDED
@@ -0,0 +1,365 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for NPC ↔ reward system coupling.
2
+
3
+ Validates that NPC log entries contain the fields the reward system expects:
4
+ - ``label: "benign"`` on routine NPC actions (for FP penalty)
5
+ - ``source`` on all NPC log entries (for FP detection)
6
+ - ``type: "social_engineering"`` on reactive actions (for Red/Blue SE rewards)
7
+ - ``result: "success"/"blocked"`` on reactive actions
8
+
9
+ Also tests credential extraction from snapshot topology.
10
+ """
11
+
12
+ from __future__ import annotations
13
+
14
+ import pytest
15
+
16
+ from open_range.builder.npc.actions import (
17
+ NPCActionExecutor,
18
+ _extract_db_credentials,
19
+ _extract_db_tables,
20
+ _extract_shares,
21
+ _extract_ssh_credentials,
22
+ _extract_users,
23
+ _extract_web_pages,
24
+ _log,
25
+ _se_log,
26
+ )
27
+ from open_range.builder.npc.channels import ChatChannel, DocumentChannel, VoiceChannel
28
+ from open_range.builder.npc.chat_traffic import generate_chat_traffic
29
+ from open_range.protocols import NPCPersona, SnapshotSpec
30
+
31
+
32
+ # ===================================================================
33
+ # Fixtures
34
+ # ===================================================================
35
+
36
+
37
+ @pytest.fixture
38
+ def persona_low() -> NPCPersona:
39
+ return NPCPersona(
40
+ name="Alice Doe",
41
+ role="Receptionist",
42
+ department="Admin",
43
+ security_awareness=0.2,
44
+ susceptibility={"phishing_email": 0.8, "vishing": 0.7},
45
+ accounts={"email": "adoe@corp.local"},
46
+ )
47
+
48
+
49
+ @pytest.fixture
50
+ def persona_high() -> NPCPersona:
51
+ return NPCPersona(
52
+ name="Bob Smith",
53
+ role="CISO",
54
+ department="Security",
55
+ security_awareness=0.95,
56
+ susceptibility={"phishing_email": 0.1, "vishing": 0.1},
57
+ accounts={"email": "bsmith@corp.local"},
58
+ )
59
+
60
+
61
+ @pytest.fixture
62
+ def snapshot_with_creds() -> SnapshotSpec:
63
+ return SnapshotSpec(
64
+ topology={
65
+ "domain": "example.local",
66
+ "users": [
67
+ {"username": "dbadmin", "password": "S3cur3DB!", "hosts": ["db"]},
68
+ {"username": "sysop", "password": "R00tPw!", "hosts": ["web", "files"], "role": "admin"},
69
+ {"username": "appuser", "password": "AppPw123", "hosts": ["web"]},
70
+ ],
71
+ },
72
+ files={
73
+ "web:/var/www/html/index.php": "<?php echo 'hi'; ?>",
74
+ "web:/var/www/html/login.php": "<?php // login ?>",
75
+ "files:/srv/shares/finance/report.xlsx": "data",
76
+ "files:/srv/shares/hr/employees.csv": "data",
77
+ "db:sql": "CREATE TABLE app_db.users (id INT); INSERT INTO app_db.orders VALUES (1);",
78
+ },
79
+ )
80
+
81
+
82
+ @pytest.fixture
83
+ def snapshot_no_creds() -> SnapshotSpec:
84
+ return SnapshotSpec(topology={}, files={})
85
+
86
+
87
+ # ===================================================================
88
+ # Routine action log labels
89
+ # ===================================================================
90
+
91
+
92
+ class TestRoutineLogLabels:
93
+ """Routine NPC actions must have label='benign' and a source field."""
94
+
95
+ def test_log_has_benign_label(self, persona_low):
96
+ entry = _log(persona_low, "browse", "Browsed /index.php", "web:/index.php")
97
+ assert entry["label"] == "benign"
98
+
99
+ def test_log_has_source(self, persona_low):
100
+ entry = _log(persona_low, "browse", "Browsed /index.php", "web:/index.php")
101
+ assert entry["source"] == "web:/index.php"
102
+
103
+ def test_log_has_type_prefix(self, persona_low):
104
+ entry = _log(persona_low, "query_db", "Queried users", "db:query_log")
105
+ assert entry["type"] == "npc_query_db"
106
+
107
+ def test_log_has_persona(self, persona_low):
108
+ entry = _log(persona_low, "idle", "Reading", "none")
109
+ assert entry["persona"] == "Alice Doe"
110
+ assert entry["department"] == "Admin"
111
+
112
+
113
+ # ===================================================================
114
+ # Reactive (social engineering) log labels
115
+ # ===================================================================
116
+
117
+
118
+ class TestSELogLabels:
119
+ """Reactive NPC actions must have type='social_engineering' and result."""
120
+
121
+ def test_se_log_type(self, persona_low):
122
+ entry = _se_log(persona_low, "click_link", "Clicked link", "web:access_log", result="success")
123
+ assert entry["type"] == "social_engineering"
124
+
125
+ def test_se_log_result_success(self, persona_low):
126
+ entry = _se_log(persona_low, "click_link", "Clicked", "web:access_log", result="success")
127
+ assert entry["result"] == "success"
128
+
129
+ def test_se_log_result_blocked(self, persona_high):
130
+ entry = _se_log(persona_high, "report_to_IT", "Reported", "siem:alert", result="blocked")
131
+ assert entry["result"] == "blocked"
132
+
133
+ def test_se_log_label_reactive(self, persona_low):
134
+ entry = _se_log(persona_low, "share_credentials", "Leaked", "web+siem", result="success")
135
+ assert entry["label"] == "reactive"
136
+
137
+ def test_se_log_has_persona(self, persona_low):
138
+ entry = _se_log(persona_low, "ignore", "Ignored", "none", result="blocked")
139
+ assert entry["persona"] == "Alice Doe"
140
+
141
+
142
+ # ===================================================================
143
+ # Channel log labels
144
+ # ===================================================================
145
+
146
+
147
+ class TestChannelLogLabels:
148
+ """Channel log entries must have label='benign' and source."""
149
+
150
+ def test_chat_channel_log_has_label(self):
151
+ ch = ChatChannel()
152
+ ch.send_message("Alice", "Bob", "Hello!")
153
+ logs = ch.get_channel_log()
154
+ assert len(logs) == 1
155
+ assert logs[0]["label"] == "benign"
156
+ assert "source" in logs[0]
157
+
158
+ def test_voice_channel_log_has_label(self, persona_low):
159
+ ch = VoiceChannel()
160
+ call = ch.initiate_call("Attacker", "Alice", "IT support here")
161
+ ch.respond(persona_low, call)
162
+ logs = ch.get_call_log()
163
+ assert len(logs) == 1
164
+ assert logs[0]["label"] == "benign"
165
+ assert logs[0]["source"] == "voice:phone"
166
+
167
+ def test_document_channel_log_has_label(self, persona_low):
168
+ ch = DocumentChannel()
169
+ doc = ch.share_document("Attacker", "Alice", "report.pdf", "Quarterly report")
170
+ ch.inspect_document(persona_low, doc)
171
+ logs = ch.get_document_log()
172
+ assert len(logs) == 1
173
+ assert logs[0]["label"] == "benign"
174
+ assert "source" in logs[0]
175
+
176
+
177
+ class TestChatTrafficLabels:
178
+ """Chat traffic generation should produce labeled log entries."""
179
+
180
+ def test_generated_chat_has_labels(self, persona_low, persona_high):
181
+ ch = ChatChannel()
182
+ generate_chat_traffic(
183
+ personas=[persona_low, persona_high],
184
+ channel=ch,
185
+ num_messages=5,
186
+ seed=42,
187
+ )
188
+ logs = ch.get_channel_log()
189
+ assert len(logs) == 5
190
+ for entry in logs:
191
+ assert entry["label"] == "benign"
192
+ assert "source" in entry
193
+
194
+
195
+ # ===================================================================
196
+ # Credential extraction from snapshot topology
197
+ # ===================================================================
198
+
199
+
200
+ class TestCredentialExtraction:
201
+ """Credentials should be pulled from snapshot topology, not hardcoded."""
202
+
203
+ def test_db_creds_from_topology(self, snapshot_with_creds):
204
+ user, pwd = _extract_db_credentials(snapshot_with_creds)
205
+ assert user == "dbadmin"
206
+ assert pwd == "S3cur3DB!"
207
+
208
+ def test_db_creds_fallback(self, snapshot_no_creds):
209
+ user, pwd = _extract_db_credentials(snapshot_no_creds)
210
+ assert user == "app_user"
211
+ assert pwd == "AppUs3r!2024"
212
+
213
+ def test_ssh_creds_from_topology(self, snapshot_with_creds):
214
+ user, pwd = _extract_ssh_credentials(snapshot_with_creds)
215
+ assert user == "sysop"
216
+ assert pwd == "R00tPw!"
217
+
218
+ def test_ssh_creds_fallback(self, snapshot_no_creds):
219
+ user, pwd = _extract_ssh_credentials(snapshot_no_creds)
220
+ assert user == "admin"
221
+ assert pwd == "Adm1n!2024"
222
+
223
+
224
+ # ===================================================================
225
+ # Snapshot introspection
226
+ # ===================================================================
227
+
228
+
229
+ class TestSnapshotIntrospection:
230
+ """Verify snapshot-derived targets are generalizable."""
231
+
232
+ def test_extract_web_pages(self, snapshot_with_creds):
233
+ pages = _extract_web_pages(snapshot_with_creds)
234
+ assert "/index.php" in pages
235
+ assert "/login.php" in pages
236
+
237
+ def test_extract_shares(self, snapshot_with_creds):
238
+ shares = _extract_shares(snapshot_with_creds)
239
+ assert "finance" in shares
240
+ assert "hr" in shares
241
+
242
+ def test_extract_db_tables(self, snapshot_with_creds):
243
+ tables = _extract_db_tables(snapshot_with_creds)
244
+ assert "app_db.orders" in tables or "app_db.users" in tables
245
+
246
+ def test_extract_users(self, snapshot_with_creds):
247
+ users = _extract_users(snapshot_with_creds)
248
+ assert "dbadmin" in users
249
+ assert "sysop" in users
250
+
251
+ def test_empty_snapshot_pages(self, snapshot_no_creds):
252
+ pages = _extract_web_pages(snapshot_no_creds)
253
+ assert pages == ["/"]
254
+
255
+ def test_empty_snapshot_shares(self, snapshot_no_creds):
256
+ shares = _extract_shares(snapshot_no_creds)
257
+ assert shares == ["general"]
258
+
259
+
260
+ # ===================================================================
261
+ # Reward coupling integration
262
+ # ===================================================================
263
+
264
+
265
+ class TestRewardCoupling:
266
+ """End-to-end: NPC logs feed into reward computation correctly."""
267
+
268
+ def test_red_social_reward_recognizes_se_logs(self):
269
+ """CompositeRedReward.compute() should find social_engineering entries."""
270
+ from open_range.server.rewards import CompositeRedReward
271
+ from open_range.server.models import RangeAction, RangeObservation, RangeState
272
+
273
+ reward = CompositeRedReward()
274
+ action = RangeAction(command="nmap -sV web", mode="red")
275
+ obs = RangeObservation(stdout="")
276
+ state = RangeState(step_count=1, tier=1)
277
+ snapshot = SnapshotSpec()
278
+
279
+ # NPC log with a successful social engineering event
280
+ ctx = {
281
+ "red_history": [],
282
+ "blue_history": [],
283
+ "npc_traffic_log": [
284
+ {
285
+ "type": "social_engineering",
286
+ "label": "reactive",
287
+ "persona": "Alice",
288
+ "action": "click_link",
289
+ "result": "success",
290
+ "source": "web:access_log",
291
+ "timestamp": 1.0,
292
+ },
293
+ ],
294
+ }
295
+
296
+ score = reward.compute(action, obs, state, snapshot, ctx)
297
+ # social weight is 0.05, score is 1.0, tier 1 multiplier is 1.0
298
+ # So social contribution = 0.05 * 1.0 = 0.05
299
+ # Total should include social component
300
+ assert score > 0 # At minimum efficiency + social contribute
301
+
302
+ def test_blue_fp_penalty_uses_benign_label(self):
303
+ """CompositeBlueReward should penalize findings that match benign NPC sources."""
304
+ from open_range.server.rewards import CompositeBlueReward
305
+ from open_range.server.models import RangeAction, RangeObservation, RangeState
306
+
307
+ reward = CompositeBlueReward()
308
+ action = RangeAction(command="grep suspicious /var/log/siem/all.log", mode="blue")
309
+ obs = RangeObservation(stdout="")
310
+ state = RangeState(step_count=1, tier=1)
311
+ snapshot = SnapshotSpec()
312
+
313
+ # Blue submits a finding that matches a benign NPC source
314
+ ctx = {
315
+ "red_history": [],
316
+ "blue_history": [
317
+ {"type": "finding", "content": "Suspicious activity from chat:general"},
318
+ ],
319
+ "npc_traffic_log": [
320
+ {
321
+ "type": "chat",
322
+ "label": "benign",
323
+ "source": "chat:general",
324
+ "persona": "Alice",
325
+ "timestamp": 1.0,
326
+ },
327
+ ],
328
+ }
329
+
330
+ score = reward.compute(action, obs, state, snapshot, ctx)
331
+ # Should have FP penalty (-0.2 per false positive)
332
+ assert score < 0
333
+
334
+ def test_blue_phishing_detection_reward(self):
335
+ """Blue gets phishing reward when SE events exist and Blue detects them."""
336
+ from open_range.server.rewards import CompositeBlueReward
337
+ from open_range.server.models import RangeAction, RangeObservation, RangeState
338
+
339
+ reward = CompositeBlueReward()
340
+ action = RangeAction(command="grep phish /var/log/siem/all.log", mode="blue")
341
+ obs = RangeObservation(stdout="")
342
+ state = RangeState(step_count=1, tier=1)
343
+ snapshot = SnapshotSpec()
344
+
345
+ ctx = {
346
+ "red_history": [],
347
+ "blue_history": [
348
+ {"type": "finding", "content": "Detected phishing email to Alice"},
349
+ ],
350
+ "npc_traffic_log": [
351
+ {
352
+ "type": "social_engineering",
353
+ "label": "reactive",
354
+ "persona": "Alice",
355
+ "action": "click_link",
356
+ "result": "success",
357
+ "source": "web:access_log",
358
+ "timestamp": 1.0,
359
+ },
360
+ ],
361
+ }
362
+
363
+ score = reward.compute(action, obs, state, snapshot, ctx)
364
+ # phishing weight is 0.05, Blue detected 1/1 SE events
365
+ assert score > 0
tests/test_parse_llm_response.py CHANGED
@@ -59,6 +59,8 @@ class TestRealLLMOutput:
59
  @pytest.fixture
60
  def llm_json(self):
61
  path = ROOT / "snapshots" / "llm_tier1_test.json"
 
 
62
  return path.read_text()
63
 
64
  def test_parses_to_snapshot_spec(self, llm_json):
 
59
  @pytest.fixture
60
  def llm_json(self):
61
  path = ROOT / "snapshots" / "llm_tier1_test.json"
62
+ if not path.exists():
63
+ pytest.skip("llm_tier1_test.json fixture not present")
64
  return path.read_text()
65
 
66
  def test_parses_to_snapshot_spec(self, llm_json):
tests/test_renderer_integration.py CHANGED
@@ -23,6 +23,8 @@ SNAPSHOT_PATH = ROOT / "snapshots" / "llm_tier1_test.json"
23
  @pytest.fixture
24
  def llm_output() -> dict:
25
  """Load the real LLM output JSON."""
 
 
26
  return json.loads(SNAPSHOT_PATH.read_text())
27
 
28
 
@@ -208,8 +210,8 @@ class TestDockerCompose:
208
  def test_db_has_mysql_env_vars(self, rendered_dir):
209
  compose = (rendered_dir / "docker-compose.yml").read_text()
210
  assert "MYSQL_ROOT_PASSWORD" in compose
211
- assert "MYSQL_DATABASE=referral_db" in compose
212
- assert "MYSQL_USER=app_user" in compose
213
 
214
 
215
  # ---------------------------------------------------------------------------
 
23
  @pytest.fixture
24
  def llm_output() -> dict:
25
  """Load the real LLM output JSON."""
26
+ if not SNAPSHOT_PATH.exists():
27
+ pytest.skip("llm_tier1_test.json fixture not present")
28
  return json.loads(SNAPSHOT_PATH.read_text())
29
 
30
 
 
210
  def test_db_has_mysql_env_vars(self, rendered_dir):
211
  compose = (rendered_dir / "docker-compose.yml").read_text()
212
  assert "MYSQL_ROOT_PASSWORD" in compose
213
+ assert "MYSQL_DATABASE=" in compose
214
+ assert "MYSQL_USER=" in compose
215
 
216
 
217
  # ---------------------------------------------------------------------------
tests/test_solvers.py CHANGED
@@ -12,7 +12,7 @@ from __future__ import annotations
12
  import pytest
13
 
14
  from open_range.agents.protocol import RangeAgent
15
- from open_range.agents.scripted_agent import ScriptedAgent
16
  from open_range.agents.solvers import (
17
  BLUE_DEFENSE_COMMANDS,
18
  TIER1_RED_COMMANDS,
 
12
  import pytest
13
 
14
  from open_range.agents.protocol import RangeAgent
15
+ from open_range.agents.replay_agent import ScriptedAgent
16
  from open_range.agents.solvers import (
17
  BLUE_DEFENSE_COMMANDS,
18
  TIER1_RED_COMMANDS,
tests/test_synthetic.py CHANGED
@@ -11,9 +11,10 @@ import pytest
11
  from click.testing import CliRunner
12
 
13
  from open_range.agents.llm_agent import LLMRangeAgent
14
- from open_range.agents.scripted_agent import ScriptedAgent, ScriptedBlueAgent, ScriptedRedAgent
15
  from open_range.cli import cli
16
  from open_range.server.models import RangeAction
 
17
  from open_range.training.synthetic import (
18
  SyntheticRangeEnvironment,
19
  SyntheticTraceGenerator,
@@ -101,6 +102,11 @@ class TestSyntheticTraceGenerator:
101
  records = [json.loads(line) for line in output_path.read_text().splitlines()]
102
  assert {record["role"] for record in records} == {"red", "blue"}
103
  assert all(record["messages"][0]["role"] == "system" for record in records)
 
 
 
 
 
104
 
105
  def test_build_teacher_agents_falls_back_to_scripted_when_no_model(self):
106
  red, blue = build_teacher_agents(teacher_model=None, roles=("red", "blue"))
@@ -154,6 +160,42 @@ class TestLiteLLMSupport:
154
  assert captured["drop_params"] is True
155
 
156
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
157
  class TestSyntheticCLI:
158
  def test_cli_generates_jsonl_from_snapshot(self, tmp_path, sample_snapshot_spec):
159
  runner = CliRunner()
@@ -187,6 +229,130 @@ class TestSyntheticCLI:
187
  records = [json.loads(line) for line in output_path.read_text().splitlines()]
188
  assert len(records) == 1
189
  assert records[0]["role"] == "red"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
190
 
191
 
192
  @pytest.mark.live_model
 
11
  from click.testing import CliRunner
12
 
13
  from open_range.agents.llm_agent import LLMRangeAgent
14
+ from open_range.agents.replay_agent import ScriptedAgent, ScriptedBlueAgent, ScriptedRedAgent
15
  from open_range.cli import cli
16
  from open_range.server.models import RangeAction
17
+ from open_range.training.dataset import append_tool_context, load_jsonl_records, load_tool_context
18
  from open_range.training.synthetic import (
19
  SyntheticRangeEnvironment,
20
  SyntheticTraceGenerator,
 
102
  records = [json.loads(line) for line in output_path.read_text().splitlines()]
103
  assert {record["role"] for record in records} == {"red", "blue"}
104
  assert all(record["messages"][0]["role"] == "system" for record in records)
105
+ assert all(record["messages"][1]["role"] == "user" for record in records)
106
+ assert any(message["role"] == "tool" for message in records[0]["messages"])
107
+ assert all("metadata" in record for record in records)
108
+ assert all("ground_truth_flag" in record for record in records)
109
+ assert all("optimal_steps" in record for record in records)
110
 
111
  def test_build_teacher_agents_falls_back_to_scripted_when_no_model(self):
112
  red, blue = build_teacher_agents(teacher_model=None, roles=("red", "blue"))
 
160
  assert captured["drop_params"] is True
161
 
162
 
163
+ class TestDatasetHelpers:
164
+ def test_load_bootstrap_records_and_append_tool_context(self, tmp_path):
165
+ bootstrap_path = tmp_path / "bootstrap.jsonl"
166
+ bootstrap_path.write_text(
167
+ json.dumps(
168
+ {
169
+ "messages": [
170
+ {"role": "system", "content": "Seed system prompt"},
171
+ {"role": "user", "content": "obs"},
172
+ {"role": "assistant", "content": "cmd"},
173
+ ],
174
+ "metadata": {"source": "bootstrap"},
175
+ }
176
+ )
177
+ + "\n"
178
+ )
179
+ tool_path = tmp_path / "tools.json"
180
+ tool_path.write_text(
181
+ json.dumps(
182
+ [
183
+ {"name": "shell_command", "description": "Run a shell command"},
184
+ {"name": "read_file", "description": "Read file contents"},
185
+ ]
186
+ )
187
+ )
188
+
189
+ records = load_jsonl_records([bootstrap_path])
190
+ tool_context = load_tool_context([tool_path])
191
+ enriched = append_tool_context(records, tool_context)
192
+
193
+ assert len(records) == 1
194
+ assert "shell_command" in tool_context
195
+ assert "Available tools" in enriched[0]["messages"][0]["content"]
196
+ assert "read_file" in enriched[0]["messages"][0]["content"]
197
+
198
+
199
  class TestSyntheticCLI:
200
  def test_cli_generates_jsonl_from_snapshot(self, tmp_path, sample_snapshot_spec):
201
  runner = CliRunner()
 
229
  records = [json.loads(line) for line in output_path.read_text().splitlines()]
230
  assert len(records) == 1
231
  assert records[0]["role"] == "red"
232
+ assert any(message["role"] == "tool" for message in records[0]["messages"])
233
+ assert any(message.get("tool_calls") for message in records[0]["messages"] if message["role"] == "assistant")
234
+
235
+ def test_cli_merges_bootstrap_traces_and_tool_info(self, tmp_path, sample_snapshot_spec):
236
+ runner = CliRunner()
237
+ snapshot_path = tmp_path / "spec.json"
238
+ snapshot_path.write_text(json.dumps(sample_snapshot_spec.model_dump(mode="python")))
239
+ bootstrap_path = tmp_path / "bootstrap.jsonl"
240
+ bootstrap_path.write_text(
241
+ json.dumps(
242
+ {
243
+ "messages": [
244
+ {"role": "system", "content": "Bootstrap system"},
245
+ {"role": "user", "content": "bootstrap obs"},
246
+ {"role": "assistant", "content": "bootstrap cmd"},
247
+ ],
248
+ "metadata": {"source": "bootstrap"},
249
+ }
250
+ )
251
+ + "\n"
252
+ )
253
+ tool_path = tmp_path / "tools.md"
254
+ tool_path.write_text("- shell_command: Run shell commands\n- read_file: Read files\n")
255
+ output_path = tmp_path / "merged.jsonl"
256
+
257
+ result = runner.invoke(
258
+ cli,
259
+ [
260
+ "synthetic-data",
261
+ "--snapshot",
262
+ str(snapshot_path),
263
+ "--output",
264
+ str(output_path),
265
+ "--num-traces",
266
+ "1",
267
+ "--max-steps",
268
+ "1",
269
+ "--roles",
270
+ "red",
271
+ "--reward-threshold",
272
+ "-1",
273
+ "--bootstrap-traces",
274
+ str(bootstrap_path),
275
+ "--tool-info",
276
+ str(tool_path),
277
+ "--static-flags",
278
+ ],
279
+ )
280
+
281
+ assert result.exit_code == 0, result.output
282
+ records = [json.loads(line) for line in output_path.read_text().splitlines()]
283
+ assert len(records) == 2
284
+ assert records[0]["metadata"]["source"] == "bootstrap"
285
+ assert "Available tools" in records[1]["messages"][0]["content"]
286
+ assert "shell_command" in records[1]["messages"][0]["content"]
287
+
288
+ def test_cli_can_emit_generated_only_while_using_bootstrap_examples(self, tmp_path, sample_snapshot_spec):
289
+ runner = CliRunner()
290
+ snapshot_path = tmp_path / "spec.json"
291
+ snapshot_path.write_text(json.dumps(sample_snapshot_spec.model_dump(mode="python")))
292
+ bootstrap_path = tmp_path / "bootstrap.jsonl"
293
+ bootstrap_path.write_text(
294
+ json.dumps(
295
+ {
296
+ "messages": [
297
+ {"role": "system", "content": "Bootstrap system"},
298
+ {"role": "user", "content": "bootstrap prompt"},
299
+ {
300
+ "role": "assistant",
301
+ "content": "<think>Seed</think>",
302
+ "tool_calls": [
303
+ {
304
+ "id": "call_seed",
305
+ "type": "function",
306
+ "function": {
307
+ "name": "shell_command",
308
+ "arguments": "{\"command\": \"whoami\"}",
309
+ },
310
+ }
311
+ ],
312
+ },
313
+ {
314
+ "role": "tool",
315
+ "name": "shell_command",
316
+ "tool_call_id": "call_seed",
317
+ "content": "[0.2s] kali",
318
+ },
319
+ ],
320
+ "metadata": {"source": "bootstrap", "success": True, "total_turns": 4},
321
+ }
322
+ )
323
+ + "\n"
324
+ )
325
+ output_path = tmp_path / "generated_only.jsonl"
326
+
327
+ result = runner.invoke(
328
+ cli,
329
+ [
330
+ "synthetic-data",
331
+ "--snapshot",
332
+ str(snapshot_path),
333
+ "--output",
334
+ str(output_path),
335
+ "--num-traces",
336
+ "1",
337
+ "--max-steps",
338
+ "1",
339
+ "--roles",
340
+ "red",
341
+ "--reward-threshold",
342
+ "-1",
343
+ "--bootstrap-traces",
344
+ str(bootstrap_path),
345
+ "--bootstrap-examples",
346
+ "1",
347
+ "--generated-only",
348
+ "--static-flags",
349
+ ],
350
+ )
351
+
352
+ assert result.exit_code == 0, result.output
353
+ records = [json.loads(line) for line in output_path.read_text().splitlines()]
354
+ assert len(records) == 1
355
+ assert records[0]["metadata"]["source"] == "open_range.synthetic"
356
 
357
 
358
  @pytest.mark.live_model
tests/test_trajectory.py CHANGED
@@ -41,12 +41,48 @@ class TestTurn:
41
  class TestEpisode:
42
  def _make_episode(self) -> Episode:
43
  ep = Episode(episode_id="ep-1", snapshot_id="snap-1", tier=1)
 
 
 
 
44
  ep.turns = [
45
- Turn(role="red", observation="briefing", action="nmap -sV web", reward=0.1),
46
- Turn(role="blue", observation="alert: nmap", action="submit_finding nmap scan", reward=0.2),
47
- Turn(role="red", observation="ports found", action="curl http://web/search?q=test", reward=0.15),
48
- Turn(role="blue", observation="sql log", action="grep UNION /var/log/siem/web.log", reward=0.05),
49
- Turn(role="red", observation="sqli output", action="submit_flag FLAG{sqli_123}", reward=0.5),
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
  ]
51
  ep.outcome = "flag_captured"
52
  return ep
@@ -74,22 +110,26 @@ class TestEpisode:
74
  def test_to_chat_messages_red(self):
75
  ep = self._make_episode()
76
  msgs = ep.to_chat_messages("red")
77
- # system + 3 * (user + assistant) = 7 messages
78
- assert len(msgs) == 7
79
  assert msgs[0]["role"] == "system"
80
  assert msgs[0]["content"] == RED_SYSTEM_PROMPT
81
  assert msgs[1]["role"] == "user"
82
- assert msgs[1]["content"] == "briefing"
83
  assert msgs[2]["role"] == "assistant"
84
- assert msgs[2]["content"] == "nmap -sV web"
 
 
85
 
86
  def test_to_chat_messages_blue(self):
87
  ep = self._make_episode()
88
  msgs = ep.to_chat_messages("blue")
89
- # system + 2 * (user + assistant) = 5 messages
90
- assert len(msgs) == 5
91
  assert msgs[0]["role"] == "system"
92
  assert msgs[0]["content"] == BLUE_SYSTEM_PROMPT
 
 
93
 
94
  def test_to_jsonl_record(self):
95
  ep = self._make_episode()
@@ -113,10 +153,16 @@ class TestEpisode:
113
  class TestTrajectoryLogger:
114
  def test_start_episode(self):
115
  logger = TrajectoryLogger()
116
- ep = logger.start_episode("ep-1", snapshot_id="snap-1", tier=2)
 
 
 
 
 
117
  assert ep.episode_id == "ep-1"
118
  assert ep.snapshot_id == "snap-1"
119
  assert ep.tier == 2
 
120
  assert logger.current_episode is ep
121
 
122
  def test_log_turn(self):
@@ -238,9 +284,13 @@ class TestExportJsonl:
238
  # Messages must follow chat format
239
  msgs = record["messages"]
240
  assert msgs[0]["role"] == "system"
 
241
  for msg in msgs:
242
  assert "role" in msg
243
  assert "content" in msg
 
 
 
244
 
245
  def test_export_creates_parent_dirs(self, tmp_path: Path):
246
  logger = self._build_logger_with_episodes()
 
41
  class TestEpisode:
42
  def _make_episode(self) -> Episode:
43
  ep = Episode(episode_id="ep-1", snapshot_id="snap-1", tier=1)
44
+ ep.briefings = {
45
+ "red": "Red briefing",
46
+ "blue": "Blue briefing",
47
+ }
48
  ep.turns = [
49
+ Turn(
50
+ role="red",
51
+ observation="[0.2s] 80/tcp open http",
52
+ action="nmap -sV web",
53
+ reward=0.1,
54
+ assistant_content="<think>\nRecon first.\n</think>",
55
+ ),
56
+ Turn(
57
+ role="blue",
58
+ observation="[0.3s] suspicious nmap",
59
+ action="submit_finding nmap scan",
60
+ reward=0.2,
61
+ assistant_content="<think>\nThis is actionable.\n</think>",
62
+ ),
63
+ Turn(
64
+ role="red",
65
+ observation="[0.4s] products",
66
+ action="curl http://web/search?q=test",
67
+ reward=0.15,
68
+ assistant_content="<think>\nInspect the search route.\n</think>",
69
+ ),
70
+ Turn(
71
+ role="blue",
72
+ observation="[0.5s] SQLi in web log",
73
+ action="grep UNION /var/log/siem/web.log",
74
+ reward=0.05,
75
+ assistant_content="<think>\nI need evidence.\n</think>",
76
+ ),
77
+ Turn(
78
+ role="red",
79
+ observation="[0.6s] correct",
80
+ action="submit_flag FLAG{sqli_123}",
81
+ reward=0.5,
82
+ assistant_content="<think>\nThis token is worth validating.\n</think>",
83
+ tool_name="flag_found",
84
+ tool_arguments={"flag": "FLAG{sqli_123}"},
85
+ ),
86
  ]
87
  ep.outcome = "flag_captured"
88
  return ep
 
110
  def test_to_chat_messages_red(self):
111
  ep = self._make_episode()
112
  msgs = ep.to_chat_messages("red")
113
+ # system + briefing + 3 * (assistant + tool) = 8 messages
114
+ assert len(msgs) == 8
115
  assert msgs[0]["role"] == "system"
116
  assert msgs[0]["content"] == RED_SYSTEM_PROMPT
117
  assert msgs[1]["role"] == "user"
118
+ assert msgs[1]["content"] == "Red briefing"
119
  assert msgs[2]["role"] == "assistant"
120
+ assert "tool_calls" in msgs[2]
121
+ assert msgs[2]["tool_calls"][0]["function"]["name"] == "shell_command"
122
+ assert msgs[3]["role"] == "tool"
123
 
124
  def test_to_chat_messages_blue(self):
125
  ep = self._make_episode()
126
  msgs = ep.to_chat_messages("blue")
127
+ # system + briefing + 2 * (assistant + tool) = 6 messages
128
+ assert len(msgs) == 6
129
  assert msgs[0]["role"] == "system"
130
  assert msgs[0]["content"] == BLUE_SYSTEM_PROMPT
131
+ assert msgs[1]["role"] == "user"
132
+ assert msgs[1]["content"] == "Blue briefing"
133
 
134
  def test_to_jsonl_record(self):
135
  ep = self._make_episode()
 
153
  class TestTrajectoryLogger:
154
  def test_start_episode(self):
155
  logger = TrajectoryLogger()
156
+ ep = logger.start_episode(
157
+ "ep-1",
158
+ snapshot_id="snap-1",
159
+ tier=2,
160
+ briefings={"red": "brief"},
161
+ )
162
  assert ep.episode_id == "ep-1"
163
  assert ep.snapshot_id == "snap-1"
164
  assert ep.tier == 2
165
+ assert ep.briefings["red"] == "brief"
166
  assert logger.current_episode is ep
167
 
168
  def test_log_turn(self):
 
284
  # Messages must follow chat format
285
  msgs = record["messages"]
286
  assert msgs[0]["role"] == "system"
287
+ assert msgs[1]["role"] == "user"
288
  for msg in msgs:
289
  assert "role" in msg
290
  assert "content" in msg
291
+ for msg in msgs:
292
+ if msg["role"] == "assistant":
293
+ assert msg["tool_calls"]
294
 
295
  def test_export_creates_parent_dirs(self, tmp_path: Path):
296
  logger = self._build_logger_with_episodes()