Lars Talian commited on
Commit
906af9d
·
1 Parent(s): 8e6f5b8

Add managed snapshot runtime behind OpenEnv server

Browse files
.gitignore CHANGED
@@ -55,3 +55,4 @@ htmlcov/
55
 
56
  # Pre-validated range pool (generated at startup)
57
  pool/
 
 
55
 
56
  # Pre-validated range pool (generated at startup)
57
  pool/
58
+ snapshots/
README.md CHANGED
@@ -12,14 +12,15 @@ A multi-agent cybersecurity gymnasium on [OpenEnv](https://github.com/meta-pytor
12
 
13
  ## How It Works
14
 
15
- A **manifest** declares a family of legal enterprise worlds — topology, services, identities, vulnerability classes, difficulty. A **Builder** LLM proposes a concrete snapshot within that family. A **Validator** pipeline admits only snapshots that are runnable, exploitable, patchable, and non-leaking. `reset()` selects a frozen validated snapshot. `step()` runs commands inside it.
16
 
17
  ```mermaid
18
  flowchart LR
19
- M[Manifest<br/>topology, services,<br/>bug families, difficulty] --> B[Builder<br/>LLM proposes<br/>snapshot]
20
- B --> V{Validator<br/>10 checks}
 
21
  V -->|fail| B
22
- V -->|pass| S[Frozen Snapshot]
23
  S --> E["reset() → step() → obs + reward"]
24
 
25
  style V fill:#ffd93d,color:#333
@@ -39,6 +40,10 @@ uv sync
39
  # Optional: enable the LiteLLM-backed builder pipeline
40
  uv sync --extra builder
41
 
 
 
 
 
42
  # End-to-end demo (no Docker, no LLM)
43
  uv run python examples/demo.py
44
 
@@ -61,13 +66,15 @@ uv run pytest tests/ -v --tb=short
61
 
62
  **Manifest** — YAML defining the legal world: hosts, zones, services, users, NPCs, data assets, credential policies, monitoring coverage, trust relationships, and which vulnerability classes the Builder may plant. Three example manifests ship (healthcare, fintech, SaaS) at tiers 1-3.
63
 
64
- **Builder** — Takes a manifest + curriculum context, outputs a `SnapshotSpec`: topology graph, truth graph (planted vulns + exploit chain), evidence graph (what Blue can find), flags, golden path, NPC traffic, and task briefings. Three implementations: `LLMSnapshotBuilder` (production, via litellm), `TemplateOnlyBuilder` (deterministic, for tests), `FileBuilder` (load from disk).
 
 
65
 
66
  The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `state()` contract through `server.app:app`, which is the entrypoint referenced by `openenv.yaml`.
67
 
68
- **Validator** — 10-check admission pipeline. 8 mechanical checks (build/boot, exploitability, patchability, evidence sufficiency, reward grounding, isolation, task feasibility, difficulty calibration) + 2 LLM advisory checks (NPC consistency, realism review). Inverse mutation: patching each planted vuln must break its exploit step.
69
 
70
- **Environment** — `RangeEnvironment(Environment)` following the OpenEnv contract. `reset()` picks a frozen snapshot + samples a task. `step(action)` routes commands to the appropriate container — Red runs on the attacker box, Blue runs on the SIEM. No artificial command allowlists; the container's installed tools are the constraint.
71
 
72
  **Rewards** — All grounded in container state, not LLM judgment:
73
 
 
12
 
13
  ## How It Works
14
 
15
+ A **manifest** declares a family of legal enterprise worlds — topology, services, identities, vulnerability classes, difficulty. A shared **ManagedSnapshotRuntime** inside the shipped OpenEnv server process owns the snapshot pool. It loads admitted snapshots from disk or preloads a deterministic pool from the manifest, then optionally refills that pool in the background. `reset()` selects one frozen admitted snapshot. `step()` runs commands inside it.
16
 
17
  ```mermaid
18
  flowchart LR
19
+ M[Manifest<br/>topology, services,<br/>bug families, difficulty] --> R[ManagedSnapshotRuntime<br/>shared inside server process]
20
+ R --> B[Builder / mutator<br/>deterministic by default,<br/>LiteLLM optional]
21
+ B --> V{Validator}
22
  V -->|fail| B
23
+ V -->|pass| S[Frozen admitted snapshots]
24
  S --> E["reset() → step() → obs + reward"]
25
 
26
  style V fill:#ffd93d,color:#333
 
40
  # Optional: enable the LiteLLM-backed builder pipeline
41
  uv sync --extra builder
42
 
43
+ # Optional: enable background refill inside the server
44
+ export OPENRANGE_ENABLE_MANAGED_REFILL=1
45
+ export OPENRANGE_RUNTIME_BUILDER=llm
46
+
47
  # End-to-end demo (no Docker, no LLM)
48
  uv run python examples/demo.py
49
 
 
66
 
67
  **Manifest** — YAML defining the legal world: hosts, zones, services, users, NPCs, data assets, credential policies, monitoring coverage, trust relationships, and which vulnerability classes the Builder may plant. Three example manifests ship (healthcare, fintech, SaaS) at tiers 1-3.
68
 
69
+ **ManagedSnapshotRuntime** — Shared singleton created at server startup. Owns the `SnapshotStore`, builder/mutator, validator gate, snapshot preload, optional background refill, and episode-result feedback. This is the hidden orchestrator behind the env; callers still only see `reset()`, `step()`, and `state()`.
70
+
71
+ **Builder** — Takes a manifest + curriculum context, outputs a `SnapshotSpec`: topology graph, truth graph (planted vulns + exploit chain), evidence graph (what Blue can find), flags, golden path, NPC traffic, and task briefings. Three implementations: `LLMSnapshotBuilder` (production, via litellm), `TemplateOnlyBuilder` (deterministic shipped default), `FileBuilder` (load from disk).
72
 
73
  The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `state()` contract through `server.app:app`, which is the entrypoint referenced by `openenv.yaml`.
74
 
75
+ **Validator** — Admission gate for candidate snapshots. The shipped runtime uses structural checks that operate on the compiled `SnapshotSpec` without requiring live model calls; richer container-backed checks remain available for private/local generation workflows.
76
 
77
+ **Environment** — `RangeEnvironment(Environment)` following the OpenEnv contract. `reset()` asks the shared runtime for a frozen admitted snapshot. `step(action)` routes commands to the appropriate container — Red runs on the attacker box, Blue runs on the SIEM. No artificial command allowlists; the container's installed tools are the constraint.
78
 
79
  **Rewards** — All grounded in container state, not LLM judgment:
80
 
docs/builder-validator.md CHANGED
@@ -4,7 +4,7 @@
4
 
5
  **LLM generates, renderer materializes, rules validate.** The builder uses LiteLLM to generate candidate snapshot specs as structured JSON. The renderer turns specs into Docker artifacts via Jinja2 templates. The validator runs a 10-check admission pipeline (8 mechanical + 2 LLM advisory) before admitting a snapshot.
6
 
7
- Snapshot creation happens **asynchronously between episodes**. `reset()` picks a pre-validated frozen snapshot from the `SnapshotStore`. No LLM calls in the hot path.
8
 
9
  ```mermaid
10
  flowchart LR
 
4
 
5
  **LLM generates, renderer materializes, rules validate.** The builder uses LiteLLM to generate candidate snapshot specs as structured JSON. The renderer turns specs into Docker artifacts via Jinja2 templates. The validator runs a 10-check admission pipeline (8 mechanical + 2 LLM advisory) before admitting a snapshot.
6
 
7
+ Snapshot creation happens **inside a shared `ManagedSnapshotRuntime` in the server process**. That runtime preloads admitted snapshots at startup and can optionally refill them between episodes. `reset()` picks a pre-validated frozen snapshot from the `SnapshotStore`. No LLM calls in the hot path.
8
 
9
  ```mermaid
10
  flowchart LR
openenv.yaml CHANGED
@@ -5,8 +5,4 @@ runtime: fastapi
5
  app: server.app:app
6
  port: 8000
7
  version: 0.1.0
8
- type: space
9
- runtime: fastapi
10
- app: open_range.server.app:app
11
- port: 8000
12
  description: "Multi-agent cybersecurity gymnasium built on OpenEnv"
 
5
  app: server.app:app
6
  port: 8000
7
  version: 0.1.0
 
 
 
 
8
  description: "Multi-agent cybersecurity gymnasium built on OpenEnv"
src/open_range/builder/snapshot_store.py CHANGED
@@ -11,6 +11,7 @@ import json
11
  import logging
12
  import random
13
  import time
 
14
  from pathlib import Path
15
  from typing import Any
16
 
@@ -19,6 +20,14 @@ from open_range.protocols import SnapshotSpec
19
  logger = logging.getLogger(__name__)
20
 
21
 
 
 
 
 
 
 
 
 
22
  class SnapshotStore:
23
  """Persist and retrieve validated snapshot specs."""
24
 
@@ -82,6 +91,10 @@ class SnapshotStore:
82
  Raises:
83
  FileNotFoundError: If the store is empty.
84
  """
 
 
 
 
85
  spec_files = sorted(self.store_dir.glob("*/spec.json"))
86
  if not spec_files:
87
  raise FileNotFoundError(
@@ -94,7 +107,10 @@ class SnapshotStore:
94
  chosen = max(spec_files, key=lambda p: p.stat().st_mtime)
95
 
96
  raw = json.loads(chosen.read_text(encoding="utf-8"))
97
- return SnapshotSpec.model_validate(raw)
 
 
 
98
 
99
  async def list_snapshots(self) -> list[dict[str, Any]]:
100
  """List all snapshots with their metadata.
@@ -124,3 +140,10 @@ class SnapshotStore:
124
  raise FileNotFoundError(f"Snapshot not found: {snapshot_id}")
125
  raw = json.loads(spec_path.read_text(encoding="utf-8"))
126
  return SnapshotSpec.model_validate(raw)
 
 
 
 
 
 
 
 
11
  import logging
12
  import random
13
  import time
14
+ from dataclasses import dataclass
15
  from pathlib import Path
16
  from typing import Any
17
 
 
20
  logger = logging.getLogger(__name__)
21
 
22
 
23
+ @dataclass(frozen=True, slots=True)
24
+ class StoredSnapshot:
25
+ """A frozen snapshot plus its persisted identifier."""
26
+
27
+ snapshot_id: str
28
+ snapshot: SnapshotSpec
29
+
30
+
31
  class SnapshotStore:
32
  """Persist and retrieve validated snapshot specs."""
33
 
 
91
  Raises:
92
  FileNotFoundError: If the store is empty.
93
  """
94
+ return (await self.select_entry(strategy=strategy)).snapshot
95
+
96
+ async def select_entry(self, strategy: str = "latest") -> StoredSnapshot:
97
+ """Select a snapshot plus its persisted ID."""
98
  spec_files = sorted(self.store_dir.glob("*/spec.json"))
99
  if not spec_files:
100
  raise FileNotFoundError(
 
107
  chosen = max(spec_files, key=lambda p: p.stat().st_mtime)
108
 
109
  raw = json.loads(chosen.read_text(encoding="utf-8"))
110
+ return StoredSnapshot(
111
+ snapshot_id=chosen.parent.name,
112
+ snapshot=SnapshotSpec.model_validate(raw),
113
+ )
114
 
115
  async def list_snapshots(self) -> list[dict[str, Any]]:
116
  """List all snapshots with their metadata.
 
140
  raise FileNotFoundError(f"Snapshot not found: {snapshot_id}")
141
  raw = json.loads(spec_path.read_text(encoding="utf-8"))
142
  return SnapshotSpec.model_validate(raw)
143
+
144
+ async def get_entry(self, snapshot_id: str) -> StoredSnapshot:
145
+ """Load a specific snapshot plus its ID."""
146
+ return StoredSnapshot(
147
+ snapshot_id=snapshot_id,
148
+ snapshot=await self.get(snapshot_id),
149
+ )
src/open_range/server/app.py CHANGED
@@ -8,16 +8,25 @@ from openenv.core.env_server import create_app as create_openenv_app
8
  from open_range.server.console import console_router
9
  from open_range.server.environment import RangeEnvironment
10
  from open_range.server.models import RangeAction, RangeObservation
 
11
 
12
 
13
  def create_app() -> FastAPI:
14
  """Create the OpenRange app using the standard OpenEnv contract."""
 
 
 
 
 
15
  app = create_openenv_app(
16
- RangeEnvironment,
17
  RangeAction,
18
  RangeObservation,
19
  env_name="open_range",
20
  )
 
 
 
21
  app.include_router(console_router)
22
  return app
23
 
 
8
  from open_range.server.console import console_router
9
  from open_range.server.environment import RangeEnvironment
10
  from open_range.server.models import RangeAction, RangeObservation
11
+ from open_range.server.runtime import ManagedSnapshotRuntime
12
 
13
 
14
  def create_app() -> FastAPI:
15
  """Create the OpenRange app using the standard OpenEnv contract."""
16
+ runtime = ManagedSnapshotRuntime.from_env()
17
+
18
+ def env_factory() -> RangeEnvironment:
19
+ return RangeEnvironment(runtime=runtime)
20
+
21
  app = create_openenv_app(
22
+ env_factory,
23
  RangeAction,
24
  RangeObservation,
25
  env_name="open_range",
26
  )
27
+ app.state.runtime = runtime
28
+ app.add_event_handler("startup", runtime.start)
29
+ app.add_event_handler("shutdown", runtime.stop)
30
  app.include_router(console_router)
31
  return app
32
 
src/open_range/server/environment.py CHANGED
@@ -17,13 +17,16 @@ from __future__ import annotations
17
 
18
  import logging
19
  import time
20
- from typing import Any
21
  from uuid import uuid4
22
 
23
  from open_range.protocols import SnapshotSpec, TaskSpec
24
 
25
  from open_range.server.models import RangeAction, RangeObservation, RangeState
26
 
 
 
 
27
  logger = logging.getLogger(__name__)
28
 
29
  # ---------------------------------------------------------------------------
@@ -87,6 +90,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
87
 
88
  def __init__(
89
  self,
 
90
  max_steps: int = DEFAULT_MAX_STEPS,
91
  exec_timeout: float = EXEC_TIMEOUT,
92
  docker_available: bool | None = None,
@@ -95,6 +99,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
95
  super().__init__()
96
  self._state = RangeState()
97
  self._snapshot: SnapshotSpec | None = None
 
98
  self._red_history: list[dict[str, Any]] = []
99
  self._blue_history: list[dict[str, Any]] = []
100
  self._npc_traffic_log: list[dict[str, Any]] = []
@@ -109,6 +114,8 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
109
  # Docker client -- resolved lazily
110
  self._docker_client: Any = None
111
  self._docker_available = docker_available
 
 
112
 
113
  # -----------------------------------------------------------------
114
  # Docker helpers
@@ -181,10 +188,18 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
181
  3. A minimal fallback (for testing without Docker)
182
  """
183
  if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
 
184
  return kwargs["snapshot"]
185
 
186
- # In production, a SnapshotStore would be consulted here.
187
- # For now, return a minimal placeholder.
 
 
 
 
 
 
 
188
  return SnapshotSpec(
189
  topology={"hosts": []},
190
  flags=[],
@@ -457,6 +472,8 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
457
  Returns:
458
  Initial RangeObservation with the challenge briefing.
459
  """
 
 
460
  # Select snapshot
461
  self._snapshot = self._select_snapshot(**kwargs)
462
 
@@ -477,6 +494,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
477
  self._blue_history = []
478
  self._npc_traffic_log = []
479
  self._episode_start = time.time()
 
480
 
481
  # Build initial briefing
482
  task = self._snapshot.task
@@ -540,30 +558,35 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
540
  obs = self._handle_submit_flag(action)
541
  obs = self._apply_rewards(action, obs)
542
  self._check_termination(obs)
 
543
  return obs
544
 
545
  if cmd_name == "submit_evidence":
546
  obs = self._handle_submit_evidence(action)
547
  obs = self._apply_rewards(action, obs)
548
  self._check_termination(obs)
 
549
  return obs
550
 
551
  if cmd_name == "submit_finding":
552
  obs = self._handle_submit_finding(action)
553
  obs = self._apply_rewards(action, obs)
554
  self._check_termination(obs)
 
555
  return obs
556
 
557
  if cmd_name == "auth":
558
  obs = self._handle_auth(action)
559
  obs = self._apply_rewards(action, obs)
560
  self._check_termination(obs)
 
561
  return obs
562
 
563
  if cmd_name == "logout":
564
  obs = self._handle_logout(action)
565
  obs = self._apply_rewards(action, obs)
566
  self._check_termination(obs)
 
567
  return obs
568
 
569
  # Route to container
@@ -604,6 +627,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
604
  # Compute rewards and check termination
605
  obs = self._apply_rewards(action, obs)
606
  self._check_termination(obs)
 
607
 
608
  return obs
609
 
@@ -678,6 +702,28 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
678
  obs.done = True
679
  return
680
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
681
  # -----------------------------------------------------------------
682
  # Alert system
683
  # -----------------------------------------------------------------
@@ -724,6 +770,7 @@ class RangeEnvironment(_BASE): # type: ignore[misc]
724
 
725
  def close(self) -> None:
726
  """Release resources (Docker client, episode state)."""
 
727
  if self._docker_client is not None:
728
  try:
729
  self._docker_client.close()
 
17
 
18
  import logging
19
  import time
20
+ from typing import TYPE_CHECKING, Any
21
  from uuid import uuid4
22
 
23
  from open_range.protocols import SnapshotSpec, TaskSpec
24
 
25
  from open_range.server.models import RangeAction, RangeObservation, RangeState
26
 
27
+ if TYPE_CHECKING:
28
+ from open_range.server.runtime import ManagedSnapshotRuntime
29
+
30
  logger = logging.getLogger(__name__)
31
 
32
  # ---------------------------------------------------------------------------
 
90
 
91
  def __init__(
92
  self,
93
+ runtime: "ManagedSnapshotRuntime | None" = None,
94
  max_steps: int = DEFAULT_MAX_STEPS,
95
  exec_timeout: float = EXEC_TIMEOUT,
96
  docker_available: bool | None = None,
 
99
  super().__init__()
100
  self._state = RangeState()
101
  self._snapshot: SnapshotSpec | None = None
102
+ self._snapshot_id: str | None = None
103
  self._red_history: list[dict[str, Any]] = []
104
  self._blue_history: list[dict[str, Any]] = []
105
  self._npc_traffic_log: list[dict[str, Any]] = []
 
114
  # Docker client -- resolved lazily
115
  self._docker_client: Any = None
116
  self._docker_available = docker_available
117
+ self._runtime = runtime
118
+ self._episode_recorded = False
119
 
120
  # -----------------------------------------------------------------
121
  # Docker helpers
 
188
  3. A minimal fallback (for testing without Docker)
189
  """
190
  if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
191
+ self._snapshot_id = kwargs.get("snapshot_id")
192
  return kwargs["snapshot"]
193
 
194
+ if self._runtime is not None:
195
+ if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
196
+ admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
197
+ else:
198
+ admitted = self._runtime.acquire_snapshot()
199
+ self._snapshot_id = admitted.snapshot_id
200
+ return admitted.snapshot
201
+
202
+ self._snapshot_id = None
203
  return SnapshotSpec(
204
  topology={"hosts": []},
205
  flags=[],
 
472
  Returns:
473
  Initial RangeObservation with the challenge briefing.
474
  """
475
+ self._report_episode_result(completed=False)
476
+
477
  # Select snapshot
478
  self._snapshot = self._select_snapshot(**kwargs)
479
 
 
494
  self._blue_history = []
495
  self._npc_traffic_log = []
496
  self._episode_start = time.time()
497
+ self._episode_recorded = False
498
 
499
  # Build initial briefing
500
  task = self._snapshot.task
 
558
  obs = self._handle_submit_flag(action)
559
  obs = self._apply_rewards(action, obs)
560
  self._check_termination(obs)
561
+ self._report_if_done(obs)
562
  return obs
563
 
564
  if cmd_name == "submit_evidence":
565
  obs = self._handle_submit_evidence(action)
566
  obs = self._apply_rewards(action, obs)
567
  self._check_termination(obs)
568
+ self._report_if_done(obs)
569
  return obs
570
 
571
  if cmd_name == "submit_finding":
572
  obs = self._handle_submit_finding(action)
573
  obs = self._apply_rewards(action, obs)
574
  self._check_termination(obs)
575
+ self._report_if_done(obs)
576
  return obs
577
 
578
  if cmd_name == "auth":
579
  obs = self._handle_auth(action)
580
  obs = self._apply_rewards(action, obs)
581
  self._check_termination(obs)
582
+ self._report_if_done(obs)
583
  return obs
584
 
585
  if cmd_name == "logout":
586
  obs = self._handle_logout(action)
587
  obs = self._apply_rewards(action, obs)
588
  self._check_termination(obs)
589
+ self._report_if_done(obs)
590
  return obs
591
 
592
  # Route to container
 
627
  # Compute rewards and check termination
628
  obs = self._apply_rewards(action, obs)
629
  self._check_termination(obs)
630
+ self._report_if_done(obs)
631
 
632
  return obs
633
 
 
702
  obs.done = True
703
  return
704
 
705
+ def _report_if_done(self, obs: RangeObservation) -> None:
706
+ """Report a completed episode to the shared runtime once."""
707
+ if obs.done:
708
+ self._report_episode_result(completed=True)
709
+
710
+ def _report_episode_result(self, completed: bool) -> None:
711
+ """Record the current episode outcome with the shared runtime."""
712
+ if self._episode_recorded or self._runtime is None or self._snapshot is None:
713
+ return
714
+ if self._state.episode_id is None:
715
+ return
716
+
717
+ self._runtime.record_episode_result(
718
+ snapshot_id=self._snapshot_id,
719
+ snapshot=self._snapshot,
720
+ state=self._state,
721
+ red_history=self.red_history,
722
+ blue_history=self.blue_history,
723
+ completed=completed,
724
+ )
725
+ self._episode_recorded = True
726
+
727
  # -----------------------------------------------------------------
728
  # Alert system
729
  # -----------------------------------------------------------------
 
770
 
771
  def close(self) -> None:
772
  """Release resources (Docker client, episode state)."""
773
+ self._report_episode_result(completed=False)
774
  if self._docker_client is not None:
775
  try:
776
  self._docker_client.close()
src/open_range/server/runtime.py ADDED
@@ -0,0 +1,492 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Managed snapshot runtime for the shipped OpenRange server process.
2
+
3
+ This module keeps the OpenEnv-facing environment instances lightweight while a
4
+ single shared manager owns the admitted snapshot pool, generation loop, and
5
+ episode feedback.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import asyncio
11
+ import json
12
+ import logging
13
+ import os
14
+ import threading
15
+ import time
16
+ from dataclasses import dataclass, field
17
+ from pathlib import Path
18
+ from typing import Any
19
+
20
+ import yaml
21
+
22
+ from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
23
+ from open_range.builder.mutator import Mutator
24
+ from open_range.builder.snapshot_store import SnapshotStore
25
+ from open_range.protocols import (
26
+ BuildContext,
27
+ CheckResult,
28
+ ContainerSet,
29
+ SnapshotBuilder,
30
+ SnapshotSpec,
31
+ )
32
+ from open_range.server.models import RangeState
33
+ from open_range.validator.task_feasibility import TaskFeasibilityCheck
34
+ from open_range.validator.validator import ValidationResult, ValidatorGate
35
+
36
+ logger = logging.getLogger(__name__)
37
+
38
+ _DEFAULT_MANIFEST = ("manifests", "tier1_basic.yaml")
39
+
40
+
41
+ def _env_flag(name: str, default: bool = False) -> bool:
42
+ raw = os.getenv(name)
43
+ if raw is None:
44
+ return default
45
+ return raw.strip().lower() in {"1", "true", "yes", "on"}
46
+
47
+
48
+ def _env_int(name: str, default: int) -> int:
49
+ raw = os.getenv(name)
50
+ if raw is None or raw.strip() == "":
51
+ return default
52
+ return int(raw)
53
+
54
+
55
+ def _candidate_roots() -> list[Path]:
56
+ roots: list[Path] = []
57
+ cwd = Path.cwd()
58
+ roots.append(cwd)
59
+ file_path = Path(__file__).resolve()
60
+ roots.extend(file_path.parents[:6])
61
+
62
+ unique: list[Path] = []
63
+ seen: set[Path] = set()
64
+ for root in roots:
65
+ if root in seen:
66
+ continue
67
+ seen.add(root)
68
+ unique.append(root)
69
+ return unique
70
+
71
+
72
+ def _resolve_default_manifest_path() -> Path:
73
+ for root in _candidate_roots():
74
+ candidate = root.joinpath(*_DEFAULT_MANIFEST)
75
+ if candidate.exists():
76
+ return candidate
77
+ raise FileNotFoundError(
78
+ "Could not locate the default manifest. "
79
+ "Set OPENRANGE_RUNTIME_MANIFEST to an explicit YAML path."
80
+ )
81
+
82
+
83
+ def _resolve_store_dir(store_dir: str | Path | None) -> Path:
84
+ if store_dir is None:
85
+ return Path(os.getenv("OPENRANGE_SNAPSHOT_DIR", "snapshots")).resolve()
86
+ return Path(store_dir).resolve()
87
+
88
+
89
+ def _run_coro_sync(coro: Any) -> Any:
90
+ """Run an async coroutine from sync code.
91
+
92
+ The runtime is used from sync OpenEnv environment methods and a background
93
+ thread, so we provide a conservative bridge here.
94
+ """
95
+
96
+ try:
97
+ asyncio.get_running_loop()
98
+ except RuntimeError:
99
+ return asyncio.run(coro)
100
+
101
+ result: dict[str, Any] = {}
102
+ error: list[BaseException] = []
103
+
104
+ def _runner() -> None:
105
+ try:
106
+ result["value"] = asyncio.run(coro)
107
+ except BaseException as exc: # noqa: BLE001
108
+ error.append(exc)
109
+
110
+ thread = threading.Thread(target=_runner, name="openrange-coro-bridge")
111
+ thread.start()
112
+ thread.join()
113
+ if error:
114
+ raise error[0]
115
+ return result.get("value")
116
+
117
+
118
+ @dataclass(slots=True)
119
+ class EpisodeOutcome:
120
+ snapshot_id: str | None
121
+ red_solved: bool
122
+ blue_detected: bool
123
+ steps: int
124
+ weak_areas: list[str] = field(default_factory=list)
125
+ completed: bool = False
126
+ recorded_at: float = field(default_factory=time.time)
127
+
128
+
129
+ class CurriculumTracker:
130
+ """Tiny in-process curriculum memory for future snapshot generation."""
131
+
132
+ def __init__(self, max_history: int = 100) -> None:
133
+ self.max_history = max_history
134
+ self._history: list[EpisodeOutcome] = []
135
+ self._lock = threading.Lock()
136
+
137
+ def record(self, outcome: EpisodeOutcome) -> None:
138
+ with self._lock:
139
+ self._history.append(outcome)
140
+ if len(self._history) > self.max_history:
141
+ del self._history[: len(self._history) - self.max_history]
142
+
143
+ def build_context(self, *, seed: int, tier: int) -> BuildContext:
144
+ with self._lock:
145
+ history = list(self._history)
146
+
147
+ completed = [o for o in history if o.completed]
148
+ red_solve_rate = (
149
+ sum(1 for o in completed if o.red_solved) / len(completed)
150
+ if completed
151
+ else 0.0
152
+ )
153
+ blue_detect_rate = (
154
+ sum(1 for o in completed if o.blue_detected) / len(completed)
155
+ if completed
156
+ else 0.0
157
+ )
158
+
159
+ weak_counts: dict[str, int] = {}
160
+ for outcome in completed:
161
+ if outcome.red_solved:
162
+ continue
163
+ for area in outcome.weak_areas:
164
+ weak_counts[area] = weak_counts.get(area, 0) + 1
165
+
166
+ weak_areas = [
167
+ area
168
+ for area, _count in sorted(
169
+ weak_counts.items(),
170
+ key=lambda item: (-item[1], item[0]),
171
+ )[:3]
172
+ ]
173
+
174
+ return BuildContext(
175
+ seed=seed,
176
+ tier=tier,
177
+ red_solve_rate=red_solve_rate,
178
+ blue_detect_rate=blue_detect_rate,
179
+ weak_areas=weak_areas,
180
+ )
181
+
182
+ @property
183
+ def history(self) -> list[EpisodeOutcome]:
184
+ with self._lock:
185
+ return list(self._history)
186
+
187
+
188
+ @dataclass(frozen=True, slots=True)
189
+ class RuntimeSnapshot:
190
+ snapshot_id: str
191
+ snapshot: SnapshotSpec
192
+
193
+
194
+ class StructuralSnapshotCheck:
195
+ """Lightweight admission check for the shipped no-Docker runtime path."""
196
+
197
+ async def check(
198
+ self,
199
+ snapshot: SnapshotSpec,
200
+ containers: ContainerSet,
201
+ ) -> CheckResult:
202
+ issues: list[str] = []
203
+ if not snapshot.truth_graph.vulns:
204
+ issues.append("truth_graph has no vulns")
205
+ if not snapshot.golden_path:
206
+ issues.append("golden_path is empty")
207
+ if not snapshot.flags:
208
+ issues.append("flags are empty")
209
+ if not snapshot.task.red_briefing or not snapshot.task.blue_briefing:
210
+ issues.append("task briefings are missing")
211
+ for briefing_name, text in (
212
+ ("red_briefing", snapshot.task.red_briefing),
213
+ ("blue_briefing", snapshot.task.blue_briefing),
214
+ ):
215
+ for flag in snapshot.flags:
216
+ if flag.value and flag.value in text:
217
+ issues.append(f"flag leaked in {briefing_name}")
218
+ for step in snapshot.golden_path:
219
+ if len(step.command) > 20 and step.command in text:
220
+ issues.append(f"golden-path command leaked in {briefing_name}")
221
+
222
+ passed = len(issues) == 0
223
+ return CheckResult(
224
+ name="structural_snapshot",
225
+ passed=passed,
226
+ details={"issues": issues},
227
+ error="" if passed else "; ".join(issues),
228
+ )
229
+
230
+
231
+ def _default_builder() -> SnapshotBuilder:
232
+ mode = os.getenv("OPENRANGE_RUNTIME_BUILDER", "template").strip().lower()
233
+ if mode == "template":
234
+ return TemplateOnlyBuilder()
235
+ if mode == "llm":
236
+ return LLMSnapshotBuilder()
237
+ raise ValueError(
238
+ f"Unsupported OPENRANGE_RUNTIME_BUILDER={mode!r}. "
239
+ "Expected 'template' or 'llm'."
240
+ )
241
+
242
+
243
+ def _default_validator() -> ValidatorGate:
244
+ # These checks work directly against the compiled snapshot spec and do not
245
+ # require booted containers. They are the safe default for shipped mode.
246
+ return ValidatorGate(
247
+ [
248
+ StructuralSnapshotCheck(),
249
+ TaskFeasibilityCheck(),
250
+ ]
251
+ )
252
+
253
+
254
+ class ManagedSnapshotRuntime:
255
+ """Shared server-side manager for admitted snapshots."""
256
+
257
+ def __init__(
258
+ self,
259
+ *,
260
+ manifest: dict[str, Any] | None = None,
261
+ manifest_path: str | Path | None = None,
262
+ store_dir: str | Path | None = None,
263
+ builder: SnapshotBuilder | None = None,
264
+ validator: ValidatorGate | None = None,
265
+ pool_size: int = 3,
266
+ selection_strategy: str = "random",
267
+ refill_enabled: bool = False,
268
+ refill_interval_s: float = 2.0,
269
+ generation_retries: int = 3,
270
+ ) -> None:
271
+ self.manifest_path = (
272
+ Path(manifest_path).resolve()
273
+ if manifest_path is not None
274
+ else _resolve_default_manifest_path()
275
+ )
276
+ self.manifest = manifest or self._load_manifest(self.manifest_path)
277
+ self.store_dir = _resolve_store_dir(store_dir)
278
+ self.store = SnapshotStore(str(self.store_dir))
279
+ self.builder = builder or _default_builder()
280
+ self.mutator = Mutator(self.builder)
281
+ self.validator = validator or _default_validator()
282
+ self.curriculum = CurriculumTracker()
283
+ self.pool_size = max(1, pool_size)
284
+ self.selection_strategy = selection_strategy
285
+ self.refill_enabled = refill_enabled
286
+ self.refill_interval_s = max(0.25, refill_interval_s)
287
+ self.generation_retries = max(1, generation_retries)
288
+
289
+ self._lock = threading.RLock()
290
+ self._refill_thread: threading.Thread | None = None
291
+ self._stop_event = threading.Event()
292
+ self._started = False
293
+ self._generation_counter = 0
294
+
295
+ @classmethod
296
+ def from_env(cls) -> "ManagedSnapshotRuntime":
297
+ return cls(
298
+ manifest_path=os.getenv("OPENRANGE_RUNTIME_MANIFEST"),
299
+ store_dir=os.getenv("OPENRANGE_SNAPSHOT_DIR"),
300
+ pool_size=_env_int("OPENRANGE_SNAPSHOT_POOL_SIZE", 3),
301
+ selection_strategy=os.getenv("OPENRANGE_SNAPSHOT_SELECTION", "random"),
302
+ refill_enabled=_env_flag("OPENRANGE_ENABLE_MANAGED_REFILL", default=False),
303
+ refill_interval_s=float(os.getenv("OPENRANGE_REFILL_INTERVAL_S", "2.0")),
304
+ generation_retries=_env_int("OPENRANGE_GENERATION_RETRIES", 3),
305
+ )
306
+
307
+ @staticmethod
308
+ def _load_manifest(path: Path) -> dict[str, Any]:
309
+ with path.open("r", encoding="utf-8") as handle:
310
+ data = yaml.safe_load(handle) or {}
311
+ if not isinstance(data, dict):
312
+ raise TypeError(f"Manifest at {path} did not parse to a mapping")
313
+ return data
314
+
315
+ def start(self) -> None:
316
+ with self._lock:
317
+ if self._started:
318
+ return
319
+
320
+ existing = self.snapshot_count()
321
+ if existing < self.pool_size:
322
+ self._top_up_pool(self.pool_size - existing)
323
+
324
+ available = self.snapshot_count()
325
+ if available == 0:
326
+ raise RuntimeError(
327
+ "ManagedSnapshotRuntime could not load or generate any admitted snapshots"
328
+ )
329
+
330
+ if self.refill_enabled:
331
+ self._stop_event.clear()
332
+ self._refill_thread = threading.Thread(
333
+ target=self._refill_loop,
334
+ name="openrange-runtime-refill",
335
+ daemon=True,
336
+ )
337
+ self._refill_thread.start()
338
+
339
+ self._started = True
340
+ logger.info(
341
+ "ManagedSnapshotRuntime started with %d admitted snapshot(s) in %s",
342
+ available,
343
+ self.store_dir,
344
+ )
345
+
346
+ def stop(self) -> None:
347
+ with self._lock:
348
+ if not self._started:
349
+ return
350
+ self._stop_event.set()
351
+ thread = self._refill_thread
352
+ self._refill_thread = None
353
+ self._started = False
354
+
355
+ if thread is not None:
356
+ thread.join(timeout=self.refill_interval_s * 2)
357
+
358
+ def acquire_snapshot(self, *, snapshot_id: str | None = None) -> RuntimeSnapshot:
359
+ self.start()
360
+ if snapshot_id:
361
+ return self.get_snapshot(snapshot_id)
362
+
363
+ stored = _run_coro_sync(self.store.select_entry(strategy=self.selection_strategy))
364
+ return RuntimeSnapshot(snapshot_id=stored.snapshot_id, snapshot=stored.snapshot)
365
+
366
+ def get_snapshot(self, snapshot_id: str) -> RuntimeSnapshot:
367
+ self.start()
368
+ stored = _run_coro_sync(self.store.get_entry(snapshot_id))
369
+ return RuntimeSnapshot(snapshot_id=stored.snapshot_id, snapshot=stored.snapshot)
370
+
371
+ def list_snapshots(self) -> list[dict[str, Any]]:
372
+ return _run_coro_sync(self.store.list_snapshots())
373
+
374
+ def snapshot_count(self) -> int:
375
+ return len(self.list_snapshots())
376
+
377
+ def status(self) -> dict[str, Any]:
378
+ return {
379
+ "manifest_path": str(self.manifest_path),
380
+ "store_dir": str(self.store_dir),
381
+ "pool_size": self.pool_size,
382
+ "selection_strategy": self.selection_strategy,
383
+ "refill_enabled": self.refill_enabled,
384
+ "snapshot_count": self.snapshot_count(),
385
+ "started": self._started,
386
+ }
387
+
388
+ def record_episode_result(
389
+ self,
390
+ *,
391
+ snapshot_id: str | None,
392
+ snapshot: SnapshotSpec | None,
393
+ state: RangeState,
394
+ red_history: list[dict[str, Any]],
395
+ blue_history: list[dict[str, Any]],
396
+ completed: bool,
397
+ ) -> None:
398
+ if snapshot is None:
399
+ return
400
+
401
+ total_flags = len(snapshot.flags)
402
+ red_solved = total_flags > 0 and len(state.flags_found) >= total_flags
403
+ blue_detected = any(
404
+ record.get("type") == "finding" or record.get("cmd_name") == "submit_finding"
405
+ for record in blue_history
406
+ )
407
+ weak_areas = []
408
+ if not red_solved:
409
+ weak_areas = [v.type for v in snapshot.truth_graph.vulns]
410
+
411
+ self.curriculum.record(
412
+ EpisodeOutcome(
413
+ snapshot_id=snapshot_id,
414
+ red_solved=red_solved,
415
+ blue_detected=blue_detected,
416
+ steps=state.step_count,
417
+ weak_areas=weak_areas,
418
+ completed=completed,
419
+ )
420
+ )
421
+
422
+ def _refill_loop(self) -> None:
423
+ while not self._stop_event.wait(self.refill_interval_s):
424
+ try:
425
+ missing = self.pool_size - self.snapshot_count()
426
+ if missing > 0:
427
+ self._top_up_pool(missing)
428
+ except Exception as exc: # noqa: BLE001
429
+ logger.warning("ManagedSnapshotRuntime refill failed: %s", exc)
430
+
431
+ def _top_up_pool(self, missing: int) -> None:
432
+ for _ in range(max(0, missing)):
433
+ self._generate_and_store_snapshot()
434
+
435
+ def _generate_and_store_snapshot(self) -> str:
436
+ last_error: str | None = None
437
+ for attempt in range(1, self.generation_retries + 1):
438
+ context = self._build_context()
439
+ snapshot = _run_coro_sync(
440
+ self.mutator.mutate(
441
+ self.manifest,
442
+ context=context,
443
+ error={"message": last_error} if last_error else None,
444
+ )
445
+ )
446
+ validation = self._validate_snapshot(snapshot)
447
+ if validation.passed:
448
+ snapshot_id = _run_coro_sync(self.store.store(snapshot))
449
+ logger.info(
450
+ "ManagedSnapshotRuntime admitted snapshot %s on attempt %d",
451
+ snapshot_id,
452
+ attempt,
453
+ )
454
+ return snapshot_id
455
+
456
+ last_error = self._validation_error(validation)
457
+ logger.warning(
458
+ "ManagedSnapshotRuntime rejected candidate on attempt %d: %s",
459
+ attempt,
460
+ last_error,
461
+ )
462
+
463
+ raise RuntimeError(
464
+ "ManagedSnapshotRuntime failed to admit a snapshot after "
465
+ f"{self.generation_retries} attempt(s): {last_error}"
466
+ )
467
+
468
+ def _build_context(self) -> BuildContext:
469
+ seed = self._generation_counter
470
+ self._generation_counter += 1
471
+ tier = int(self.manifest.get("tier", 1) or 1)
472
+ context = self.curriculum.build_context(seed=seed, tier=tier)
473
+ context.episode_count = self.mutator.episode_count
474
+ return context
475
+
476
+ def _validate_snapshot(self, snapshot: SnapshotSpec) -> ValidationResult:
477
+ return _run_coro_sync(self.validator.validate(snapshot, ContainerSet()))
478
+
479
+ @staticmethod
480
+ def _validation_error(result: ValidationResult) -> str:
481
+ failed = [check for check in result.checks if not check.passed]
482
+ if not failed:
483
+ return "unknown validation failure"
484
+ payload = [
485
+ {
486
+ "name": check.name,
487
+ "error": check.error,
488
+ "details": check.details,
489
+ }
490
+ for check in failed
491
+ ]
492
+ return json.dumps(payload, sort_keys=True)
tests/test_runtime.py ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Tests for the managed snapshot runtime."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import pytest
6
+
7
+ from open_range.server.environment import RangeEnvironment
8
+ from open_range.server.runtime import ManagedSnapshotRuntime
9
+
10
+
11
+ class TestManagedSnapshotRuntime:
12
+ def test_start_preloads_snapshot_pool(self, tier1_manifest, tmp_path):
13
+ runtime = ManagedSnapshotRuntime(
14
+ manifest=tier1_manifest,
15
+ store_dir=tmp_path / "snapshots",
16
+ pool_size=2,
17
+ refill_enabled=False,
18
+ )
19
+
20
+ runtime.start()
21
+ try:
22
+ listing = runtime.list_snapshots()
23
+ assert len(listing) == 2
24
+ assert all(item["snapshot_id"] for item in listing)
25
+ finally:
26
+ runtime.stop()
27
+
28
+ def test_acquire_snapshot_returns_admitted_snapshot(self, tier1_manifest, tmp_path):
29
+ runtime = ManagedSnapshotRuntime(
30
+ manifest=tier1_manifest,
31
+ store_dir=tmp_path / "snapshots",
32
+ pool_size=1,
33
+ selection_strategy="latest",
34
+ refill_enabled=False,
35
+ )
36
+
37
+ runtime.start()
38
+ try:
39
+ admitted = runtime.acquire_snapshot()
40
+ assert admitted.snapshot_id
41
+ assert admitted.snapshot.truth_graph.vulns
42
+ assert admitted.snapshot.flags
43
+ finally:
44
+ runtime.stop()
45
+
46
+ def test_get_snapshot_by_id_returns_exact_snapshot(self, tier1_manifest, tmp_path):
47
+ runtime = ManagedSnapshotRuntime(
48
+ manifest=tier1_manifest,
49
+ store_dir=tmp_path / "snapshots",
50
+ pool_size=1,
51
+ refill_enabled=False,
52
+ )
53
+
54
+ runtime.start()
55
+ try:
56
+ first = runtime.acquire_snapshot()
57
+ loaded = runtime.get_snapshot(first.snapshot_id)
58
+ assert loaded.snapshot_id == first.snapshot_id
59
+ assert loaded.snapshot.flags[0].value == first.snapshot.flags[0].value
60
+ finally:
61
+ runtime.stop()
62
+
63
+
64
+ class TestEnvironmentRuntimeIntegration:
65
+ def test_reset_uses_managed_runtime_snapshot(self, tier1_manifest, tmp_path):
66
+ runtime = ManagedSnapshotRuntime(
67
+ manifest=tier1_manifest,
68
+ store_dir=tmp_path / "snapshots",
69
+ pool_size=1,
70
+ refill_enabled=False,
71
+ )
72
+ runtime.start()
73
+
74
+ env = RangeEnvironment(runtime=runtime, docker_available=False)
75
+ try:
76
+ obs = env.reset()
77
+ assert "Range ready" in obs.stdout
78
+ assert env.snapshot is not None
79
+ assert env.snapshot.truth_graph.vulns
80
+ finally:
81
+ env.close()
82
+ runtime.stop()
83
+
84
+ def test_reset_snapshot_id_uses_runtime_store(self, tier1_manifest, tmp_path):
85
+ runtime = ManagedSnapshotRuntime(
86
+ manifest=tier1_manifest,
87
+ store_dir=tmp_path / "snapshots",
88
+ pool_size=1,
89
+ refill_enabled=False,
90
+ )
91
+ runtime.start()
92
+
93
+ env = RangeEnvironment(runtime=runtime, docker_available=False)
94
+ try:
95
+ admitted = runtime.acquire_snapshot()
96
+ env.reset(snapshot_id=admitted.snapshot_id)
97
+ assert env.snapshot is not None
98
+ assert env.snapshot.flags[0].value == admitted.snapshot.flags[0].value
99
+ finally:
100
+ env.close()
101
+ runtime.stop()
102
+
103
+ def test_missing_snapshot_id_raises(self, tier1_manifest, tmp_path):
104
+ runtime = ManagedSnapshotRuntime(
105
+ manifest=tier1_manifest,
106
+ store_dir=tmp_path / "snapshots",
107
+ pool_size=1,
108
+ refill_enabled=False,
109
+ )
110
+ runtime.start()
111
+
112
+ env = RangeEnvironment(runtime=runtime, docker_available=False)
113
+ try:
114
+ with pytest.raises(FileNotFoundError):
115
+ env.reset(snapshot_id="missing_snapshot")
116
+ finally:
117
+ env.close()
118
+ runtime.stop()