Spaces: abrown31/open-range


Lars Talian committed on Mar 8

Commit 906af9d

1 Parent(s): 8e6f5b8

Add managed snapshot runtime behind OpenEnv server

Files changed (9)

.gitignore +1 -0
README.md +14 -7
docs/builder-validator.md +1 -1
openenv.yaml +0 -4
src/open_range/builder/snapshot_store.py +24 -1
src/open_range/server/app.py +10 -1
src/open_range/server/environment.py +50 -3
src/open_range/server/runtime.py +492 -0
tests/test_runtime.py +118 -0

.gitignore CHANGED

@@ -55,3 +55,4 @@ htmlcov/
 # Pre-validated range pool (generated at startup)
 pool/

 # Pre-validated range pool (generated at startup)
 pool/
+snapshots/

README.md CHANGED

@@ -12,14 +12,15 @@ A multi-agent cybersecurity gymnasium on [OpenEnv](https://github.com/meta-pytor
 ## How It Works
-A **manifest** declares a family of legal enterprise worlds — topology, services, identities, vulnerability classes, difficulty. A **Builder** LLM proposes a concrete snapshot within that family. A **Validator** pipeline admits only snapshots that are runnable, exploitable, patchable, and non-leaking. `reset()` selects a frozen validated snapshot. `step()` runs commands inside it.
 ```mermaid
 flowchart LR
-    M[Manifest<br/>topology, services,<br/>bug families, difficulty] --> B[Builder<br/>LLM proposes<br/>snapshot]
-    B --> V{Validator<br/>10 checks}
     V -->|fail| B
-    V -->|pass| S[Frozen Snapshot]
     S --> E["reset() → step() → obs + reward"]
     style V fill:#ffd93d,color:#333
@@ -39,6 +40,10 @@ uv sync
 # Optional: enable the LiteLLM-backed builder pipeline
 uv sync --extra builder
 # End-to-end demo (no Docker, no LLM)
 uv run python examples/demo.py
@@ -61,13 +66,15 @@ uv run pytest tests/ -v --tb=short
 **Manifest** — YAML defining the legal world: hosts, zones, services, users, NPCs, data assets, credential policies, monitoring coverage, trust relationships, and which vulnerability classes the Builder may plant. Three example manifests ship (healthcare, fintech, SaaS) at tiers 1-3.
-**Builder** — Takes a manifest + curriculum context, outputs a `SnapshotSpec`: topology graph, truth graph (planted vulns + exploit chain), evidence graph (what Blue can find), flags, golden path, NPC traffic, and task briefings. Three implementations: `LLMSnapshotBuilder` (production, via litellm), `TemplateOnlyBuilder` (deterministic, for tests), `FileBuilder` (load from disk).
 The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `state()` contract through `server.app:app`, which is the entrypoint referenced by `openenv.yaml`.
-**Validator** — 10-check admission pipeline. 8 mechanical checks (build/boot, exploitability, patchability, evidence sufficiency, reward grounding, isolation, task feasibility, difficulty calibration) + 2 LLM advisory checks (NPC consistency, realism review). Inverse mutation: patching each planted vuln must break its exploit step.
-**Environment** — `RangeEnvironment(Environment)` following the OpenEnv contract. `reset()` picks a frozen snapshot + samples a task. `step(action)` routes commands to the appropriate container — Red runs on the attacker box, Blue runs on the SIEM. No artificial command allowlists; the container's installed tools are the constraint.
 **Rewards** — All grounded in container state, not LLM judgment:

 ## How It Works
+A **manifest** declares a family of legal enterprise worlds — topology, services, identities, vulnerability classes, difficulty. A shared **ManagedSnapshotRuntime** inside the shipped OpenEnv server process owns the snapshot pool. It loads admitted snapshots from disk or preloads a deterministic pool from the manifest, then optionally refills that pool in the background. `reset()` selects one frozen admitted snapshot. `step()` runs commands inside it.
 ```mermaid
 flowchart LR
+    M[Manifest<br/>topology, services,<br/>bug families, difficulty] --> R[ManagedSnapshotRuntime<br/>shared inside server process]
+    R --> B[Builder / mutator<br/>deterministic by default,<br/>LiteLLM optional]
+    B --> V{Validator}
     V -->|fail| B
+    V -->|pass| S[Frozen admitted snapshots]
     S --> E["reset() → step() → obs + reward"]
     style V fill:#ffd93d,color:#333
 # Optional: enable the LiteLLM-backed builder pipeline
 uv sync --extra builder
+# Optional: enable background refill inside the server
+export OPENRANGE_ENABLE_MANAGED_REFILL=1
+export OPENRANGE_RUNTIME_BUILDER=llm
 # End-to-end demo (no Docker, no LLM)
 uv run python examples/demo.py
 **Manifest** — YAML defining the legal world: hosts, zones, services, users, NPCs, data assets, credential policies, monitoring coverage, trust relationships, and which vulnerability classes the Builder may plant. Three example manifests ship (healthcare, fintech, SaaS) at tiers 1-3.
+**ManagedSnapshotRuntime** — Shared singleton created at server startup. Owns the `SnapshotStore`, builder/mutator, validator gate, snapshot preload, optional background refill, and episode-result feedback. This is the hidden orchestrator behind the env; callers still only see `reset()`, `step()`, and `state()`.
+**Builder** — Takes a manifest + curriculum context, outputs a `SnapshotSpec`: topology graph, truth graph (planted vulns + exploit chain), evidence graph (what Blue can find), flags, golden path, NPC traffic, and task briefings. Three implementations: `LLMSnapshotBuilder` (production, via litellm), `TemplateOnlyBuilder` (deterministic shipped default), `FileBuilder` (load from disk).
 The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `state()` contract through `server.app:app`, which is the entrypoint referenced by `openenv.yaml`.
+**Validator** — Admission gate for candidate snapshots. The shipped runtime uses structural checks that operate on the compiled `SnapshotSpec` without requiring live model calls; richer container-backed checks remain available for private/local generation workflows.
+**Environment** — `RangeEnvironment(Environment)` following the OpenEnv contract. `reset()` asks the shared runtime for a frozen admitted snapshot. `step(action)` routes commands to the appropriate container — Red runs on the attacker box, Blue runs on the SIEM. No artificial command allowlists; the container's installed tools are the constraint.
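Review note: the Builder, Validator, and runtime roles described in the README paragraphs above amount to a propose, check, admit loop. The sketch below illustrates that loop in isolation. `ToySnapshot`, `admit_snapshots`, `toy_build`, and `toy_validate` are hypothetical names invented for this note, not part of the commit, and the retry policy is an assumption loosely modelled on the `generation_retries` parameter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for SnapshotSpec."""
    seed: int
    flags: tuple[str, ...]


def admit_snapshots(
    build: Callable[[int], ToySnapshot],
    validate: Callable[[ToySnapshot], bool],
    wanted: int,
    retries: int = 3,
) -> list[ToySnapshot]:
    """Fill a pool with validated candidates, retrying failed builds."""
    pool: list[ToySnapshot] = []
    seed = 0
    for _ in range(wanted):
        for _attempt in range(retries):
            candidate = build(seed)
            seed += 1
            if validate(candidate):
                pool.append(candidate)  # admitted: frozen and ready for reset()
                break
    return pool


def toy_build(seed: int) -> ToySnapshot:
    # Odd seeds yield a candidate with no flags, which the validator rejects.
    return ToySnapshot(seed=seed, flags=("FLAG{demo}",) if seed % 2 == 0 else ())


def toy_validate(snapshot: ToySnapshot) -> bool:
    return bool(snapshot.flags)


pool = admit_snapshots(toy_build, toy_validate, wanted=3)
print([s.seed for s in pool])
```

Only candidates that pass the gate reach the pool, so `reset()` never has to validate anything itself.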
 **Rewards** — All grounded in container state, not LLM judgment:

docs/builder-validator.md CHANGED

@@ -4,7 +4,7 @@
 **LLM generates, renderer materializes, rules validate.** The builder uses LiteLLM to generate candidate snapshot specs as structured JSON. The renderer turns specs into Docker artifacts via Jinja2 templates. The validator runs a 10-check admission pipeline (8 mechanical + 2 LLM advisory) before admitting a snapshot.
-Snapshot creation happens **asynchronously between episodes**. `reset()` picks a pre-validated frozen snapshot from the `SnapshotStore`. No LLM calls in the hot path.
 ```mermaid
 flowchart LR

 **LLM generates, renderer materializes, rules validate.** The builder uses LiteLLM to generate candidate snapshot specs as structured JSON. The renderer turns specs into Docker artifacts via Jinja2 templates. The validator runs a 10-check admission pipeline (8 mechanical + 2 LLM advisory) before admitting a snapshot.
+Snapshot creation happens **inside a shared `ManagedSnapshotRuntime` in the server process**. That runtime preloads admitted snapshots at startup and can optionally refill them between episodes. `reset()` picks a pre-validated frozen snapshot from the `SnapshotStore`. No LLM calls in the hot path.
 ```mermaid
 flowchart LR

openenv.yaml CHANGED

@@ -5,8 +5,4 @@ runtime: fastapi
 app: server.app:app
 port: 8000
 version: 0.1.0
-type: space
-runtime: fastapi
-app: open_range.server.app:app
-port: 8000
 description: "Multi-agent cybersecurity gymnasium built on OpenEnv"

 app: server.app:app
 port: 8000
 version: 0.1.0
 description: "Multi-agent cybersecurity gymnasium built on OpenEnv"

src/open_range/builder/snapshot_store.py CHANGED

@@ -11,6 +11,7 @@ import json
 import logging
 import random
 import time
 from pathlib import Path
 from typing import Any
@@ -19,6 +20,14 @@ from open_range.protocols import SnapshotSpec
 logger = logging.getLogger(__name__)
 class SnapshotStore:
     """Persist and retrieve validated snapshot specs."""
@@ -82,6 +91,10 @@ class SnapshotStore:
         Raises:
             FileNotFoundError: If the store is empty.
         """
         spec_files = sorted(self.store_dir.glob("*/spec.json"))
         if not spec_files:
             raise FileNotFoundError(
@@ -94,7 +107,10 @@ class SnapshotStore:
             chosen = max(spec_files, key=lambda p: p.stat().st_mtime)
         raw = json.loads(chosen.read_text(encoding="utf-8"))
-        return SnapshotSpec.model_validate(raw)
     async def list_snapshots(self) -> list[dict[str, Any]]:
         """List all snapshots with their metadata.
@@ -124,3 +140,10 @@ class SnapshotStore:
             raise FileNotFoundError(f"Snapshot not found: {snapshot_id}")
         raw = json.loads(spec_path.read_text(encoding="utf-8"))
         return SnapshotSpec.model_validate(raw)

 import logging
 import random
 import time
+from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
 logger = logging.getLogger(__name__)
+@dataclass(frozen=True, slots=True)
+class StoredSnapshot:
+    """A frozen snapshot plus its persisted identifier."""
+    snapshot_id: str
+    snapshot: SnapshotSpec
 class SnapshotStore:
     """Persist and retrieve validated snapshot specs."""
         Raises:
             FileNotFoundError: If the store is empty.
         """
+        return (await self.select_entry(strategy=strategy)).snapshot
+    async def select_entry(self, strategy: str = "latest") -> StoredSnapshot:
+        """Select a snapshot plus its persisted ID."""
         spec_files = sorted(self.store_dir.glob("*/spec.json"))
         if not spec_files:
             raise FileNotFoundError(
             chosen = max(spec_files, key=lambda p: p.stat().st_mtime)
         raw = json.loads(chosen.read_text(encoding="utf-8"))
+        return StoredSnapshot(
+            snapshot_id=chosen.parent.name,
+            snapshot=SnapshotSpec.model_validate(raw),
+        )
     async def list_snapshots(self) -> list[dict[str, Any]]:
         """List all snapshots with their metadata.
             raise FileNotFoundError(f"Snapshot not found: {snapshot_id}")
         raw = json.loads(spec_path.read_text(encoding="utf-8"))
         return SnapshotSpec.model_validate(raw)
+    async def get_entry(self, snapshot_id: str) -> StoredSnapshot:
+        """Load a specific snapshot plus its ID."""
+        return StoredSnapshot(
+            snapshot_id=snapshot_id,
+            snapshot=await self.get(snapshot_id),
+        )
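Review note: the new `select_entry` returns the snapshot together with the name of its parent directory, which serves as the persisted ID. The sketch below reproduces only the "latest" strategy against a temporary directory. `StoredEntry` and `select_latest` are hypothetical names, and a plain `dict` stands in for `SnapshotSpec`.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoredEntry:
    """Hypothetical stand-in for StoredSnapshot, holding a raw dict."""
    snapshot_id: str
    snapshot: dict


def select_latest(store_dir: Path) -> StoredEntry:
    """Pick the most recently modified <id>/spec.json under store_dir."""
    spec_files = sorted(store_dir.glob("*/spec.json"))
    if not spec_files:
        raise FileNotFoundError(f"No snapshots in {store_dir}")
    chosen = max(spec_files, key=lambda p: p.stat().st_mtime)
    raw = json.loads(chosen.read_text(encoding="utf-8"))
    # The directory name doubles as the persisted snapshot ID.
    return StoredEntry(snapshot_id=chosen.parent.name, snapshot=raw)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for index, name in enumerate(["snap_a", "snap_b"]):
        spec = root / name / "spec.json"
        spec.parent.mkdir()
        spec.write_text(json.dumps({"flags": [name]}), encoding="utf-8")
        # Explicit mtimes keep the ordering independent of clock resolution.
        os.utime(spec, (1_000 + index, 1_000 + index))
    entry = select_latest(root)
    print(entry.snapshot_id, entry.snapshot)
```

Returning the ID alongside the spec is what later lets the environment report episode results against a specific snapshot.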

src/open_range/server/app.py CHANGED

@@ -8,16 +8,25 @@ from openenv.core.env_server import create_app as create_openenv_app
 from open_range.server.console import console_router
 from open_range.server.environment import RangeEnvironment
 from open_range.server.models import RangeAction, RangeObservation
 def create_app() -> FastAPI:
     """Create the OpenRange app using the standard OpenEnv contract."""
     app = create_openenv_app(
-        RangeEnvironment,
         RangeAction,
         RangeObservation,
         env_name="open_range",
     )
     app.include_router(console_router)
     return app

 from open_range.server.console import console_router
 from open_range.server.environment import RangeEnvironment
 from open_range.server.models import RangeAction, RangeObservation
+from open_range.server.runtime import ManagedSnapshotRuntime
 def create_app() -> FastAPI:
     """Create the OpenRange app using the standard OpenEnv contract."""
+    runtime = ManagedSnapshotRuntime.from_env()
+    def env_factory() -> RangeEnvironment:
+        return RangeEnvironment(runtime=runtime)
     app = create_openenv_app(
+        env_factory,
         RangeAction,
         RangeObservation,
         env_name="open_range",
     )
+    app.state.runtime = runtime
+    app.add_event_handler("startup", runtime.start)
+    app.add_event_handler("shutdown", runtime.stop)
     app.include_router(console_router)
     return app
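Review note: `create_app` now passes a factory closure instead of the class, so every environment instance shares one runtime. The sketch below shows that pattern without FastAPI or OpenEnv. `ToyRuntime`, `ToyEnvironment`, and `make_env_factory` are hypothetical stand-ins written for this note.

```python
class ToyRuntime:
    """Hypothetical stand-in for ManagedSnapshotRuntime."""

    def __init__(self) -> None:
        self.acquired = 0

    def acquire_snapshot(self) -> str:
        self.acquired += 1
        return f"snapshot-{self.acquired}"


class ToyEnvironment:
    """Hypothetical stand-in for RangeEnvironment."""

    def __init__(self, runtime: ToyRuntime) -> None:
        self._runtime = runtime

    def reset(self) -> str:
        return self._runtime.acquire_snapshot()


def make_env_factory():
    runtime = ToyRuntime()  # created once, captured by the closure below

    def env_factory() -> ToyEnvironment:
        # Each call builds a fresh environment around the same runtime.
        return ToyEnvironment(runtime=runtime)

    return env_factory, runtime


env_factory, runtime = make_env_factory()
first, second = env_factory(), env_factory()
print(first.reset(), second.reset(), runtime.acquired)
```

The environments are distinct objects, but both draw from the single pool owned by the captured runtime.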

src/open_range/server/environment.py CHANGED

@@ -17,13 +17,16 @@ from __future__ import annotations
 import logging
 import time
-from typing import Any
 from uuid import uuid4
 from open_range.protocols import SnapshotSpec, TaskSpec
 from open_range.server.models import RangeAction, RangeObservation, RangeState
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
@@ -87,6 +90,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
     def __init__(
         self,
         max_steps: int = DEFAULT_MAX_STEPS,
         exec_timeout: float = EXEC_TIMEOUT,
         docker_available: bool | None = None,
@@ -95,6 +99,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
             super().__init__()
         self._state = RangeState()
         self._snapshot: SnapshotSpec | None = None
         self._red_history: list[dict[str, Any]] = []
         self._blue_history: list[dict[str, Any]] = []
         self._npc_traffic_log: list[dict[str, Any]] = []
@@ -109,6 +114,8 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         # Docker client -- resolved lazily
         self._docker_client: Any = None
         self._docker_available = docker_available
     # -----------------------------------------------------------------
     # Docker helpers
@@ -181,10 +188,18 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         3. A minimal fallback (for testing without Docker)
         """
         if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
             return kwargs["snapshot"]
-        # In production, a SnapshotStore would be consulted here.
-        # For now, return a minimal placeholder.
         return SnapshotSpec(
             topology={"hosts": []},
             flags=[],
@@ -457,6 +472,8 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         Returns:
             Initial RangeObservation with the challenge briefing.
         """
         # Select snapshot
         self._snapshot = self._select_snapshot(**kwargs)
@@ -477,6 +494,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         self._blue_history = []
         self._npc_traffic_log = []
         self._episode_start = time.time()
         # Build initial briefing
         task = self._snapshot.task
@@ -540,30 +558,35 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
             obs = self._handle_submit_flag(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
             return obs
         if cmd_name == "submit_evidence":
             obs = self._handle_submit_evidence(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
             return obs
         if cmd_name == "submit_finding":
             obs = self._handle_submit_finding(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
             return obs
         if cmd_name == "auth":
             obs = self._handle_auth(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
             return obs
         if cmd_name == "logout":
             obs = self._handle_logout(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
             return obs
         # Route to container
@@ -604,6 +627,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
         # Compute rewards and check termination
         obs = self._apply_rewards(action, obs)
         self._check_termination(obs)
         return obs
@@ -678,6 +702,28 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
                 obs.done = True
                 return
     # -----------------------------------------------------------------
     # Alert system
     # -----------------------------------------------------------------
@@ -724,6 +770,7 @@ class RangeEnvironment(_BASE):  # type: ignore[misc]
     def close(self) -> None:
         """Release resources (Docker client, episode state)."""
         if self._docker_client is not None:
             try:
                 self._docker_client.close()

 import logging
 import time
+from typing import TYPE_CHECKING, Any
 from uuid import uuid4
 from open_range.protocols import SnapshotSpec, TaskSpec
 from open_range.server.models import RangeAction, RangeObservation, RangeState
+if TYPE_CHECKING:
+    from open_range.server.runtime import ManagedSnapshotRuntime
 logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
     def __init__(
         self,
+        runtime: "ManagedSnapshotRuntime | None" = None,
         max_steps: int = DEFAULT_MAX_STEPS,
         exec_timeout: float = EXEC_TIMEOUT,
         docker_available: bool | None = None,
             super().__init__()
         self._state = RangeState()
         self._snapshot: SnapshotSpec | None = None
+        self._snapshot_id: str | None = None
         self._red_history: list[dict[str, Any]] = []
         self._blue_history: list[dict[str, Any]] = []
         self._npc_traffic_log: list[dict[str, Any]] = []
         # Docker client -- resolved lazily
         self._docker_client: Any = None
         self._docker_available = docker_available
+        self._runtime = runtime
+        self._episode_recorded = False
     # -----------------------------------------------------------------
     # Docker helpers
         3. A minimal fallback (for testing without Docker)
         """
         if "snapshot" in kwargs and isinstance(kwargs["snapshot"], SnapshotSpec):
+            self._snapshot_id = kwargs.get("snapshot_id")
             return kwargs["snapshot"]
+        if self._runtime is not None:
+            if "snapshot_id" in kwargs and kwargs["snapshot_id"]:
+                admitted = self._runtime.get_snapshot(str(kwargs["snapshot_id"]))
+            else:
+                admitted = self._runtime.acquire_snapshot()
+            self._snapshot_id = admitted.snapshot_id
+            return admitted.snapshot
+        self._snapshot_id = None
         return SnapshotSpec(
             topology={"hosts": []},
             flags=[],
         Returns:
             Initial RangeObservation with the challenge briefing.
         """
+        self._report_episode_result(completed=False)
         # Select snapshot
         self._snapshot = self._select_snapshot(**kwargs)
         self._blue_history = []
         self._npc_traffic_log = []
         self._episode_start = time.time()
+        self._episode_recorded = False
         # Build initial briefing
         task = self._snapshot.task
             obs = self._handle_submit_flag(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
+            self._report_if_done(obs)
             return obs
         if cmd_name == "submit_evidence":
             obs = self._handle_submit_evidence(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
+            self._report_if_done(obs)
             return obs
         if cmd_name == "submit_finding":
             obs = self._handle_submit_finding(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
+            self._report_if_done(obs)
             return obs
         if cmd_name == "auth":
             obs = self._handle_auth(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
+            self._report_if_done(obs)
             return obs
         if cmd_name == "logout":
             obs = self._handle_logout(action)
             obs = self._apply_rewards(action, obs)
             self._check_termination(obs)
+            self._report_if_done(obs)
             return obs
         # Route to container
         # Compute rewards and check termination
         obs = self._apply_rewards(action, obs)
         self._check_termination(obs)
+        self._report_if_done(obs)
         return obs
                 obs.done = True
                 return
+    def _report_if_done(self, obs: RangeObservation) -> None:
+        """Report a completed episode to the shared runtime once."""
+        if obs.done:
+            self._report_episode_result(completed=True)
+    def _report_episode_result(self, completed: bool) -> None:
+        """Record the current episode outcome with the shared runtime."""
+        if self._episode_recorded or self._runtime is None or self._snapshot is None:
+            return
+        if self._state.episode_id is None:
+            return
+        self._runtime.record_episode_result(
+            snapshot_id=self._snapshot_id,
+            snapshot=self._snapshot,
+            state=self._state,
+            red_history=self.red_history,
+            blue_history=self.blue_history,
+            completed=completed,
+        )
+        self._episode_recorded = True
     # -----------------------------------------------------------------
     # Alert system
     # -----------------------------------------------------------------
     def close(self) -> None:
         """Release resources (Docker client, episode state)."""
+        self._report_episode_result(completed=False)
         if self._docker_client is not None:
             try:
                 self._docker_client.close()
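Review note: the `_episode_recorded` flag appears intended to guarantee that each episode is reported at most once, whether it ends through `step()`, a later `reset()`, or `close()`. The sketch below isolates that guard. `ToyEpisodeReporter` is a hypothetical class, and an integer episode counter stands in for `RangeState.episode_id`.

```python
class ToyEpisodeReporter:
    """Hypothetical reduction of the report-once logic in RangeEnvironment."""

    def __init__(self) -> None:
        self.reports: list[tuple[int, bool]] = []
        self._episode: int | None = None
        self._recorded = False

    def reset(self) -> None:
        self._report(completed=False)  # flush an abandoned episode, if any
        self._episode = 0 if self._episode is None else self._episode + 1
        self._recorded = False

    def step(self, done: bool) -> None:
        if done:
            self._report(completed=True)

    def close(self) -> None:
        self._report(completed=False)

    def _report(self, completed: bool) -> None:
        # Skip when already reported or when no episode has started yet.
        if self._recorded or self._episode is None:
            return
        self.reports.append((self._episode, completed))
        self._recorded = True


env = ToyEpisodeReporter()
env.reset()          # episode 0 starts; nothing to flush yet
env.step(done=True)  # episode 0 reported as completed
env.step(done=True)  # ignored: already recorded
env.reset()          # episode 1 starts; episode 0 is not reported again
env.close()          # episode 1 reported as not completed
print(env.reports)
```

Under these assumptions the runtime sees exactly one record per episode, with `completed=False` marking episodes that were cut short.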

src/open_range/server/runtime.py ADDED

@@ -0,0 +1,492 @@

+"""Managed snapshot runtime for the shipped OpenRange server process.
+This module keeps the OpenEnv-facing environment instances lightweight while a
+single shared manager owns the admitted snapshot pool, generation loop, and
+episode feedback.
+"""
+from __future__ import annotations
+import asyncio
+import json
+import logging
+import os
+import threading
+import time
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+import yaml
+from open_range.builder.builder import LLMSnapshotBuilder, TemplateOnlyBuilder
+from open_range.builder.mutator import Mutator
+from open_range.builder.snapshot_store import SnapshotStore
+from open_range.protocols import (
+    BuildContext,
+    CheckResult,
+    ContainerSet,
+    SnapshotBuilder,
+    SnapshotSpec,
+)
+from open_range.server.models import RangeState
+from open_range.validator.task_feasibility import TaskFeasibilityCheck
+from open_range.validator.validator import ValidationResult, ValidatorGate
+logger = logging.getLogger(__name__)
+_DEFAULT_MANIFEST = ("manifests", "tier1_basic.yaml")
+def _env_flag(name: str, default: bool = False) -> bool:
+    raw = os.getenv(name)
+    if raw is None:
+        return default
+    return raw.strip().lower() in {"1", "true", "yes", "on"}
+def _env_int(name: str, default: int) -> int:
+    raw = os.getenv(name)
+    if raw is None or raw.strip() == "":
+        return default
+    return int(raw)
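Review note: one consequence of `_env_flag` worth spelling out is that a variable that is set but not one of the recognized truthy strings evaluates to `False`, even when `default=True`. The sketch below copies the helper's logic under a public name. The `DEMO_*` variable names are made up for this note.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Same logic as _env_flag in the diff, copied for illustration."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


os.environ["DEMO_REFILL"] = " Yes "   # whitespace and case are normalized
os.environ["DEMO_TYPO"] = "enabled"   # set, but not a recognized value
os.environ.pop("DEMO_UNSET", None)

print(env_flag("DEMO_REFILL"))                # True
print(env_flag("DEMO_TYPO", default=True))    # False
print(env_flag("DEMO_UNSET", default=True))   # True
```

`_env_int` behaves differently on bad input: `int(raw)` raises `ValueError` for a non-numeric string, so a malformed pool size would fail at startup rather than fall back to the default.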
+def _candidate_roots() -> list[Path]:
+    roots: list[Path] = []
+    cwd = Path.cwd()
+    roots.append(cwd)
+    file_path = Path(__file__).resolve()
+    roots.extend(file_path.parents[:6])
+    unique: list[Path] = []
+    seen: set[Path] = set()
+    for root in roots:
+        if root in seen:
+            continue
+        seen.add(root)
+        unique.append(root)
+    return unique
+def _resolve_default_manifest_path() -> Path:
+    for root in _candidate_roots():
+        candidate = root.joinpath(*_DEFAULT_MANIFEST)
+        if candidate.exists():
+            return candidate
+    raise FileNotFoundError(
+        "Could not locate the default manifest. "
+        "Set OPENRANGE_RUNTIME_MANIFEST to an explicit YAML path."
+    )
+def _resolve_store_dir(store_dir: str | Path | None) -> Path:
+    if store_dir is None:
+        return Path(os.getenv("OPENRANGE_SNAPSHOT_DIR", "snapshots")).resolve()
+    return Path(store_dir).resolve()
+def _run_coro_sync(coro: Any) -> Any:
+    """Run an async coroutine from sync code.
+    The runtime is used from sync OpenEnv environment methods and a background
+    thread, so we provide a conservative bridge here.
+    """
+    try:
+        asyncio.get_running_loop()
+    except RuntimeError:
+        return asyncio.run(coro)
+    result: dict[str, Any] = {}
+    error: list[BaseException] = []
+    def _runner() -> None:
+        try:
+            result["value"] = asyncio.run(coro)
+        except BaseException as exc:  # noqa: BLE001
+            error.append(exc)
+    thread = threading.Thread(target=_runner, name="openrange-coro-bridge")
+    thread.start()
+    thread.join()
+    if error:
+        raise error[0]
+    return result.get("value")
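Review note: `_run_coro_sync` has two paths, and the second one blocks the calling thread (and therefore its event loop) until the helper thread finishes. The sketch below exercises both paths with a trivial coroutine. `run_coro_sync`, `load_value`, and `caller_inside_loop` are names chosen for this note, with the bridge logic copied from the diff.

```python
import asyncio
import threading
from typing import Any, Coroutine


def run_coro_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop in this thread: run directly

    result: dict[str, Any] = {}
    error: list[BaseException] = []

    def runner() -> None:
        try:
            result["value"] = asyncio.run(coro)  # fresh loop in a new thread
        except BaseException as exc:
            error.append(exc)

    thread = threading.Thread(target=runner)
    thread.start()
    thread.join()  # blocks the caller until the coroutine finishes
    if error:
        raise error[0]
    return result.get("value")


async def load_value() -> int:
    await asyncio.sleep(0)
    return 42


async def caller_inside_loop() -> int:
    # asyncio.run() called directly here would raise RuntimeError.
    return run_coro_sync(load_value())


from_sync = run_coro_sync(load_value())
from_async = asyncio.run(caller_inside_loop())
print(from_sync, from_async)
```

The thread hop keeps the call correct inside a running loop, at the cost of stalling that loop for the duration, which may matter if the store operations ever become slow.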
+@dataclass(slots=True)
+class EpisodeOutcome:
+    snapshot_id: str | None
+    red_solved: bool
+    blue_detected: bool
+    steps: int
+    weak_areas: list[str] = field(default_factory=list)
+    completed: bool = False
+    recorded_at: float = field(default_factory=time.time)
+class CurriculumTracker:
+    """Tiny in-process curriculum memory for future snapshot generation."""
+    def __init__(self, max_history: int = 100) -> None:
+        self.max_history = max_history
+        self._history: list[EpisodeOutcome] = []
+        self._lock = threading.Lock()
+    def record(self, outcome: EpisodeOutcome) -> None:
+        with self._lock:
+            self._history.append(outcome)
+            if len(self._history) > self.max_history:
+                del self._history[: len(self._history) - self.max_history]
+    def build_context(self, *, seed: int, tier: int) -> BuildContext:
+        with self._lock:
+            history = list(self._history)
+        completed = [o for o in history if o.completed]
+        red_solve_rate = (
+            sum(1 for o in completed if o.red_solved) / len(completed)
+            if completed
+            else 0.0
+        )
+        blue_detect_rate = (
+            sum(1 for o in completed if o.blue_detected) / len(completed)
+            if completed
+            else 0.0
+        )
+        weak_counts: dict[str, int] = {}
+        for outcome in completed:
+            if outcome.red_solved:
+                continue
+            for area in outcome.weak_areas:
+                weak_counts[area] = weak_counts.get(area, 0) + 1
+        weak_areas = [
+            area
+            for area, _count in sorted(
+                weak_counts.items(),
+                key=lambda item: (-item[1], item[0]),
+            )[:3]
+        ]
+        return BuildContext(
+            seed=seed,
+            tier=tier,
+            red_solve_rate=red_solve_rate,
+            blue_detect_rate=blue_detect_rate,
+            weak_areas=weak_areas,
+        )
+    @property
+    def history(self) -> list[EpisodeOutcome]:
+        with self._lock:
+            return list(self._history)
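Review note: `build_context` derives its rates from completed episodes only, and counts weak areas only for episodes Red did not solve. The worked example below mirrors that arithmetic on four made-up outcomes. `Outcome` and `summarize` are hypothetical reductions of `EpisodeOutcome` and `CurriculumTracker.build_context`, and the area names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical subset of EpisodeOutcome."""
    red_solved: bool
    completed: bool
    weak_areas: list[str] = field(default_factory=list)


def summarize(history: list[Outcome]) -> tuple[float, list[str]]:
    completed = [o for o in history if o.completed]
    solve_rate = (
        sum(1 for o in completed if o.red_solved) / len(completed)
        if completed
        else 0.0
    )
    counts: dict[str, int] = {}
    for outcome in completed:
        if outcome.red_solved:
            continue  # only unsolved episodes contribute weak areas
        for area in outcome.weak_areas:
            counts[area] = counts.get(area, 0) + 1
    # Most frequent first; ties broken alphabetically; keep the top three.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return solve_rate, [area for area, _ in ranked[:3]]


history = [
    Outcome(red_solved=True, completed=True, weak_areas=["sqli"]),
    Outcome(red_solved=False, completed=True, weak_areas=["sqli", "ssrf"]),
    Outcome(red_solved=False, completed=True, weak_areas=["ssrf", "idor"]),
    Outcome(red_solved=False, completed=False, weak_areas=["xxe"]),
]
rate, weak = summarize(history)
print(rate, weak)
```

Here one of three completed episodes was solved, and the abandoned episode's `xxe` entry is ignored entirely, so incomplete runs do not skew the curriculum signal.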
+@dataclass(frozen=True, slots=True)
+class RuntimeSnapshot:
+    snapshot_id: str
+    snapshot: SnapshotSpec
+class StructuralSnapshotCheck:
+    """Lightweight admission check for the shipped no-Docker runtime path."""
+    async def check(
+        self,
+        snapshot: SnapshotSpec,
+        containers: ContainerSet,
+    ) -> CheckResult:
+        issues: list[str] = []
+        if not snapshot.truth_graph.vulns:
+            issues.append("truth_graph has no vulns")
+        if not snapshot.golden_path:
+            issues.append("golden_path is empty")
+        if not snapshot.flags:
+            issues.append("flags are empty")
+        if not snapshot.task.red_briefing or not snapshot.task.blue_briefing:
+            issues.append("task briefings are missing")
+        for briefing_name, text in (
+            ("red_briefing", snapshot.task.red_briefing),
+            ("blue_briefing", snapshot.task.blue_briefing),
+        ):
+            for flag in snapshot.flags:
+                if flag.value and flag.value in text:
+                    issues.append(f"flag leaked in {briefing_name}")
+            for step in snapshot.golden_path:
+                if len(step.command) > 20 and step.command in text:
+                    issues.append(f"golden-path command leaked in {briefing_name}")
+        passed = len(issues) == 0
+        return CheckResult(
+            name="structural_snapshot",
+            passed=passed,
+            details={"issues": issues},
+            error="" if passed else "; ".join(issues),
+        )
+def _default_builder() -> SnapshotBuilder:
+    mode = os.getenv("OPENRANGE_RUNTIME_BUILDER", "template").strip().lower()
+    if mode == "template":
+        return TemplateOnlyBuilder()
+    if mode == "llm":
+        return LLMSnapshotBuilder()
+    raise ValueError(
+        f"Unsupported OPENRANGE_RUNTIME_BUILDER={mode!r}. "
+        "Expected 'template' or 'llm'."
+    )
+def _default_validator() -> ValidatorGate:
+    # These checks work directly against the compiled snapshot spec and do not
+    # require booted containers. They are the safe default for shipped mode.
+    return ValidatorGate(
+        [
+            StructuralSnapshotCheck(),
+            TaskFeasibilityCheck(),
+        ]
+    )
+class ManagedSnapshotRuntime:
+    """Shared server-side manager for admitted snapshots."""
+    def __init__(
+        self,
+        *,
+        manifest: dict[str, Any] | None = None,
+        manifest_path: str | Path | None = None,
+        store_dir: str | Path | None = None,
+        builder: SnapshotBuilder | None = None,
+        validator: ValidatorGate | None = None,
+        pool_size: int = 3,
+        selection_strategy: str = "random",
+        refill_enabled: bool = False,
+        refill_interval_s: float = 2.0,
+        generation_retries: int = 3,
+    ) -> None:
+        self.manifest_path = (
+            Path(manifest_path).resolve()
+            if manifest_path is not None
+            else _resolve_default_manifest_path()
+        )
+        self.manifest = manifest or self._load_manifest(self.manifest_path)
+        self.store_dir = _resolve_store_dir(store_dir)
+        self.store = SnapshotStore(str(self.store_dir))
+        self.builder = builder or _default_builder()
+        self.mutator = Mutator(self.builder)
+        self.validator = validator or _default_validator()
+        self.curriculum = CurriculumTracker()
+        self.pool_size = max(1, pool_size)
+        self.selection_strategy = selection_strategy
+        self.refill_enabled = refill_enabled
+        self.refill_interval_s = max(0.25, refill_interval_s)
+        self.generation_retries = max(1, generation_retries)
+        self._lock = threading.RLock()
+        self._refill_thread: threading.Thread | None = None
+        self._stop_event = threading.Event()
+        self._started = False
+        self._generation_counter = 0
+    @classmethod
+    def from_env(cls) -> "ManagedSnapshotRuntime":
+        return cls(
+            manifest_path=os.getenv("OPENRANGE_RUNTIME_MANIFEST"),
+            store_dir=os.getenv("OPENRANGE_SNAPSHOT_DIR"),
+            pool_size=_env_int("OPENRANGE_SNAPSHOT_POOL_SIZE", 3),
+            selection_strategy=os.getenv("OPENRANGE_SNAPSHOT_SELECTION", "random"),
+            refill_enabled=_env_flag("OPENRANGE_ENABLE_MANAGED_REFILL", default=False),
+            refill_interval_s=float(os.getenv("OPENRANGE_REFILL_INTERVAL_S", "2.0")),
+            generation_retries=_env_int("OPENRANGE_GENERATION_RETRIES", 3),
+        )
+    @staticmethod
+    def _load_manifest(path: Path) -> dict[str, Any]:
+        with path.open("r", encoding="utf-8") as handle:
+            data = yaml.safe_load(handle) or {}
+        if not isinstance(data, dict):
+            raise TypeError(f"Manifest at {path} did not parse to a mapping")
+        return data
+    def start(self) -> None:
+        with self._lock:
+            if self._started:
+                return
+            existing = self.snapshot_count()
+            if existing < self.pool_size:
+                self._top_up_pool(self.pool_size - existing)
+            available = self.snapshot_count()
+            if available == 0:
+                raise RuntimeError(
+                    "ManagedSnapshotRuntime could not load or generate any admitted snapshots"
+                )
+            if self.refill_enabled:
+                self._stop_event.clear()
+                self._refill_thread = threading.Thread(
+                    target=self._refill_loop,
+                    name="openrange-runtime-refill",
+                    daemon=True,
+                )
+                self._refill_thread.start()
+            self._started = True
+            logger.info(
+                "ManagedSnapshotRuntime started with %d admitted snapshot(s) in %s",
+                available,
+                self.store_dir,
+            )
+    def stop(self) -> None:
+        with self._lock:
+            if not self._started:
+                return
+            self._stop_event.set()
+            thread = self._refill_thread
+            self._refill_thread = None
+            self._started = False
+        if thread is not None:
+            thread.join(timeout=self.refill_interval_s * 2)
+    def acquire_snapshot(self, *, snapshot_id: str | None = None) -> RuntimeSnapshot:
+        self.start()
+        if snapshot_id:
+            return self.get_snapshot(snapshot_id)
+        stored = _run_coro_sync(self.store.select_entry(strategy=self.selection_strategy))
+        return RuntimeSnapshot(snapshot_id=stored.snapshot_id, snapshot=stored.snapshot)
+    def get_snapshot(self, snapshot_id: str) -> RuntimeSnapshot:
+        self.start()
+        stored = _run_coro_sync(self.store.get_entry(snapshot_id))
+        return RuntimeSnapshot(snapshot_id=stored.snapshot_id, snapshot=stored.snapshot)
+    def list_snapshots(self) -> list[dict[str, Any]]:
+        return _run_coro_sync(self.store.list_snapshots())
+    def snapshot_count(self) -> int:
+        return len(self.list_snapshots())
+    def status(self) -> dict[str, Any]:
+        return {
+            "manifest_path": str(self.manifest_path),
+            "store_dir": str(self.store_dir),
+            "pool_size": self.pool_size,
+            "selection_strategy": self.selection_strategy,
+            "refill_enabled": self.refill_enabled,
+            "snapshot_count": self.snapshot_count(),
+            "started": self._started,
+        }
+    def record_episode_result(
+        self,
+        *,
+        snapshot_id: str | None,
+        snapshot: SnapshotSpec | None,
+        state: RangeState,
+        red_history: list[dict[str, Any]],
+        blue_history: list[dict[str, Any]],
+        completed: bool,
+    ) -> None:
+        if snapshot is None:
+            return
+        total_flags = len(snapshot.flags)
+        red_solved = total_flags > 0 and len(state.flags_found) >= total_flags
+        blue_detected = any(
+            record.get("type") == "finding" or record.get("cmd_name") == "submit_finding"
+            for record in blue_history
+        )
+        weak_areas = []
+        if not red_solved:
+            weak_areas = [v.type for v in snapshot.truth_graph.vulns]
+        self.curriculum.record(
+            EpisodeOutcome(
+                snapshot_id=snapshot_id,
+                red_solved=red_solved,
+                blue_detected=blue_detected,
+                steps=state.step_count,
+                weak_areas=weak_areas,
+                completed=completed,
+            )
+        )
+    def _refill_loop(self) -> None:
+        while not self._stop_event.wait(self.refill_interval_s):
+            try:
+                missing = self.pool_size - self.snapshot_count()
+                if missing > 0:
+                    self._top_up_pool(missing)
+            except Exception as exc:  # noqa: BLE001
+                logger.warning("ManagedSnapshotRuntime refill failed: %s", exc)
+    def _top_up_pool(self, missing: int) -> None:
+        for _ in range(max(0, missing)):
+            self._generate_and_store_snapshot()
+    def _generate_and_store_snapshot(self) -> str:
+        last_error: str | None = None
+        for attempt in range(1, self.generation_retries + 1):
+            context = self._build_context()
+            snapshot = _run_coro_sync(
+                self.mutator.mutate(
+                    self.manifest,
+                    context=context,
+                    error={"message": last_error} if last_error else None,
+                )
+            )
+            validation = self._validate_snapshot(snapshot)
+            if validation.passed:
+                snapshot_id = _run_coro_sync(self.store.store(snapshot))
+                logger.info(
+                    "ManagedSnapshotRuntime admitted snapshot %s on attempt %d",
+                    snapshot_id,
+                    attempt,
+                )
+                return snapshot_id
+            last_error = self._validation_error(validation)
+            logger.warning(
+                "ManagedSnapshotRuntime rejected candidate on attempt %d: %s",
+                attempt,
+                last_error,
+            )
+        raise RuntimeError(
+            "ManagedSnapshotRuntime failed to admit a snapshot after "
+            f"{self.generation_retries} attempt(s): {last_error}"
+        )
+    def _build_context(self) -> BuildContext:
+        seed = self._generation_counter
+        self._generation_counter += 1
+        tier = int(self.manifest.get("tier", 1) or 1)
+        context = self.curriculum.build_context(seed=seed, tier=tier)
+        context.episode_count = self.mutator.episode_count
+        return context
+    def _validate_snapshot(self, snapshot: SnapshotSpec) -> ValidationResult:
+        return _run_coro_sync(self.validator.validate(snapshot, ContainerSet()))
+    @staticmethod
+    def _validation_error(result: ValidationResult) -> str:
+        failed = [check for check in result.checks if not check.passed]
+        if not failed:
+            return "unknown validation failure"
+        payload = [
+            {
+                "name": check.name,
+                "error": check.error,
+                "details": check.details,
+            }
+            for check in failed
+        ]
+        return json.dumps(payload, sort_keys=True)
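
Note on `_refill_loop` above: the loop relies on `threading.Event.wait(timeout)` returning `False` on timeout and `True` once the event is set, so a single call serves as both the sleep between refill passes and the shutdown check. The sketch below shows that pattern in isolation. It is not the code from this commit; `run_refill_loop`, `flaky_refill`, `calls`, and `errors` are hypothetical names used only for illustration.

```python
import threading
import time


def run_refill_loop(stop_event, interval_s, refill, errors):
    """Call refill() roughly every interval_s seconds until stop_event is set."""
    # Event.wait(timeout) returns False on timeout and True once the event
    # is set, so the loop pauses between passes yet exits promptly on stop.
    while not stop_event.wait(interval_s):
        try:
            refill()
        except Exception as exc:  # keep the worker alive after a failed pass
            errors.append(exc)


calls = []
errors = []
stop_event = threading.Event()


def flaky_refill():
    # Hypothetical refill step: the first pass fails, later passes succeed.
    calls.append(len(calls))
    if len(calls) == 1:
        raise RuntimeError("simulated generation failure")


worker = threading.Thread(
    target=run_refill_loop,
    args=(stop_event, 0.01, flaky_refill, errors),
    daemon=True,
)
worker.start()

# Wait (bounded) until a few passes have run, then request shutdown.
deadline = time.monotonic() + 2.0
while len(calls) < 3 and time.monotonic() < deadline:
    time.sleep(0.005)
stop_event.set()
worker.join(timeout=1.0)
```

Because the failed pass is caught inside the loop, one bad generation attempt does not end the worker; this mirrors the `logger.warning` branch in the diff, with a list standing in for the logger.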

tests/test_runtime.py ADDED

@@ -0,0 +1,118 @@

+"""Tests for the managed snapshot runtime."""
+from __future__ import annotations
+import pytest
+from open_range.server.environment import RangeEnvironment
+from open_range.server.runtime import ManagedSnapshotRuntime
+class TestManagedSnapshotRuntime:
+    def test_start_preloads_snapshot_pool(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=2,
+            refill_enabled=False,
+        )
+        runtime.start()
+        try:
+            listing = runtime.list_snapshots()
+            assert len(listing) == 2
+            assert all(item["snapshot_id"] for item in listing)
+        finally:
+            runtime.stop()
+    def test_acquire_snapshot_returns_admitted_snapshot(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=1,
+            selection_strategy="latest",
+            refill_enabled=False,
+        )
+        runtime.start()
+        try:
+            admitted = runtime.acquire_snapshot()
+            assert admitted.snapshot_id
+            assert admitted.snapshot.truth_graph.vulns
+            assert admitted.snapshot.flags
+        finally:
+            runtime.stop()
+    def test_get_snapshot_by_id_returns_exact_snapshot(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=1,
+            refill_enabled=False,
+        )
+        runtime.start()
+        try:
+            first = runtime.acquire_snapshot()
+            loaded = runtime.get_snapshot(first.snapshot_id)
+            assert loaded.snapshot_id == first.snapshot_id
+            assert loaded.snapshot.flags[0].value == first.snapshot.flags[0].value
+        finally:
+            runtime.stop()
+class TestEnvironmentRuntimeIntegration:
+    def test_reset_uses_managed_runtime_snapshot(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=1,
+            refill_enabled=False,
+        )
+        runtime.start()
+        env = RangeEnvironment(runtime=runtime, docker_available=False)
+        try:
+            obs = env.reset()
+            assert "Range ready" in obs.stdout
+            assert env.snapshot is not None
+            assert env.snapshot.truth_graph.vulns
+        finally:
+            env.close()
+            runtime.stop()
+    def test_reset_snapshot_id_uses_runtime_store(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=1,
+            refill_enabled=False,
+        )
+        runtime.start()
+        env = RangeEnvironment(runtime=runtime, docker_available=False)
+        try:
+            admitted = runtime.acquire_snapshot()
+            env.reset(snapshot_id=admitted.snapshot_id)
+            assert env.snapshot is not None
+            assert env.snapshot.flags[0].value == admitted.snapshot.flags[0].value
+        finally:
+            env.close()
+            runtime.stop()
+    def test_missing_snapshot_id_raises(self, tier1_manifest, tmp_path):
+        runtime = ManagedSnapshotRuntime(
+            manifest=tier1_manifest,
+            store_dir=tmp_path / "snapshots",
+            pool_size=1,
+            refill_enabled=False,
+        )
+        runtime.start()
+        env = RangeEnvironment(runtime=runtime, docker_available=False)
+        try:
+            with pytest.raises(FileNotFoundError):
+                env.reset(snapshot_id="missing_snapshot")
+        finally:
+            env.close()
+            runtime.stop()
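
Each test above pairs `runtime.start()` with a `try`/`finally` that calls `runtime.stop()`. As a minimal sketch, assuming only the `start()` and `stop()` methods shown in the diff, that pairing could be factored into a context manager. `running` and `FakeRuntime` are hypothetical names and are not part of this commit.

```python
from contextlib import contextmanager


@contextmanager
def running(runtime):
    """Start a runtime and always stop it, even if the body raises."""
    runtime.start()
    try:
        yield runtime
    finally:
        runtime.stop()


class FakeRuntime:
    """Hypothetical stand-in that only records lifecycle calls."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeRuntime()
try:
    with running(fake):
        # Simulate an assertion failing partway through a test body.
        raise ValueError("simulated test failure")
except ValueError:
    pass
```

As in the tests, `start()` runs outside the `try`, so `stop()` is only attempted once startup has succeeded.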