---
title: OpenRange Environment Server
emoji: 🎯
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - rl-environment
---

# OpenRange

A multi-agent cybersecurity gymnasium on [OpenEnv](https://github.com/meta-pytorch/OpenEnv). Red and Blue agents train on validated enterprise networks that mutate between episodes.

---

## How It Works

A **manifest** declares a family of legal enterprise worlds: topology, services, identities, trust relationships, vulnerability classes, and mutation bounds. A shared **ManagedSnapshotRuntime** inside the shipped OpenEnv server process owns the admitted snapshot population. It compiles a graph-friendly root snapshot from the manifest, normalizing trust-only principals into a canonical principal catalog, then derives child snapshots by applying explicit typed mutations to admitted parents. Parent selection is policy-driven over the admitted population rather than raw latest/random sampling. Each candidate child is validated in layers: manifest compliance, canonical graph checks, structural/task checks, and, in managed-generation mode, booted runtime checks before admission. `reset()` selects one frozen admitted snapshot. `step()` runs commands inside it.

```mermaid
flowchart LR
    M[Manifest<br/>legal family +<br/>mutation envelope] --> B[Base snapshot compiler]
    B --> P[Admitted root snapshot]
    P --> R[ManagedSnapshotRuntime<br/>shared inside server process]
    R --> U[Policy-guided parent selector +<br/>typed mutator]
    U --> V{Validator<br/>manifest + graph +<br/>runtime checks}
    V -->|fail| U
    V -->|pass| S[Admitted snapshot population]
    S --> E["reset() → step() → obs + reward"]

    style V fill:#ffd93d,color:#333
    style S fill:#6bcb77,color:#fff
```
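
The mutate, validate, admit loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the runtime's actual code: `Snapshot`, `select_parent`, `add_host`, `admit`, and `grow` are hypothetical names, the parent policy is a simple stand-in, and the two checks stand in for the layered validator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for an admitted snapshot.
    snapshot_id: str
    parent_snapshot_id: Optional[str]
    generation: int
    hosts: Tuple[str, ...]


Check = Callable[[Snapshot], bool]


def select_parent(population: List[Snapshot]) -> Snapshot:
    # Stand-in policy: prefer the shallowest lineage, ties broken by id.
    return min(population, key=lambda s: (s.generation, s.snapshot_id))


def add_host(parent: Snapshot, host: str, child_id: str) -> Snapshot:
    # One typed mutation: derive a child that has one extra host.
    return Snapshot(child_id, parent.snapshot_id,
                    parent.generation + 1, parent.hosts + (host,))


def admit(candidate: Snapshot, checks: List[Check]) -> bool:
    # Layers run in order; all() stops at the first failing layer.
    return all(check(candidate) for check in checks)


def grow(root: Snapshot, new_hosts: List[str], checks: List[Check]) -> List[Snapshot]:
    population = [root]
    for i, host in enumerate(new_hosts):
        parent = select_parent(population)
        child = add_host(parent, host, f"snap-{i + 1}")
        if admit(child, checks):
            population.append(child)  # rejected children are simply dropped
    return population


root = Snapshot("snap-0", None, 0, ("web", "db"))
checks = [
    lambda s: len(set(s.hosts)) == len(s.hosts),  # no duplicate hosts
    lambda s: len(s.hosts) <= 3,                  # stay inside the mutation bound
]
population = grow(root, ["mail", "db", "siem"], checks)
print([s.snapshot_id for s in population])
```

Only candidates that pass every layer join the population, so `reset()` can pick any member without re-validating it.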

Red and Blue operate on the **same infrastructure simultaneously**. Red's stealth reward depends on whether Blue catches them. Blue's detection reward depends on Red's actual actions in the logs. This coupling drives co-evolution.

## Quick Start

```bash
# Install
git clone https://github.com/open-cybernauts/open-range.git
cd open-range
uv sync

# Optional: enable the LiteLLM-backed builder pipeline
uv sync --extra builder

# Optional: enable LiteLLM-backed synthetic teacher agents
uv sync --extra synthetic

# Optional: enable background refill inside the server
export OPENRANGE_ENABLE_MANAGED_REFILL=1
export OPENRANGE_RUNTIME_BUILDER=llm

# End-to-end demo (no Docker, no LLM)
uv run python examples/demo.py

# Generate synthetic SFT traces from a snapshot or manifest
uv run openrange synthetic-data \
  --manifest manifests/tier1_basic.yaml \
  --output data/sft_red.jsonl \
  --roles red

# Merge local bootstrap traces and tool context into generated output
uv run openrange synthetic-data \
  --manifest manifests/tier1_basic.yaml \
  --output data/synthetic_sft_5.jsonl \
  --num-traces 5 \
  --roles red \
  --bootstrap-traces data/sft.jsonl \
  --tool-info data/tool_info.md

# Run the OpenEnv client against a running server
uv run python examples/remote_client_demo.py --base-url http://localhost:8000

# Run the FastAPI server
uv run server                                   # default: 127.0.0.1:8000
uv run server --port 9000                       # custom port
uv run server --host 0.0.0.0                    # bind all interfaces

# Or via uvicorn directly
uv run uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload

# Tests
uv run pytest tests/ -v --tb=short
```

## Core Components

**Manifest**: YAML defining the legal world family and mutation envelope: hosts, zones, services, users, NPCs, data assets, credential policies, monitoring coverage, trust relationships, and which vulnerability classes the runtime may plant or extend. Three example manifests ship (healthcare, fintech, SaaS) at tiers 1-3.

**ManagedSnapshotRuntime**: Shared singleton created at server startup. Owns the `SnapshotStore`, base builder, population-aware parent selector, parent-snapshot mutator, validator gate, `SnapshotRenderer`, snapshot preload, optional background refill, and episode-result feedback. This is the hidden orchestrator behind the env; callers still only see `reset()`, `step()`, and `state()`.

**Builder / Mutator**: The base builder compiles an initial `SnapshotSpec` from a manifest. Root hydration then expands that into canonical topology state: host details, dependency edges, trust edges, and a principal catalog that can represent trust-only people without inventing login accounts. The mutator derives child `SnapshotSpec`s from admitted parents using typed mutation plans plus an explicit mutation-policy layer that scores parents and candidate edits with curriculum, replay, novelty, and lineage signals. Each snapshot carries lineage metadata (`snapshot_id`, `parent_snapshot_id`, `root_snapshot_id`, generation depth, mutation summary) and can emit constrained service/app payloads through `SnapshotSpec.files`. Three base builders ship: `LLMSnapshotBuilder` (production, via litellm), `TemplateOnlyBuilder` (deterministic shipped default), `FileBuilder` (load from disk).
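
To make the lineage metadata concrete, here is a small sketch of how a child's lineage could be derived from its parent. The `Lineage` class and both helper functions are hypothetical; only the three `*_snapshot_id` field names come from the description above, while `generation_depth` and `mutation_summary` are assumed attribute names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lineage:
    snapshot_id: str
    parent_snapshot_id: Optional[str]  # None for the root snapshot
    root_snapshot_id: str
    generation_depth: int              # assumed name for "generation depth"
    mutation_summary: str              # assumed name for "mutation summary"


def root_lineage(snapshot_id: str) -> Lineage:
    # A root is its own root and has no parent.
    return Lineage(snapshot_id, None, snapshot_id, 0, "root")


def child_lineage(parent: Lineage, snapshot_id: str, mutation_summary: str) -> Lineage:
    # The root id is inherited unchanged; depth grows by one per mutation.
    return Lineage(
        snapshot_id=snapshot_id,
        parent_snapshot_id=parent.snapshot_id,
        root_snapshot_id=parent.root_snapshot_id,
        generation_depth=parent.generation_depth + 1,
        mutation_summary=mutation_summary,
    )


root = root_lineage("snap-root")
child = child_lineage(root, "snap-a", "add host: mail")
grandchild = child_lineage(child, "snap-b", "add trust edge: mail -> ldap")
print(grandchild)
```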

The deployed package exposes the standard OpenEnv `reset()`, `step()`, and `state()` contract through `server.app:app`, which is the entrypoint referenced by `openenv.yaml`.

**Validator**: Admission gate for candidate snapshots. The shipped runtime enforces manifest compliance plus graph-native checks such as graph consistency, path solvability, evidence sufficiency, and reward grounding before structural/task checks. With the `training` profile, the runtime boots rendered bundles, applies payload files, constructs a real `ContainerSet`, and runs live build/exploit/patch/evidence/reward/isolation/difficulty/NPC/realism checks before admission.

Validator profile matrix:

| Profile | Checks | Guarantees |
|---------|--------|------------|
| `offline` | Graph + structural/task checks only (no live containers) | Fast static admission only; no live exploitability/patchability guarantee |
| `training` | `offline` checks + live/container-backed checks | Full admission guarantees for managed training/runtime use |

Managed runtime defaults and safety behavior:
- `OPENRANGE_RUNTIME_VALIDATOR_PROFILE` defaults to `training`.
- `OPENRANGE_ENABLE_LIVE_ADMISSION` defaults to `1`.
- If managed runtime is configured non-live (`offline` profile and/or live admission disabled), startup raises an error unless you explicitly opt out with `OPENRANGE_ALLOW_NON_LIVE_ADMISSION=1` (legacy alias: `OPENRANGE_ALLOW_OFFLINE_ADMISSION=1`), in which case a warning is emitted.
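
The defaults and the opt-out rule above can be read as the following decision logic. This is a sketch of the described behavior, not the server's startup code: `check_admission_config` and `_flag` are hypothetical helpers, and treating only the literal value `1` as "enabled" is an assumption.

```python
import warnings
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    # Assumption: only the literal value "1" counts as enabled.
    return env.get(name, default).strip() == "1"


def check_admission_config(env: Mapping[str, str]) -> str:
    profile = env.get("OPENRANGE_RUNTIME_VALIDATOR_PROFILE", "training")
    live = _flag(env, "OPENRANGE_ENABLE_LIVE_ADMISSION", "1")
    if profile == "training" and live:
        return "live"

    # Non-live configuration: require an explicit opt-out (new name or legacy alias).
    allowed = (_flag(env, "OPENRANGE_ALLOW_NON_LIVE_ADMISSION", "0")
               or _flag(env, "OPENRANGE_ALLOW_OFFLINE_ADMISSION", "0"))
    if not allowed:
        raise RuntimeError(
            "non-live admission configured; set "
            "OPENRANGE_ALLOW_NON_LIVE_ADMISSION=1 to opt out explicitly"
        )
    warnings.warn("running with non-live admission; no live exploitability guarantee")
    return "non-live"


print(check_admission_config({}))  # both defaults apply
```

In practice the mapping passed in would be `os.environ`; a plain dict keeps the sketch easy to exercise.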

**Environment**: `RangeEnvironment(Environment)` following the OpenEnv contract. `reset()` asks the shared runtime for a frozen admitted snapshot. `step(action)` routes commands to the appropriate container: Red runs on the attacker box, Blue runs on the SIEM. No artificial command allowlists; the container's installed tools are the constraint.

**Rewards**: All grounded in container state, not LLM judgment:

| Red | Blue |
|-----|------|
| Flag capture (binary, `docker exec cat`) | Detection (TP rate vs Red's log) |
| Efficiency (`gamma^steps`) | Patch validity (re-run exploit, must fail) |
| Stealth (inversely coupled to Blue detection) | Availability (healthcheck fraction) |
| Anti-hallucination (-0.3 per fake flag) | False positive penalty (-0.2 per NPC flagged) |
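
As a rough illustration of how the table's terms could combine, the sketch below sums them without weights. The unweighted sum, the `gamma=0.99` default, modeling stealth as `1 - detection rate`, and the function names are all assumptions; only the per-term definitions and the `-0.3` / `-0.2` penalties come from the table.

```python
def red_reward(flag_captured: bool, steps: int, blue_detection_rate: float,
               fake_flags: int, gamma: float = 0.99) -> float:
    capture = 1.0 if flag_captured else 0.0   # binary flag capture
    efficiency = gamma ** steps               # decays with episode length
    stealth = 1.0 - blue_detection_rate       # assumed form of the inverse coupling
    hallucination = -0.3 * fake_flags         # penalty per fake flag submitted
    return capture + efficiency + stealth + hallucination


def blue_reward(true_positives: int, red_actions: int, patch_holds: bool,
                healthy_services: int, total_services: int,
                npcs_flagged: int) -> float:
    detection = true_positives / red_actions if red_actions else 0.0
    patch = 1.0 if patch_holds else 0.0       # exploit re-run must fail
    availability = healthy_services / total_services if total_services else 0.0
    false_positives = -0.2 * npcs_flagged     # penalty per NPC wrongly flagged
    return detection + patch + availability + false_positives


# Blue caught 2 of Red's 4 logged actions; that rate feeds Red's stealth term.
detection_rate = 2 / 4
red = red_reward(True, steps=10, blue_detection_rate=detection_rate, fake_flags=1)
blue = blue_reward(2, 4, patch_holds=True, healthy_services=3,
                   total_services=4, npcs_flagged=1)
print(round(red, 3), round(blue, 3))
```

Because Blue's detection rate appears in Red's reward, any improvement in Blue's detection directly lowers Red's stealth term.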

**NPC Traffic**: Background noise and social engineering surface. Two levels:

- **Level 0** (shell scripts): `http_traffic.sh`, `db_traffic.sh`, `ssh_traffic.sh` generate benign traffic that Blue must filter from real attacks. Scripts discover targets dynamically (available pages, databases, tables), with no hardcoded endpoints.
- **Level 1** (LLM agents): Each NPC persona runs an autonomous workday via LiteLLM: browsing pages, sending emails, querying databases, accessing file shares. NPCs also react to incoming stimuli (phishing emails) based on their `security_awareness` profile.

All NPC actions are derived from the `SnapshotSpec` at runtime (pages, shares, tables, credentials, domain), so they generalize to any Builder-generated environment. NPC logs carry structured fields (`type`, `label`, `source`, `result`) that couple directly to Red/Blue reward signals.

Configure the NPC model via environment variable:
```bash
export OPENRANGE_NPC_MODEL="azure/gpt-5.2-codex"  # or openai/gpt-4o, anthropic/claude-haiku-4-5-20251001, ollama/llama3
```

**Agents**: Structural protocol: any object with `reset(briefing, role)` and `act(observation) -> command` works. Ships with `LLMRangeAgent` (litellm, any provider), `ScriptedAgent`, and `HumanAgent`.
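
A structural protocol means no base class is required. The sketch below expresses the two-method shape with `typing.Protocol`; the `RangeAgent` protocol name, the `str` type hints, and the `ReplayAgent` class are illustrative assumptions, not the shipped interface or the shipped `ScriptedAgent`.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class RangeAgent(Protocol):
    # Assumed signatures; the README only fixes the method names and arguments.
    def reset(self, briefing: str, role: str) -> None: ...
    def act(self, observation: str) -> str: ...


class ReplayAgent:
    """Hypothetical agent that replays a fixed command list."""

    def __init__(self, commands: List[str]) -> None:
        self._commands = list(commands)
        self._index = 0
        self.role = ""

    def reset(self, briefing: str, role: str) -> None:
        self.role = role
        self._index = 0

    def act(self, observation: str) -> str:
        # Repeat the final command once the script is exhausted.
        command = self._commands[min(self._index, len(self._commands) - 1)]
        self._index += 1
        return command


agent = ReplayAgent(["nmap -sV web", "curl http://web/login"])
agent.reset(briefing="Find the flag.", role="red")
print(isinstance(agent, RangeAgent), agent.act("initial observation"))
```

`ReplayAgent` never inherits from `RangeAgent`; having matching methods is enough for the `isinstance` check to pass.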

**Synthetic Data**: `open_range.training.synthetic` provides snapshot-grounded trajectory generation for SFT warm-start. It uses a fast simulated `RangeEnvironment`, optional LiteLLM teacher agents, per-episode flag randomization, and exports JSONL through `TrajectoryLogger`.

```python
from open_range.agents.episode import run_episode
from open_range.agents.llm_agent import LLMRangeAgent
from open_range.server.environment import RangeEnvironment

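# Create the environment and one LLM-backed agent per role
env = RangeEnvironment()
red = LLMRangeAgent(model="anthropic/claude-sonnet-4-20250514")
blue = LLMRangeAgent(model="openai/gpt-4o")

# Run a single Red-vs-Blue episode of at most 50 steps
result = run_episode(env, red, blue, max_steps=50)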
```

## Tier System

Difficulty grows horizontally: more hosts, more zones, and a longer chained attack surface, not just harder passwords.

| Tier | Scale | Example |
|------|-------|---------|
| 1 | 6-8 hosts, 3-4 zones | Healthcare clinic: web + DB + mail + LDAP + SIEM |
| 2 | 10-12 hosts, 5-6 zones | Financial firm: + VPN, internal APIs, certificate authority |
| 3 | 14-18 hosts, 7-8 zones | SaaS company: + CI/CD, container registry, partner extranet |

## Server Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness check |
| GET | `/metadata` | Environment name, version |
| POST | `/reset` | Start episode, returns initial observation |
| POST | `/step` | Execute action, returns observation + reward |
| GET | `/state` | Current episode state |
| WS | `/ws` | WebSocket session |

Built directly on the OpenEnv HTTP/WebSocket contract.
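
For a quick look at the HTTP side, the sketch below builds requests for the endpoints in the table using only the standard library. The helper name `build_request` and the `/step` payload shape (`{"action": {"command": ...}}`) are assumptions; check the OpenEnv compliance doc for the real request models before relying on them.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def build_request(base_url: str, method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> Request:
    # JSON-encode the body only when a payload is given (GET requests carry none).
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url.rstrip("/") + path, data=data,
                   headers=headers, method=method)


BASE_URL = "http://localhost:8000"

health = build_request(BASE_URL, "GET", "/health")
reset = build_request(BASE_URL, "POST", "/reset", {})
# Hypothetical action payload; the real schema may differ.
step = build_request(BASE_URL, "POST", "/step",
                     {"action": {"command": "nmap -sV web"}})

for req in (health, reset, step):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` sends it to a running server; nothing here touches the network on its own.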

## Docs

- [Architecture](docs/architecture.md): full pipeline, network topology, episode lifecycle
- [Builder & Validator](docs/builder-validator.md): snapshot generation and admission
- [Red & Blue Agents](docs/red-blue-agents.md): tandem training, reward coupling, curriculum
- [Synthetic Data](docs/synthetic-data.md): snapshot-backed SFT trace generation with LiteLLM teachers
- [Agent Protocols](docs/agent-protocols.md): agent interface, episode runner, evaluation
- [OpenEnv Compliance](docs/openenv-compliance.md): API contract, models, deployment

## Built On

- [OpenEnv](https://github.com/meta-pytorch/OpenEnv): standardized agentic execution environments
- Ideas from [R2E-Gym](https://arxiv.org/abs/2504.07164) (hybrid verification), [Self-Play SWE-RL](https://arxiv.org/abs/2512.18552) (formal specs, inverse mutation), PAIRED/UED (constrained generation), POET (mutate + admit)

## License

Apache 2.0