Aaron Brown committed · Commit cebc7ff · 1 Parent(s): 1008330
Add docs and README
- .gitignore +57 -0
- README.md +298 -0
- docs/architecture.md +155 -0
- docs/builder-validator.md +119 -0
- docs/openenv-compliance.md +65 -0
.gitignore
ADDED
@@ -0,0 +1,57 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
dist/
build/
*.egg

# Virtual environments
.venv/
venv/
env/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Docker build outputs (generated ranges)
outputs/

# Training outputs
training/outputs/
training/checkpoints/
training/logs/
wandb/
*.pt
*.safetensors
*.gguf

# Reward curves
training/*.png

# Environment
.env
.env.local
CLAUDE.md
IMPLEMENTATION_PLAN.md

# Jupyter
.ipynb_checkpoints/

# Test artifacts
.pytest_cache/
.coverage
htmlcov/

# Pre-validated range pool (generated at startup)
pool/
README.md
ADDED
@@ -0,0 +1,298 @@
# OpenRange

**Multi-agent cyber gymnasium with real containers, golden-path validation, and self-evolving infrastructure.**

The first cybersecurity environment in the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) ecosystem.

---

## What is this?

OpenRange drops Red and Blue agents into a **real Docker network** -- web apps, databases, firewalls, and all -- then lets them fight. An LLM Builder generates the vulnerable infrastructure. A Validator confirms it's actually exploitable. And on every `reset()`, the Builder **mutates** the range with entirely different vulnerabilities, so agents can never memorize their way to victory.

```
You write a YAML manifest describing what you want:
  "2 hosts, DMZ network, web app with database, medium difficulty"

The Builder LLM generates it:
  Real nginx + PHP app -> Real MySQL with flags -> Real firewall rules -> Golden path

The Validator confirms it works:
  LLM review + 7 scripted checks including inverse mutation testing

Red attacks. Blue defends. Reset. New vulns. Repeat.
```

## Three Roles

| Role | What it does | Entry point |
|------|-------------|-------------|
| **Builder** | Generates and mutates vulnerable infrastructure from YAML manifests | LLM + templates |
| **Red** | Attacks live containers. Captures flags. | External -- no creds, no access |
| **Blue** | Defends via log analysis, patching, firewalling. | Internal -- monitor host |

Red and Blue operate on the **same infrastructure simultaneously**. Red's stealth reward depends on whether Blue catches them. Blue's detection reward depends on Red's actual actions in the logs.

## Architecture

```mermaid
flowchart TD
    A[YAML Manifest<br/>Human-authored topology + vuln slots] --> B[Builder LLM<br/>Generates configs, plants vulns, writes golden path]
    B --> C{Hybrid Validator}
    C -->|Phase A| D[LLM Review<br/>Exploitability, alignment, difficulty]
    C -->|Phase B| E[7-Check Scripted<br/>Services, flags, isolation,<br/>golden path, inverse mutation]
    D --> F{PASS?}
    E --> F
    F -->|Yes| G[OpenEnv Server<br/>FastAPI: /reset, /step, /state, /ws]
    F -->|No| B
    G --> H[Red Agent<br/>nmap, curl, exploit, submit_flag]
    G --> I[Blue Agent<br/>tail_log, grep, patch, iptables]
    G --> J[NPC Traffic<br/>Background noise]
    H --> K[(Docker Containers<br/>web, db, monitor)]
    I --> K
    J --> K

    style A fill:#4a9eff,color:#fff
    style B fill:#ff6b6b,color:#fff
    style C fill:#ffd93d,color:#333
    style G fill:#6bcb77,color:#fff
    style K fill:#7c73e6,color:#fff
```

## Episode Lifecycle

```mermaid
sequenceDiagram
    participant T as Training Loop
    participant E as OpenEnv Server
    participant B as Builder LLM
    participant V as Validator
    participant C as Containers
    participant R as Red Agent
    participant Bl as Blue Agent

    T->>E: reset()
    E->>B: Manifest + mutation directive
    B->>B: Generate structured JSON spec<br/>(vuln type, golden path, flags)
    B->>C: Render templates -> hot-swap configs
    C->>C: Restart affected services
    E->>V: Validate range
    V->>V: Phase A: LLM review
    V->>C: Phase B: 7 scripted checks
    V-->>E: PASS
    E-->>T: RangeObservation (challenge description)

    loop Episode Steps (alternating)
        T->>E: step(Red: nmap -sV web)
        E->>C: docker exec attacker nmap -sV web
        C-->>E: stdout: 80/tcp open http
        E-->>T: RangeObservation(stdout, reward)

        T->>E: step(Blue: tail_log access.log)
        E->>C: docker exec monitor tail access.log
        C-->>E: log entries (Red + NPC mixed)
        E-->>T: RangeObservation(stdout, reward)
    end

    Note over R,Bl: Red stealth reward coupled to Blue detection<br/>Blue detection reward coupled to Red actions
```

## Reset = Mutation

Every call to `reset()` triggers a **mutation** -- the Builder LLM swaps vulnerability classes in the running containers. The topology stays the same, but the challenge is completely different.

```mermaid
flowchart LR
    subgraph Episode 1
        A1[SQLi in search form] --> F1[Flag in DB]
    end
    subgraph Episode 2
        A2[Command injection<br/>in ping utility] --> F2[Flag on disk]
    end
    subgraph Episode 3
        A3[SSRF -> internal SQLi] --> F3[Flag in internal DB]
    end

    Episode 1 -->|reset| Episode 2
    Episode 2 -->|reset| Episode 3

    style Episode 1 fill:#ff6b6b22,stroke:#ff6b6b
    style Episode 2 fill:#ffd93d22,stroke:#ffd93d
    style Episode 3 fill:#6bcb7722,stroke:#6bcb77
```

Agents must **generalize** across vulnerability classes, not memorize exploit chains.

## Quick Start

```bash
# Install
git clone https://github.com/[team]/open-range.git
cd open-range
uv sync --all-extras

# Run the OpenEnv server locally
uv run uvicorn server.app:app --host 0.0.0.0 --port 8000

# Connect a client
python -c "
from client import OpenRangeEnv
from server.models import RangeAction

with OpenRangeEnv('http://localhost:8000').sync() as env:
    result = env.reset()
    print(result.observation.stdout)

    result = env.step(RangeAction(command='nmap -sV web', mode='red'))
    print(result.observation.stdout)
"
```

## Reward Signals

All rewards are **verifiable** -- grounded in real container state, not LLM judgment.

```mermaid
flowchart TB
    subgraph Red Rewards
        RF[Flag Capture<br/>docker exec cat flag<br/>binary match]
        RE[Efficiency<br/>gamma^steps]
        RS[Stealth<br/>Did Blue detect?]
        RH[Anti-hallucination<br/>-0.3 per fake flag]
    end

    subgraph Blue Rewards
        BD[Detection<br/>TP rate vs Red's log]
        BP[Patch<br/>Golden path re-run fails]
        BA[Availability<br/>Healthcheck fraction]
        BF[False Positive<br/>-0.2 per NPC flagged]
    end

    subgraph Coupling
        RS -.-|depends on| BD
        BD -.-|depends on| RF
    end

    style Red Rewards fill:#ff6b6b11,stroke:#ff6b6b
    style Blue Rewards fill:#4a9eff11,stroke:#4a9eff
    style Coupling fill:#ffd93d11,stroke:#ffd93d,stroke-dasharray: 5 5
```
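
The Red-side terms in the diagram can be sketched in a few lines of Python. This is only an illustration: the function name `red_reward`, the default `gamma`, the 0.6/0.2/0.2 weights, and the choice to grant the efficiency term only after a capture are all assumptions. The binary flag match, the `gamma^steps` term, the dependence of stealth on Blue's detection, and the -0.3 penalty per fake flag are the only parts taken from the diagram.

```python
def red_reward(flag_captured: bool, steps: int, detected_by_blue: bool,
               fake_flags: int, gamma: float = 0.95) -> float:
    """Combine Red's reward terms. Weights and gamma are hypothetical."""
    flag = 1.0 if flag_captured else 0.0                    # binary flag match
    efficiency = gamma ** steps if flag_captured else 0.0   # gamma^steps (assumed: only on capture)
    stealth = 0.0 if detected_by_blue else 1.0              # coupled to Blue's detection
    penalty = -0.3 * fake_flags                             # -0.3 per fake flag submitted
    # Hypothetical weights; the real weighting is not given in this README.
    return 0.6 * flag + 0.2 * efficiency + 0.2 * stealth + penalty


if __name__ == "__main__":
    # Undetected capture at step 0 versus two hallucinated flags while detected.
    print(red_reward(True, 0, False, 0))
    print(red_reward(False, 5, True, 2))
```

Keeping each term a plain function of observable episode facts mirrors the point above: nothing in the computation needs an LLM's opinion.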

## Golden Path Validation

Every generated range passes a **7-check validation pipeline** before any agent touches it:

```mermaid
flowchart LR
    S1[1. Services up<br/>nc -z ports] --> S2[2. Flags exist<br/>docker exec cat]
    S2 --> S3[3. Network isolation<br/>external !-> internal]
    S3 --> S4[4. Golden path<br/>execute exploit steps]
    S4 --> S5[5. Difficulty<br/>steps within 20%]
    S5 --> S6[6. No leaks<br/>grep description]
    S6 --> S7[7. Inverse mutation<br/>revert vuln -> step fails]

    S7 -->|All pass| PASS[VALID]
    S7 -->|Any fail| FAIL[RETRY<br/>Builder gets error context]

    style PASS fill:#6bcb77,color:#fff
    style FAIL fill:#ff6b6b,color:#fff
    style S7 fill:#ffd93d,color:#333
```

Check 7 is from [Self-Play SWE-RL](https://arxiv.org/abs/2512.18552): it proves each planted vulnerability actually contributes to the challenge.

## Tier System

Difficulty grows **horizontally** -- more hosts, more networks, more services. Not just harder passwords.

```mermaid
flowchart TD
    subgraph Tier 1 - Basic
        W1[web<br/>nginx + PHP] --> D1[db<br/>MySQL]
    end

    subgraph Tier 2 - Corporate
        W2[web] --> D2[db]
        W2 --> M2[mail<br/>SMTP]
        FW2[firewall<br/>iptables] --> W2
    end

    subgraph Tier 3 - Enterprise
        W3[web] --> D3[db]
        W3 --> DC3[DC<br/>LDAP/Kerberos]
        FS3[files<br/>SMB] --> DC3
    end

    style Tier 1 - Basic fill:#6bcb7722,stroke:#6bcb77
    style Tier 2 - Corporate fill:#ffd93d22,stroke:#ffd93d
    style Tier 3 - Enterprise fill:#ff6b6b22,stroke:#ff6b6b
```

| Tier | Hosts | Networks | Services | Golden Steps |
|------|-------|----------|----------|--------------|
| 1 | web + db | dmz | nginx, mysql, sshd | ~8 |
| 2 | + mail + fw | + internal | + smtp, iptables | ~15 |
| 3 | + files + DC | + mgmt | + smb, ldap, kerberos | ~25 |
| 4 | + jump + NPC | all | + bastion, cron, rsync | ~35 |
| 5 | + honeypot | + trap | + decoys, WAF, IDS | ~50 |

## Tandem Red + Blue Training

```mermaid
sequenceDiagram
    participant Red as Red Agent<br/>(attacker)
    participant Env as Range<br/>(containers)
    participant Blue as Blue Agent<br/>(defender)

    Note over Red,Blue: Episode begins -- Builder mutated range

    Red->>Env: nmap -sV web
    Env-->>Red: 80/tcp open http nginx
    Note right of Env: Action logged

    Blue->>Env: tail_log access.log
    Env-->>Blue: [NPC traffic + Red's scan mixed]
    Blue->>Env: submit_finding: port scan detected
    Note left of Blue: True positive!

    Red->>Env: curl 'web/search?q=' OR 1=1--
    Env-->>Red: Database results + flag
    Note right of Env: Action logged

    Red->>Env: submit_flag FLAG{abc123}
    Env-->>Red: Correct! reward=1.0

    Blue->>Env: grep_log "UNION|SELECT|OR 1"
    Env-->>Blue: SQLi pattern found
    Blue->>Env: patch search.php (parameterize query)
    Env-->>Blue: Patch applied

    Note over Env: Re-run golden path exploit
    Note over Env: Exploit FAILS -> patch valid

    Note over Red,Blue: Red stealth: LOW (Blue caught it)<br/>Blue detection: HIGH (found real attack)
```

## Project Structure

```
open-range/
├── manifests/     YAML range definitions (topology, vulns, golden paths)
├── vulns/         Vulnerability catalog (plantable vuln templates)
├── builder/       Builder LLM + Mutator + rendering templates
├── validator/     Hybrid validator (LLM review + 7-check scripted)
├── server/        OpenEnv server (Environment, models, rewards, app.py)
├── client/        Typed OpenEnv client
├── docs/          Architecture docs and guides
├── examples/      Demo scripts
└── tests/         Test suite
```

## Built On

- [OpenEnv](https://github.com/meta-pytorch/OpenEnv) -- standardized agentic execution environments
- Lessons from [R2E-Gym](https://arxiv.org/abs/2504.07164) (hybrid verification) and [Self-Play SWE-RL](https://arxiv.org/abs/2512.18552) (formal specs, inverse mutation testing, frontier-calibrating rewards)

## License

Apache 2.0
docs/architecture.md
ADDED
@@ -0,0 +1,155 @@
# Architecture

## System Overview

OpenRange is a 5-layer system. Data flows top-to-bottom during setup, loops during episodes, and feeds back up during curriculum escalation.

```
┌─────────────────────────────────────────────────┐
│                  YAML MANIFEST                  │
│  Topology, vuln slots, golden path, difficulty  │
│                (human-authored)                 │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                   BUILDER LLM                   │
│  Structured JSON spec → template rendering →    │
│  Dockerfiles, configs, vulnerable app code,     │
│  flag placement, golden path, NPC scripts       │
│  Called on every reset() to MUTATE the range    │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                HYBRID VALIDATOR                 │
│  Phase A: LLM reviews exploitability,           │
│           alignment, difficulty                 │
│  Phase B: 7-check scripted execution            │
│           (services, flags, isolation,          │
│            golden path, difficulty,             │
│            leak check, inverse mutation)        │
│  PASS → proceed    FAIL → Builder retries       │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                 OPENENV SERVER                  │
│                                                 │
│  FastAPI: /reset, /step, /state, /ws            │
│                                                 │
│  RangeAction(command, mode) ─────────────────▶  │
│  RangeObservation(stdout, stderr, reward) ◀───  │
│                                                 │
│    ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│    │   RED    │  │   BLUE   │  │   NPC    │     │
│    │ External │  │ Monitor  │  │ Traffic  │     │
│    │ attacker │  │ defender │  │  noise   │     │
│    └──────────┘  └──────────┘  └──────────┘     │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│            DOCKER CONTAINERS (range)            │
│                                                 │
│    ┌────────┐    ┌────────┐    ┌────────┐       │
│    │  web   │───▶│   db   │    │monitor │       │
│    │nginx+  │    │ mysql  │    │  logs  │       │
│    │PHP app │    │ flags  │    │  Blue  │       │
│    └────────┘    └────────┘    └────────┘       │
│       DMZ         Internal        Mgmt          │
└─────────────────────────────────────────────────┘
```

## Data Flow

### Setup (once)
1. Human writes YAML manifest defining topology + vuln slots
2. Builder LLM generates initial infrastructure
3. `docker compose up` starts all containers
4. Validator confirms range is exploitable and correctly configured

### Episode Loop
1. `reset()` → Builder LLM mutates vulns (new class, new flag, new golden path)
2. Hot-swap configs into running containers, restart affected services
3. Validator confirms mutation is valid (LLM + 7 scripted checks)
4. Red and Blue agents interact via `step(RangeAction)`:
   - Red: executes commands against containers (nmap, curl, ssh, submit_flag)
   - Blue: reads logs, patches vulns, blocks IPs (tail_log, iptables, patch, submit_finding)
5. Environment computes rewards from verifiable container state
6. Episode ends when: flag captured, max steps, timeout, or all vulns patched
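
The four termination conditions in step 6 could be checked roughly as follows. This is a sketch under stated assumptions: `EpisodeStatus`, `episode_done`, the reason strings, the order in which conditions are tested, and the default `timeout_s` are all hypothetical; the default `max_steps` of 10 is borrowed from the tier 1 manifest example in `docs/builder-validator.md`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EpisodeStatus:
    """Hypothetical snapshot of the facts needed to decide termination."""
    flag_captured: bool
    step_count: int
    elapsed_s: float
    vulns_patched: int
    vulns_total: int


def episode_done(s: EpisodeStatus, max_steps: int = 10,
                 timeout_s: float = 600.0) -> Tuple[bool, str]:
    """Return (done, reason) for the four documented end conditions."""
    if s.flag_captured:
        return True, "flag_captured"
    if s.vulns_total > 0 and s.vulns_patched >= s.vulns_total:
        return True, "all_vulns_patched"
    if s.step_count >= max_steps:
        return True, "max_steps"
    if s.elapsed_s >= timeout_s:
        return True, "timeout"
    return False, ""
```

Returning the reason alongside the boolean makes it easy to log why an episode ended, which matters when Red and Blue outcomes are analysed separately.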

### Curriculum (optional, post-training)
1. Track Red solve rate and Blue detection rate
2. Builder LLM adjusts difficulty via `r_inject = 1 - (1+alpha)*s`
3. When agents plateau: horizontal growth (add hosts, networks, services)
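
The formula in step 2 is a straight line in `s`, which a small function makes concrete. The sketch assumes `s` is Red's solve rate in `[0, 1]` (the doc does not define it explicitly) and uses a hypothetical default `alpha = 0.5`; the doc also does not say how the boundary values `s = 0` and `s = 1` are treated, so none is special-cased here.

```python
def injection_reward(solve_rate: float, alpha: float = 0.5) -> float:
    """r_inject = 1 - (1 + alpha) * s, with s assumed to be Red's solve rate."""
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be in [0, 1]")
    return 1.0 - (1.0 + alpha) * solve_rate


if __name__ == "__main__":
    for s in (0.0, 0.3, 0.6, 1.0):
        print(s, round(injection_reward(s), 3))
```

Under these assumptions the reward is 1 when Red never solves the range, crosses zero at `s = 1 / (1 + alpha)`, and falls to `-alpha` when Red always solves it, so the Builder is pushed toward ranges that Red solves only some of the time.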

## Builder LLM Output Format

The Builder produces **structured JSON**, not prose (lesson from Self-Play SWE-RL):

```json
{
  "vulns": [
    {
      "type": "sqli",
      "host": "web",
      "injection_point": "/search?q=",
      "vulnerable_code": "<?php $q=$_GET['q']; $r=mysql_query(\"SELECT * FROM products WHERE name LIKE '%$q%'\"); ?>",
      "flag_location": "/var/flags/flag1.txt"
    }
  ],
  "flags": [
    {"id": "flag1", "value": "FLAG{r4nd0m_v4lu3}", "path": "/var/flags/flag1.txt", "host": "web"}
  ],
  "golden_path": [
    {"step": 1, "cmd": "nmap -sV web", "expect_stdout": "80/tcp open http"},
    {"step": 2, "cmd": "curl http://web/", "expect_stdout": "search"},
    {"step": 3, "cmd": "curl -G 'http://web/search' --data-urlencode \"q=test' UNION SELECT flag FROM flags-- \"", "expect_stdout": "FLAG{"},
    {"step": 4, "cmd": "submit_flag FLAG{r4nd0m_v4lu3}", "expect_stdout": "correct"}
  ],
  "challenge_description": "Target network has a web application with a database backend. Find and exploit vulnerabilities to capture the flag."
}
```

A thin template layer renders this JSON into actual config files (PHP, nginx.conf, init.sql).

## Hybrid Validation Pipeline

Two phases, both must pass:

**Phase A: LLM Review** (fast, catches design-level bugs)
- Is the vulnerability actually exploitable given these configs?
- Does the challenge description match without leaking the answer?
- Is the golden path correct for this vuln type?
- Is the difficulty right for the current tier?

**Phase B: 7-Check Scripted Execution** (ground truth, catches implementation bugs)
1. Services respond on expected ports
2. Flags exist at expected paths with correct values
3. Network isolation holds (external can't reach internal)
4. Golden path commands produce expected outputs
5. Step count within 20% of difficulty target
6. Challenge description contains no flag values or exploit details
7. Inverse mutation test: reverting each vuln breaks its golden path step
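
Checks 5 and 6 need no containers, so they are easy to illustrate in isolation. The helper names `difficulty_ok` and `description_leaks` are hypothetical, and the leak test only looks for a `FLAG{` marker or a literal flag value; detecting leaked "exploit details" would need more than this sketch does.

```python
from typing import List


def difficulty_ok(golden_path_len: int, target_steps: int,
                  tolerance: float = 0.20) -> bool:
    """Check 5: the golden path length is within 20% of the difficulty target."""
    return abs(golden_path_len - target_steps) <= tolerance * target_steps


def description_leaks(description: str, flag_values: List[str]) -> bool:
    """Check 6 (partial): True if the description exposes a flag marker or value."""
    text = description.lower()
    if "flag{" in text:
        return True
    return any(value.lower() in text for value in flag_values)


if __name__ == "__main__":
    print(difficulty_ok(8, 10))   # 8 steps against a target of 10
    print(description_leaks("Find and exploit vulnerabilities to capture the flag.",
                            ["FLAG{r4nd0m_v4lu3}"]))
```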

## Reward Architecture

All rewards implemented as OpenEnv `Rubric` subclasses:

```
CompositeRedReward (WeightedSum)
├── FlagReward            binary, docker exec verified
├── EfficiencyReward      gamma^steps
├── StealthReward         coupled to Blue's detection history
├── EvidenceReward        quality of submit_evidence
└── HallucinationPenalty  -0.3 per fake flag

CompositeBlueReward (WeightedSum)
├── DetectionReward       TP rate vs Red's action log
├── PatchReward           binary, golden path re-execution
├── AvailabilityReward    healthcheck fraction
└── FalsePositiveReward   -0.2 per NPC traffic flagged
```

Rewards are computed from **container state and action logs**, never from LLM judgment.
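
To show the weighted-sum structure of the Blue tree, here is a plain-Python stand-in. It deliberately does not use the OpenEnv `Rubric` API, whose interface is not shown in this doc; the state keys, the component functions, and the weights in `BLUE_COMPONENTS` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Component = Callable[[State], float]


def detection(s: State) -> float:
    # True-positive rate of Blue's findings against Red's logged actions.
    return s["true_positives"] / s["red_actions"] if s["red_actions"] else 0.0


def patch(s: State) -> float:
    # Binary: 1.0 if the golden path fails when re-executed after the patch.
    return 1.0 if s["golden_path_fails_after_patch"] else 0.0


def availability(s: State) -> float:
    # Fraction of healthchecks that still pass.
    return s["healthy"] / s["checked"] if s["checked"] else 0.0


def false_positive(s: State) -> float:
    # -0.2 for every piece of NPC traffic that Blue flagged as an attack.
    return -0.2 * s["npc_flagged"]


def weighted_sum(components: List[Tuple[float, Component]], s: State) -> float:
    """Sum each component's score multiplied by its weight."""
    return sum(weight * fn(s) for weight, fn in components)


# Hypothetical weights; the real ones are not stated in this doc.
BLUE_COMPONENTS: List[Tuple[float, Component]] = [
    (0.4, detection), (0.3, patch), (0.3, availability), (1.0, false_positive),
]
```

Every component reads only counts that the environment can derive from container state and action logs, which is what keeps the total verifiable.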
docs/builder-validator.md
ADDED
@@ -0,0 +1,119 @@
# Builder + Validator Design

## Builder LLM

The Builder generates vulnerable infrastructure from YAML manifests. It's called:
- Once at startup (initial range creation)
- On every `reset()` (mutation -- swap vulnerability classes)

### Input

```yaml
# From the YAML manifest
topology:
  hosts:
    - name: web
      zone: dmz
      services: [nginx, php, sshd]
    - name: db
      zone: internal
      services: [mysql]
  networks: [dmz, internal]

difficulty:
  tier: 1
  max_steps: 10

# Plus runtime context
previous_vuln_classes: [sqli]  # What was planted last episode
agent_solve_rate: 0.6          # How often Red solves (for difficulty calibration)
```

### Output (Structured JSON)

The Builder outputs a **formal spec**, not prose. Lesson from Self-Play SWE-RL: natural language generation failed with a 32B model. Formal specs are reliable.

```json
{
  "vulns": [{
    "type": "idor",
    "host": "web",
    "injection_point": "/api/user/{id}",
    "vulnerable_code": "...",
    "flag_location": "/var/flags/flag1.txt"
  }],
  "flags": [{
    "id": "flag1",
    "value": "FLAG{abc123}",
    "path": "/var/flags/flag1.txt",
    "host": "web"
  }],
  "golden_path": [{
    "step": 1,
    "cmd": "nmap -sV web",
    "expect_stdout": "80/tcp open http"
  }],
  "challenge_description": "A web application with user management. Find the vulnerability."
}
```

A thin template layer (`builder/templates/`) renders the JSON into actual files.
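
As a rough picture of what such a template layer might do, the sketch below renders one shell line per flag from the spec using the standard library's `string.Template`. The template text, the function name `render_flag_script`, and the idea of emitting a flag-placement script are assumptions; the actual contents of `builder/templates/` are not shown in this doc.

```python
import json
from string import Template

# Hypothetical template: one shell command per flag in the spec.
FLAG_LINE = Template("printf '%s' '$value' > $path")


def render_flag_script(spec_json: str) -> str:
    """Render flag-placement commands from the Builder's JSON spec."""
    spec = json.loads(spec_json)
    lines = [FLAG_LINE.substitute(value=f["value"], path=f["path"])
             for f in spec["flags"]]
    return "\n".join(lines)


if __name__ == "__main__":
    spec = json.dumps({"flags": [{"id": "flag1", "value": "FLAG{abc123}",
                                  "path": "/var/flags/flag1.txt", "host": "web"}]})
    print(render_flag_script(spec))
```

Because the LLM only fills in values and the template owns the file syntax, a malformed generation tends to fail at `json.loads` or `substitute` rather than producing a subtly broken config.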

### Mutation Strategy

On `reset()`, the Builder:
1. Picks a **different** vuln class than the previous episode
2. Generates new vulnerable code, flag values, and golden path
3. Renders config files via templates
4. Hot-swaps into running containers (`docker cp` + service restart)
5. Does NOT tear down the full stack -- partial restart only
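
Step 1 of the list above amounts to sampling from the catalog while excluding what was planted last time. A minimal sketch, assuming a hypothetical `VULN_CLASSES` list (the class names are taken from examples in these docs, not from the real `vulns/` catalog) and a hypothetical function `pick_vuln_class`:

```python
import random
from typing import List, Optional, Sequence

# Illustrative class names only; the real catalog lives in vulns/.
VULN_CLASSES: List[str] = ["sqli", "cmd_injection", "ssrf", "idor"]


def pick_vuln_class(previous: Sequence[str],
                    rng: Optional[random.Random] = None,
                    catalog: Sequence[str] = VULN_CLASSES) -> str:
    """Pick a vulnerability class that was not used in the previous episode."""
    rng = rng or random.Random()
    candidates = [c for c in catalog if c not in previous]
    if not candidates:
        raise ValueError("no unused vulnerability class left in the catalog")
    return rng.choice(candidates)


if __name__ == "__main__":
    # previous_vuln_classes comes from the runtime context in the Input section.
    print(pick_vuln_class(["sqli"], random.Random(0)))
```

Passing a seeded `random.Random` keeps mutation reproducible for a given `reset(seed=...)`, which helps when a failed validation needs to be replayed.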

## Validator (Hybrid)

Validates that the Builder's output actually works. Two phases.

### Phase A: LLM Review

The Validator LLM reviews the Builder's structured JSON spec:

```
Given this vulnerability specification and these config files:
1. Is the vulnerability actually exploitable? (Would the golden path commands work?)
2. Does the challenge description match without revealing the exploit?
3. Is the golden path complete? (No missing steps?)
4. Is this the right difficulty for tier {N}?

Answer YES/NO with brief reasoning for each.
```

### Phase B: 7-Check Scripted Execution

Run against live containers:

| Check | Command | Pass Condition |
|-------|---------|----------------|
| 1. Services | `nc -z web 80` | All ports respond |
| 2. Flags | `docker exec web cat /var/flags/flag1.txt` | Matches spec value |
| 3. Isolation | `docker exec attacker curl http://db:3306` | Connection fails (refused, timed out, or host unresolved) |
| 4. Golden path | Execute each step | `expect_stdout` found in output |
| 5. Difficulty | `len(golden_path)` | Within 20% of `max_steps` |
| 6. No leaks | `grep -i "FLAG{" challenge_description` | No matches |
| 7. Inverse mutation | Revert each vuln, re-run its golden path step | Step FAILS |

Check 7 (inverse mutation) is the most important. It proves each planted vulnerability is necessary. Without it, the Builder could plant a "decorative" vuln that passes validation but isn't actually the path to the flag.
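
The logic of check 7 can be sketched independently of Docker. In the sketch below, `run_step` is a hypothetical callback that would, in a real validator, revert the named vuln in the container and re-run the golden path command; here it is simply a function from a step name and the set of still-active vulns to success or failure. The function name `inverse_mutation_check` and the one-step-per-vuln mapping are assumptions.

```python
from typing import Callable, Dict, List, Set


def inverse_mutation_check(vuln_to_step: Dict[str, str],
                           run_step: Callable[[str, Set[str]], bool]) -> List[str]:
    """Return the 'decorative' vulns: those whose step still passes once reverted."""
    all_vulns = set(vuln_to_step)
    decorative = []
    for vuln, step in vuln_to_step.items():
        active = all_vulns - {vuln}      # revert only this vuln
        if run_step(step, active):       # the step is expected to FAIL now
            decorative.append(vuln)
    return decorative


if __name__ == "__main__":
    # Toy range: the SQLi step needs the sqli vuln; the banner step needs nothing.
    def toy_run_step(step: str, active: Set[str]) -> bool:
        return step == "read_banner" or (step == "exploit_sqli" and "sqli" in active)

    print(inverse_mutation_check({"sqli": "exploit_sqli", "idor": "read_banner"},
                                 toy_run_step))
```

An empty result means every planted vuln is necessary; any name returned is a candidate for the Builder's retry feedback.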

### Failure Handling

```
Builder generates spec
  → Validator Phase A (LLM) → FAIL → Builder retries with feedback
  → Validator Phase B (scripted) → FAIL → Builder retries with error context
  → 3 failures → Use last known-good configuration
```
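
The flow above is a bounded retry loop with a fallback. A minimal sketch, assuming hypothetical callables `generate` (the Builder, which receives the previous failure message or `None`) and `validate` (Phase A followed by Phase B, returning a pass flag and a message); neither name comes from the codebase.

```python
from typing import Callable, Optional, Tuple


def build_with_retries(generate: Callable[[Optional[str]], dict],
                       validate: Callable[[dict], Tuple[bool, str]],
                       last_known_good: dict,
                       max_failures: int = 3) -> dict:
    """Retry the Builder with validator feedback; fall back after 3 failures."""
    feedback: Optional[str] = None
    for _ in range(max_failures):
        spec = generate(feedback)         # the Builder sees the previous error context
        passed, feedback = validate(spec)
        if passed:
            return spec
    return last_known_good                # keep training on a range known to work
```

Falling back to the last known-good configuration trades novelty for availability: a `reset()` never blocks indefinitely on a Builder that keeps producing invalid ranges.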

### Toxic Validation Warning

R2E-Gym found ~10% of validations incorrectly favor wrong solutions. Track:
- False-positive rate (accepted broken ranges that don't produce training signal)
- False-negative rate (rejected valid ranges unnecessarily)
- Log every validation decision for post-hoc auditing
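
One way to track these rates is to log each decision together with a ground-truth label assigned later, for example by a manual audit. The class name `ValidationAudit` and the record layout are assumptions; how ground truth is obtained is outside what this doc specifies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ValidationAudit:
    """Log of (accepted, actually_valid) pairs for post-hoc auditing."""
    records: List[Tuple[bool, bool]] = field(default_factory=list)

    def log(self, accepted: bool, actually_valid: bool) -> None:
        self.records.append((accepted, actually_valid))

    def false_positive_rate(self) -> float:
        """Share of broken ranges that the validator accepted."""
        broken = [accepted for accepted, valid in self.records if not valid]
        return sum(broken) / len(broken) if broken else 0.0

    def false_negative_rate(self) -> float:
        """Share of valid ranges that the validator rejected."""
        valid = [accepted for accepted, ok in self.records if ok]
        return sum(not a for a in valid) / len(valid) if valid else 0.0
```

Comparing the measured false-positive rate against the roughly 10% figure reported by R2E-Gym gives a quick sense of whether the validator is behaving better or worse than that baseline.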
docs/openenv-compliance.md
ADDED
@@ -0,0 +1,65 @@
# OpenEnv Compliance Guide

OpenRange implements the OpenEnv 0.2.x environment contract. This doc maps every requirement.

## Checklist

| Requirement | Status | Implementation |
|-------------|--------|----------------|
| `Environment` subclass | Required | `RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState])` |
| `reset()` returns `ObsT` | Required | Returns `RangeObservation` |
| `step()` returns `ObsT` | Required | Returns `RangeObservation` |
| `state` property returns `StateT` | Required | Returns `RangeState` |
| `Action` subclass (Pydantic, extra=forbid) | Required | `RangeAction(Action)` with `command`, `mode` |
| `Observation` subclass (Pydantic, extra=forbid) | Required | `RangeObservation(Observation)` -- inherits `done`, `reward` from base |
| `State` subclass (Pydantic, extra=allow) | Required | `RangeState(State)` -- inherits `episode_id`, `step_count` from base |
| `create_app(Class, ActionType, ObsType)` | Required | Pass CLASS not instance |
| `EnvClient` subclass | Required | `OpenRangeEnv(EnvClient[...])` |
| `_step_payload()` | Required | Serializes `RangeAction` to dict |
| `_parse_result()` | Required | Parses server response to `StepResult[RangeObservation]` |
| `_parse_state()` | Required | Parses server response to `RangeState` |
| `/health` endpoint | Auto | Provided by `create_app` |
| `/ws` WebSocket | Auto | Provided by `create_app` |
| `/reset`, `/step`, `/state` HTTP | Auto | Provided by `create_app` |
| `Rubric` for rewards | Optional | `CompositeRedReward`, `CompositeBlueReward` as Rubric subclasses |
| `openenv.yaml` manifest | Required | Environment metadata for HF Spaces |
| `Dockerfile` | Required | For container deployment |

## Common Mistakes to Avoid

1. **Don't redeclare `done` or `reward` on Observation.** The base class already has them.
2. **Don't redeclare `episode_id` or `step_count` on State.** The base class already has them.
3. **Pass the CLASS to `create_app()`, not an instance.** Each WebSocket session gets its own instance.
4. **Action uses `extra="forbid"`.** Unknown fields cause validation errors. Keep actions minimal.
5. **State uses `extra="allow"`.** You can add any fields you want.
6. **`reset()` returns ObsT (server-side), `StepResult[ObsT]` (client-side).** The server wraps it.

## API Signatures (Exact)

```python
from typing import Optional

# Server-side
class RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState]):
    def reset(self, seed: Optional[int] = None,
              episode_id: Optional[str] = None, **kwargs) -> RangeObservation: ...
    def step(self, action: RangeAction,
             timeout_s: Optional[float] = None, **kwargs) -> RangeObservation: ...
    @property
    def state(self) -> RangeState: ...

# Client-side
class OpenRangeEnv(EnvClient[RangeAction, RangeObservation, RangeState]):
    def _step_payload(self, action: RangeAction) -> dict: ...
    def _parse_result(self, payload: dict) -> StepResult[RangeObservation]: ...
    def _parse_state(self, payload: dict) -> RangeState: ...

# App factory
app = create_app(RangeEnvironment, RangeAction, RangeObservation, env_name="open_range")
```

## Reference Implementations

Study these OpenEnv environments as patterns:

- **`envs/coding_env/`** -- closest analog (execute code, get stdout/stderr). Uses `Environment` base.
- **`envs/echo_env/`** -- simplest possible environment. Uses `MCPEnvironment` base.
- **`envs/finqa_env/`** -- MCP tool-based with complex rewards. Uses `MCPEnvironment` base.