Spaces: abrown31/open-range


Aaron Brown committed on Mar 7

Commit

cebc7ff

1 Parent(s): 1008330

Add docs and README

Files changed (5)

.gitignore +57 -0
README.md +298 -0
docs/architecture.md +155 -0
docs/builder-validator.md +119 -0
docs/openenv-compliance.md +65 -0

.gitignore ADDED

	@@ -0,0 +1,57 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg-info/
+dist/
+build/
+*.egg
+# Virtual environments
+.venv/
+venv/
+env/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# Docker build outputs (generated ranges)
+outputs/
+# Training outputs
+training/outputs/
+training/checkpoints/
+training/logs/
+wandb/
+*.pt
+*.safetensors
+*.gguf
+# Reward curves
+training/*.png
+# Environment
+.env
+.env.local
+CLAUDE.md
+IMPLEMENTATION_PLAN.md
+# Jupyter
+.ipynb_checkpoints/
+# Test artifacts
+.pytest_cache/
+.coverage
+htmlcov/
+# Pre-validated range pool (generated at startup)
+pool/

README.md ADDED

	@@ -0,0 +1,298 @@

+# OpenRange
+**Multi-agent cyber gymnasium with real containers, golden-path validation, and self-evolving infrastructure.**
+The first cybersecurity environment in the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) ecosystem.
+---
+## What is this?
+OpenRange drops Red and Blue agents into a **real Docker network** — web apps, databases, firewalls, and all — then lets them fight. An LLM Builder generates the vulnerable infrastructure. A Validator confirms it's actually exploitable. And on every `reset()`, the Builder **mutates** the range with entirely different vulnerabilities, so agents can never memorize their way to victory.
+```
+You write a YAML manifest describing what you want:
+  "2 hosts, DMZ network, web app with database, medium difficulty"
+The Builder LLM generates it:
+  Real nginx + PHP app -> Real MySQL with flags -> Real firewall rules -> Golden path
+The Validator confirms it works:
+  LLM review + 7 scripted checks including inverse mutation testing
+Red attacks. Blue defends. Reset. New vulns. Repeat.
+```
+## Three Roles
+| Role | What it does | Entry point |
+|------|-------------|-------------|
+| **Builder** | Generates and mutates vulnerable infrastructure from YAML manifests | LLM + templates |
+| **Red** | Attacks live containers. Captures flags. | External -- no creds, no access |
+| **Blue** | Defends via log analysis, patching, firewalling. | Internal -- monitor host |
+Red and Blue operate on the **same infrastructure simultaneously**. Red's stealth reward depends on whether Blue catches them. Blue's detection reward depends on Red's actual actions in the logs.
+## Architecture
+```mermaid
+flowchart TD
+    A[YAML Manifest<br/>Human-authored topology + vuln slots] --> B[Builder LLM<br/>Generates configs, plants vulns, writes golden path]
+    B --> C{Hybrid Validator}
+    C -->|Phase A| D[LLM Review<br/>Exploitability, alignment, difficulty]
+    C -->|Phase B| E[7-Check Scripted<br/>Services, flags, isolation,<br/>golden path, inverse mutation]
+    D --> F{PASS?}
+    E --> F
+    F -->|Yes| G[OpenEnv Server<br/>FastAPI: /reset, /step, /state, /ws]
+    F -->|No| B
+    G --> H[Red Agent<br/>nmap, curl, exploit, submit_flag]
+    G --> I[Blue Agent<br/>tail_log, grep, patch, iptables]
+    G --> J[NPC Traffic<br/>Background noise]
+    H --> K[(Docker Containers<br/>web, db, monitor)]
+    I --> K
+    J --> K
+    style A fill:#4a9eff,color:#fff
+    style B fill:#ff6b6b,color:#fff
+    style C fill:#ffd93d,color:#333
+    style G fill:#6bcb77,color:#fff
+    style K fill:#7c73e6,color:#fff
+```
+## Episode Lifecycle
+```mermaid
+sequenceDiagram
+    participant T as Training Loop
+    participant E as OpenEnv Server
+    participant B as Builder LLM
+    participant V as Validator
+    participant C as Containers
+    participant R as Red Agent
+    participant Bl as Blue Agent
+    T->>E: reset()
+    E->>B: Manifest + mutation directive
+    B->>B: Generate structured JSON spec<br/>(vuln type, golden path, flags)
+    B->>C: Render templates -> hot-swap configs
+    C->>C: Restart affected services
+    E->>V: Validate range
+    V->>V: Phase A: LLM review
+    V->>C: Phase B: 7 scripted checks
+    V-->>E: PASS
+    E-->>T: RangeObservation (challenge description)
+    loop Episode Steps (alternating)
+        T->>E: step(Red: nmap -sV web)
+        E->>C: docker exec attacker nmap -sV web
+        C-->>E: stdout: 80/tcp open http
+        E-->>T: RangeObservation(stdout, reward)
+        T->>E: step(Blue: tail_log access.log)
+        E->>C: docker exec monitor tail access.log
+        C-->>E: log entries (Red + NPC mixed)
+        E-->>T: RangeObservation(stdout, reward)
+    end
+    Note over R,Bl: Red stealth reward coupled to Blue detection<br/>Blue detection reward coupled to Red actions
+```
+## Reset = Mutation
+Every call to `reset()` triggers a **mutation** -- the Builder LLM swaps vulnerability classes in the running containers. The topology stays the same, but the challenge is completely different.
+```mermaid
+flowchart LR
+    subgraph Episode 1
+        A1[SQLi in search form] --> F1[Flag in DB]
+    end
+    subgraph Episode 2
+        A2[Command injection<br/>in ping utility] --> F2[Flag on disk]
+    end
+    subgraph Episode 3
+        A3[SSRF -> internal SQLi] --> F3[Flag in internal DB]
+    end
+    Episode 1 -->|reset| Episode 2
+    Episode 2 -->|reset| Episode 3
+    style Episode 1 fill:#ff6b6b22,stroke:#ff6b6b
+    style Episode 2 fill:#ffd93d22,stroke:#ffd93d
+    style Episode 3 fill:#6bcb7722,stroke:#6bcb77
+```
+Agents must **generalize** across vulnerability classes, not memorize exploit chains.
+## Quick Start
+```bash
+# Install
+git clone https://github.com/[team]/open-range.git
+cd open-range
+uv sync --all-extras
+# Run the OpenEnv server locally
+uv run uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Connect a client
+python -c "
+from client import OpenRangeEnv
+from server.models import RangeAction
+with OpenRangeEnv('http://localhost:8000').sync() as env:
+    result = env.reset()
+    print(result.observation.stdout)
+    result = env.step(RangeAction(command='nmap -sV web', mode='red'))
+    print(result.observation.stdout)
+"
+```
+## Reward Signals
+All rewards are **verifiable** -- grounded in real container state, not LLM judgment.
+```mermaid
+flowchart TB
+    subgraph Red Rewards
+        RF[Flag Capture<br/>docker exec cat flag<br/>binary match]
+        RE[Efficiency<br/>gamma^steps]
+        RS[Stealth<br/>Did Blue detect?]
+        RH[Anti-hallucination<br/>-0.3 per fake flag]
+    end
+    subgraph Blue Rewards
+        BD[Detection<br/>TP rate vs Red's log]
+        BP[Patch<br/>Golden path re-run fails]
+        BA[Availability<br/>Healthcheck fraction]
+        BF[False Positive<br/>-0.2 per NPC flagged]
+    end
+    subgraph Coupling
+        RS -.-|depends on| BD
+        BD -.-|depends on| RF
+    end
+    style Red Rewards fill:#ff6b6b11,stroke:#ff6b6b
+    style Blue Rewards fill:#4a9eff11,stroke:#4a9eff
+    style Coupling fill:#ffd93d11,stroke:#ffd93d,stroke-dasharray: 5 5
+```
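The Blue reward terms in the diagram above could be combined roughly as follows. This is a minimal sketch, not the project's `CompositeBlueReward`: the function name, the set-based action IDs, and the weights are all assumptions; only the four terms and the `-0.2` penalty come from the diagram.

```python
def blue_reward(findings, red_actions, npc_actions, patch_blocks_exploit,
                healthchecks, weights=(1.0, 1.0, 0.5)):
    """Combine Blue's reward terms (weights are illustrative assumptions).

    findings / red_actions / npc_actions are sets of hypothetical action IDs.
    healthchecks is a list of booleans, one per service healthcheck.
    """
    w_detect, w_patch, w_avail = weights
    # Detection: fraction of Red's logged actions that Blue reported.
    detection = len(findings & red_actions) / len(red_actions) if red_actions else 0.0
    # Patch: binary, true when the golden path re-run fails after patching.
    patch = 1.0 if patch_blocks_exploit else 0.0
    # Availability: fraction of healthchecks still passing.
    availability = sum(healthchecks) / len(healthchecks) if healthchecks else 0.0
    # False positives: -0.2 for every NPC action Blue flagged as an attack.
    false_positive_penalty = -0.2 * len(findings & npc_actions)
    return (w_detect * detection + w_patch * patch
            + w_avail * availability + false_positive_penalty)
```

Every input is something the environment can observe directly (logs, healthchecks, the golden path re-run), which is what keeps the signal verifiable.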
+## Golden Path Validation
+Every generated range passes a **7-check validation pipeline** before any agent touches it:
+```mermaid
+flowchart LR
+    S1[1. Services up<br/>nc -z ports] --> S2[2. Flags exist<br/>docker exec cat]
+    S2 --> S3[3. Network isolation<br/>external !-> internal]
+    S3 --> S4[4. Golden path<br/>execute exploit steps]
+    S4 --> S5[5. Difficulty<br/>steps within 20%]
+    S5 --> S6[6. No leaks<br/>grep description]
+    S6 --> S7[7. Inverse mutation<br/>revert vuln -> step fails]
+    S7 -->|All pass| PASS[VALID]
+    S7 -->|Any fail| FAIL[RETRY<br/>Builder gets error context]
+    style PASS fill:#6bcb77,color:#fff
+    style FAIL fill:#ff6b6b,color:#fff
+    style S7 fill:#ffd93d,color:#333
+```
+Check 7 is from [Self-Play SWE-RL](https://arxiv.org/abs/2512.18552): it proves each planted vulnerability actually contributes to the challenge.
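A minimal sketch of the inverse mutation idea, assuming three hypothetical callables (`revert`, `restore`, `exploit_works`) that would wrap the real container operations. None of these names come from the OpenRange codebase.

```python
def inverse_mutation_ok(vulns, revert, restore, exploit_works):
    """Return True only if every planted vuln is necessary.

    For each vuln: revert it, re-run its exploit step, then restore it.
    If the exploit still works with the vuln reverted, the vuln is
    decorative and the range should be rejected.
    """
    for vuln in vulns:
        revert(vuln)
        try:
            still_works = exploit_works(vuln)
        finally:
            # Always put the vuln back so later checks see the full range.
            restore(vuln)
        if still_works:
            return False
    return True
```

The `try`/`finally` matters in practice: a failing exploit command should not leave the range in a half-reverted state.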
+## Tier System
+Difficulty grows **horizontally** -- more hosts, more networks, more services. Not just harder passwords.
+```mermaid
+flowchart TD
+    subgraph Tier 1 - Basic
+        W1[web<br/>nginx + PHP] --> D1[db<br/>MySQL]
+    end
+    subgraph Tier 2 - Corporate
+        W2[web] --> D2[db]
+        W2 --> M2[mail<br/>SMTP]
+        FW2[firewall<br/>iptables] --> W2
+    end
+    subgraph Tier 3 - Enterprise
+        W3[web] --> D3[db]
+        W3 --> DC3[DC<br/>LDAP/Kerberos]
+        FS3[files<br/>SMB] --> DC3
+    end
+    style Tier 1 - Basic fill:#6bcb7722,stroke:#6bcb77
+    style Tier 2 - Corporate fill:#ffd93d22,stroke:#ffd93d
+    style Tier 3 - Enterprise fill:#ff6b6b22,stroke:#ff6b6b
+```
+| Tier | Hosts | Networks | Services | Golden Steps |
+|------|-------|----------|----------|--------------|
+| 1 | web + db | dmz | nginx, mysql, sshd | ~8 |
+| 2 | + mail + fw | + internal | + smtp, iptables | ~15 |
+| 3 | + files + DC | + mgmt | + smb, ldap, kerberos | ~25 |
+| 4 | + jump + NPC | all | + bastion, cron, rsync | ~35 |
+| 5 | + honeypot | + trap | + decoys, WAF, IDS | ~50 |
+## Tandem Red + Blue Training
+```mermaid
+sequenceDiagram
+    participant Red as Red Agent<br/>(attacker)
+    participant Env as Range<br/>(containers)
+    participant Blue as Blue Agent<br/>(defender)
+    Note over Red,Blue: Episode begins -- Builder mutated range
+    Red->>Env: nmap -sV web
+    Env-->>Red: 80/tcp open http nginx
+    Note right of Env: Action logged
+    Blue->>Env: tail_log access.log
+    Env-->>Blue: [NPC traffic + Red's scan mixed]
+    Blue->>Env: submit_finding: port scan detected
+    Note left of Blue: True positive!
+    Red->>Env: curl 'web/search?q=' OR 1=1--
+    Env-->>Red: Database results + flag
+    Note right of Env: Action logged
+    Red->>Env: submit_flag FLAG{abc123}
+    Env-->>Red: Correct! reward=1.0
+    Blue->>Env: grep_log "UNION|SELECT|OR 1"
+    Env-->>Blue: SQLi pattern found
+    Blue->>Env: patch search.php (parameterize query)
+    Env-->>Blue: Patch applied
+    Note over Env: Re-run golden path exploit
+    Note over Env: Exploit FAILS -> patch valid
+    Note over Red,Blue: Red stealth: LOW (Blue caught it)<br/>Blue detection: HIGH (found real attack)
+```
+## Project Structure
+```
+open-range/
+├── manifests/          YAML range definitions (topology, vulns, golden paths)
+├── vulns/              Vulnerability catalog (plantable vuln templates)
+├── builder/            Builder LLM + Mutator + rendering templates
+├── validator/          Hybrid validator (LLM review + 7-check scripted)
+├── server/             OpenEnv server (Environment, models, rewards, app.py)
+├── client/             Typed OpenEnv client
+├── docs/               Architecture docs and guides
+├── examples/           Demo scripts
+└── tests/              Test suite
+```
+## Built On
+- [OpenEnv](https://github.com/meta-pytorch/OpenEnv) -- standardized agentic execution environments
+- Lessons from [R2E-Gym](https://arxiv.org/abs/2504.07164) (hybrid verification) and [Self-Play SWE-RL](https://arxiv.org/abs/2512.18552) (formal specs, inverse mutation testing, frontier-calibrating rewards)
+## License
+Apache 2.0

docs/architecture.md ADDED

	@@ -0,0 +1,155 @@

+# Architecture
+## System Overview
+OpenRange is a 5-layer system. Data flows top-to-bottom during setup, loops during episodes, and feeds back up during curriculum escalation.
+```
+┌─────────────────────────────────────────────────┐
+│                 YAML MANIFEST                   │
+│  Topology, vuln slots, golden path, difficulty  │
+│              (human-authored)                   │
+└──────────────────────┬──────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────┐
+│              BUILDER LLM                        │
+│  Structured JSON spec → template rendering →    │
+│  Dockerfiles, configs, vulnerable app code,     │
+│  flag placement, golden path, NPC scripts       │
+│  Called on every reset() to MUTATE the range    │
+└──────────────────────┬──────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────┐
+│           HYBRID VALIDATOR                      │
+│  Phase A: LLM reviews exploitability,           │
+│           alignment, difficulty                 │
+│  Phase B: 7-check scripted execution            │
+│           (services, flags, isolation,          │
+│            golden path, difficulty,             │
+│            leak check, inverse mutation)        │
+│  PASS → proceed    FAIL → Builder retries       │
+└──────────────────────┬──────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────┐
+│           OPENENV SERVER                        │
+│                                                 │
+│  FastAPI: /reset, /step, /state, /ws            │
+│                                                 │
+│  RangeAction(command, mode) ──────────────────┐ │
+│  RangeObservation(stdout, stderr, reward) ◄───┘ │
+│                                                 │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
+│  │   RED    │  │   BLUE   │  │   NPC    │      │
+│  │ External │  │ Monitor  │  │ Traffic  │      │
+│  │ attacker │  │ defender │  │ noise    │      │
+│  └──────────┘  └──────────┘  └──────────┘      │
+└──────────────────────┬──────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────┐
+│         DOCKER CONTAINERS (range)               │
+│                                                 │
+│  ┌────────┐    ┌────────┐    ┌────────┐         │
+│  │  web   │───▶│   db   │    │monitor │         │
+│  │nginx+  │    │ mysql  │    │ logs   │         │
+│  │PHP app │    │ flags  │    │ Blue   │         │
+│  └────────┘    └────────┘    └────────┘         │
+│       DMZ          Internal       Mgmt          │
+└─────────────────────────────────────────────────┘
+```
+## Data Flow
+### Setup (once)
+1. Human writes YAML manifest defining topology + vuln slots
+2. Builder LLM generates initial infrastructure
+3. `docker compose up` starts all containers
+4. Validator confirms range is exploitable and correctly configured
+### Episode Loop
+1. `reset()` → Builder LLM mutates vulns (new class, new flag, new golden path)
+2. Hot-swap configs into running containers, restart affected services
+3. Validator confirms mutation is valid (LLM + 7 scripted checks)
+4. Red and Blue agents interact via `step(RangeAction)`:
+   - Red: executes commands against containers (nmap, curl, ssh, submit_flag)
+   - Blue: reads logs, patches vulns, blocks IPs (tail_log, iptables, patch, submit_finding)
+5. Environment computes rewards from verifiable container state
+6. Episode ends when: flag captured, max steps, timeout, or all vulns patched
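The termination conditions in step 6 can be expressed as a single predicate. The sketch below is illustrative; the function name, parameter names, and the order in which conditions are tested are assumptions.

```python
def episode_end_reason(flag_captured, step_count, max_steps,
                       elapsed_s, timeout_s, unpatched_vulns):
    """Return why the episode ended, or None if it should continue."""
    if flag_captured:
        return "flag_captured"
    if step_count >= max_steps:
        return "max_steps"
    if elapsed_s >= timeout_s:
        return "timeout"
    if unpatched_vulns == 0:
        return "all_vulns_patched"
    return None
```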
+### Curriculum (optional, post-training)
+1. Track Red solve rate and Blue detection rate
+2. Builder LLM adjusts difficulty via `r_inject = 1 - (1+alpha)*s`
+3. When agents plateau: horizontal growth (add hosts, networks, services)
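As a worked example of the formula in step 2, the sketch below reads `s` as the tracked solve rate in `[0, 1]` and `alpha` as a tunable weight. Both readings are assumptions (the text does not define the symbols), and the default `alpha=0.8` is purely illustrative.

```python
def r_inject(solve_rate, alpha=0.8):
    """Evaluate r_inject = 1 - (1 + alpha) * s for a solve rate s in [0, 1]."""
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be between 0 and 1")
    return 1.0 - (1.0 + alpha) * solve_rate
```

With these values, a solve rate of 0.6 gives `1 - 1.8 * 0.6 = -0.08`, while a solve rate of 0.2 gives `0.64`: under this reading the value falls linearly as agents solve the range more often.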
+## Builder LLM Output Format
+The Builder produces **structured JSON**, not prose (lesson from Self-Play SWE-RL):
+```json
+{
+  "vulns": [
+    {
+      "type": "sqli",
+      "host": "web",
+      "injection_point": "/search?q=",
+      "vulnerable_code": "<?php $q=$_GET['q']; $r=mysql_query(\"SELECT * FROM products WHERE name LIKE '%$q%'\"); ?>",
+      "flag_location": "/var/flags/flag1.txt"
+    }
+  ],
+  "flags": [
+    {"id": "flag1", "value": "FLAG{r4nd0m_v4lu3}", "path": "/var/flags/flag1.txt", "host": "web"}
+  ],
+  "golden_path": [
+    {"step": 1, "cmd": "nmap -sV web", "expect_stdout": "80/tcp open http"},
+    {"step": 2, "cmd": "curl http://web/", "expect_stdout": "search"},
+    {"step": 3, "cmd": "curl -G http://web/search --data-urlencode \"q=test' UNION SELECT flag FROM flags-- \"", "expect_stdout": "FLAG{"},
+    {"step": 4, "cmd": "submit_flag FLAG{r4nd0m_v4lu3}", "expect_stdout": "correct"}
+  ],
+  "challenge_description": "Target network has a web application with a database backend. Find and exploit vulnerabilities to capture the flag."
+}
+```
+A thin template layer renders this JSON into actual config files (PHP, nginx.conf, init.sql).
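Before rendering, a spec like the one above can be sanity-checked with the standard library alone. The sketch below is an assumption about what such a check might look like, not the project's loader: `load_spec` and the specific consistency rules (every vuln points at a declared flag path, steps are numbered from 1) are hypothetical.

```python
import json

REQUIRED_KEYS = {"vulns", "flags", "golden_path", "challenge_description"}

def load_spec(raw):
    """Parse a Builder spec and check basic internal consistency."""
    spec = json.loads(raw)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Every vuln should lead to a flag that the spec actually declares.
    flag_paths = {flag["path"] for flag in spec["flags"]}
    for vuln in spec["vulns"]:
        if vuln["flag_location"] not in flag_paths:
            raise ValueError(f"no flag declared at {vuln['flag_location']}")
    # Golden path steps should be numbered 1..N with no gaps.
    steps = [step["step"] for step in spec["golden_path"]]
    if steps != list(range(1, len(steps) + 1)):
        raise ValueError(f"golden path steps out of order: {steps}")
    return spec
```

Cheap structural checks like these can reject a malformed spec before any container is touched.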
+## Hybrid Validation Pipeline
+Two phases, both must pass:
+**Phase A: LLM Review** (fast, catches design-level bugs)
+- Is the vulnerability actually exploitable given these configs?
+- Does the challenge description match without leaking the answer?
+- Is the golden path correct for this vuln type?
+- Is the difficulty right for the current tier?
+**Phase B: 7-Check Scripted Execution** (ground truth, catches implementation bugs)
+1. Services respond on expected ports
+2. Flags exist at expected paths with correct values
+3. Network isolation holds (external can't reach internal)
+4. Golden path commands produce expected outputs
+5. Step count within 20% of difficulty target
+6. Challenge description contains no flag values or exploit details
+7. Inverse mutation test: reverting each vuln breaks its golden path step
+## Reward Architecture
+All rewards implemented as OpenEnv `Rubric` subclasses:
+```
+CompositeRedReward (WeightedSum)
+  ├── FlagReward          binary, docker exec verified
+  ├── EfficiencyReward    gamma^steps
+  ├── StealthReward       coupled to Blue's detection history
+  ├── EvidenceReward      quality of submit_evidence
+  └── HallucinationPenalty  -0.3 per fake flag
+CompositeBlueReward (WeightedSum)
+  ├── DetectionReward     TP rate vs Red's action log
+  ├── PatchReward         binary, golden path re-execution
+  ├── AvailabilityReward  healthcheck fraction
+  └── FalsePositiveReward -0.2 per NPC traffic flagged
+```
+Rewards are computed from **container state and action logs**, never from LLM judgment.
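To make the Red composite concrete, here is a minimal sketch that does not use the OpenEnv `Rubric` API. The `RedEpisode` record, the weights, and `gamma=0.95` are assumptions for illustration, and `EvidenceReward` is left out because the text does not say how evidence quality is scored.

```python
from dataclasses import dataclass

@dataclass
class RedEpisode:
    flag_captured: bool  # verified against container state, not self-reported
    steps: int           # number of Red steps taken
    fake_flags: int      # submitted flags that matched nothing
    detected: bool       # whether Blue reported Red's activity

def red_reward(ep, gamma=0.95, weights=(1.0, 0.3, 0.2)):
    """Weighted sum of Red's terms plus the hallucination penalty."""
    w_flag, w_eff, w_stealth = weights
    flag = 1.0 if ep.flag_captured else 0.0
    efficiency = gamma ** ep.steps           # decays with every step taken
    stealth = 0.0 if ep.detected else 1.0    # coupled to Blue's detection
    penalty = -0.3 * ep.fake_flags           # -0.3 per fake flag
    return w_flag * flag + w_eff * efficiency + w_stealth * stealth + penalty
```

Note that `stealth` is the only term Red cannot compute alone: it depends on what Blue reported, which is the coupling described earlier.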

docs/builder-validator.md ADDED

	@@ -0,0 +1,119 @@

+# Builder + Validator Design
+## Builder LLM
+The Builder generates vulnerable infrastructure from YAML manifests. It's called:
+- Once at startup (initial range creation)
+- On every `reset()` (mutation — swap vulnerability classes)
+### Input
+```yaml
+# From the YAML manifest
+topology:
+  hosts:
+    - name: web
+      zone: dmz
+      services: [nginx, php, sshd]
+    - name: db
+      zone: internal
+      services: [mysql]
+  networks: [dmz, internal]
+difficulty:
+  tier: 1
+  max_steps: 10
+# Plus runtime context
+previous_vuln_classes: [sqli]  # What was planted last episode
+agent_solve_rate: 0.6          # How often Red solves (for difficulty calibration)
+```
+### Output (Structured JSON)
+The Builder outputs a **formal spec**, not prose. Lesson from Self-Play SWE-RL: natural language generation failed with a 32B model. Formal specs are reliable.
+```json
+{
+  "vulns": [{
+    "type": "idor",
+    "host": "web",
+    "injection_point": "/api/user/{id}",
+    "vulnerable_code": "...",
+    "flag_location": "/var/flags/flag1.txt"
+  }],
+  "flags": [{
+    "id": "flag1",
+    "value": "FLAG{abc123}",
+    "path": "/var/flags/flag1.txt",
+    "host": "web"
+  }],
+  "golden_path": [{
+    "step": 1,
+    "cmd": "nmap -sV web",
+    "expect_stdout": "80/tcp open http"
+  }],
+  "challenge_description": "A web application with user management. Find the vulnerability."
+}
+```
+A thin template layer (`builder/templates/`) renders the JSON into actual files.
+### Mutation Strategy
+On `reset()`, the Builder:
+1. Picks a **different** vuln class than the previous episode
+2. Generates new vulnerable code, flag values, and golden path
+3. Renders config files via templates
+4. Hot-swaps into running containers (`docker cp` + service restart)
+5. Does NOT tear down the full stack — partial restart only
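Step 1 of the list above amounts to sampling from the catalog while excluding the previous episode's classes. A minimal sketch, assuming a hypothetical `VULN_CATALOG` list (the real catalog would come from `vulns/`) and a hypothetical helper name:

```python
import random

# Hypothetical catalog; these class names are illustrative.
VULN_CATALOG = ["sqli", "idor", "cmd_injection", "ssrf"]

def pick_next_vuln_class(previous, catalog=VULN_CATALOG, rng=None):
    """Pick a vuln class that was not planted in the previous episode."""
    rng = rng or random.Random()
    candidates = [v for v in catalog if v not in previous]
    if not candidates:
        # Every class was used last episode; fall back to the full catalog.
        candidates = list(catalog)
    return rng.choice(candidates)
```

Passing a seeded `random.Random` as `rng` makes a mutation sequence reproducible, which helps when debugging a range that failed validation.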
+## Validator (Hybrid)
+Validates that the Builder's output actually works. Two phases.
+### Phase A: LLM Review
+The Validator LLM reviews the Builder's structured JSON spec:
+```
+Given this vulnerability specification and these config files:
+1. Is the vulnerability actually exploitable? (Would the golden path commands work?)
+2. Does the challenge description match without revealing the exploit?
+3. Is the golden path complete? (No missing steps?)
+4. Is this the right difficulty for tier {N}?
+Answer YES/NO with brief reasoning for each.
+```
+### Phase B: 7-Check Scripted Execution
+Run against live containers:
+| Check | Command | Pass Condition |
+|-------|---------|----------------|
+| 1. Services | `nc -z web 80` | All ports respond |
+| 2. Flags | `docker exec web cat /var/flags/flag1.txt` | Matches spec value |
+| 3. Isolation | `docker exec attacker curl http://db:3306` | Connection fails (refused or timed out) |
+| 4. Golden path | Execute each step | `expect_stdout` found in output |
+| 5. Difficulty | `len(golden_path)` | Within 20% of `max_steps` |
+| 6. No leaks | `grep -i "FLAG{" challenge_description` | No matches |
+| 7. Inverse mutation | Revert each vuln, re-run its golden path step | Step FAILS |
+Check 7 (inverse mutation) is the most important. It proves each planted vulnerability is necessary. Without it, the Builder could plant a "decorative" vuln that passes validation but isn't actually the path to the flag.
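Checks 4 to 6 from the table need no container access beyond running a command, so they are easy to sketch in isolation. The function names are hypothetical, and `run` stands in for whatever executes a command in the range (for example a `docker exec` wrapper) and returns its stdout.

```python
def check_golden_path(golden_path, run):
    """Check 4: each step's expected text appears in the command output."""
    return all(step["expect_stdout"] in run(step["cmd"]) for step in golden_path)

def check_difficulty(golden_path, max_steps, tolerance=0.20):
    """Check 5: step count is within 20% of the difficulty target."""
    return abs(len(golden_path) - max_steps) <= tolerance * max_steps

def check_no_leaks(description, flags):
    """Check 6: the description contains no flag marker or flag value."""
    text = description.lower()
    if "flag{" in text:
        return False
    return not any(flag["value"].lower() in text for flag in flags)
```

Injecting `run` as a parameter keeps the checks testable without Docker, as in `check_golden_path(spec["golden_path"], run=lambda cmd: "80/tcp open http")`.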
+### Failure Handling
+```
+Builder generates spec
+  → Validator Phase A (LLM) → FAIL → Builder retries with feedback
+  → Validator Phase B (scripted) → FAIL → Builder retries with error context
+  → 3 failures → Use last known-good configuration
+```
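The retry flow above might look like this in code. It is a sketch under stated assumptions: `build` and `validate` are hypothetical callables, with `validate` returning a `(passed, feedback)` pair that covers both phases.

```python
def build_validated_range(build, validate, last_known_good, max_attempts=3):
    """Try to build a valid range, falling back after repeated failures.

    build(feedback) -> spec; feedback is None on the first attempt.
    validate(spec)  -> (passed, feedback_for_next_attempt)
    Returns (spec, is_fresh); is_fresh is False when the fallback was used.
    """
    feedback = None
    for _ in range(max_attempts):
        spec = build(feedback)
        passed, feedback = validate(spec)
        if passed:
            return spec, True
    # All attempts failed: reuse the last configuration that validated.
    return last_known_good, False
```

Returning `is_fresh` alongside the spec lets the caller record how often the fallback fires, which is useful when auditing Builder quality.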
+### Toxic Validation Warning
+R2E-Gym found ~10% of validations incorrectly favor wrong solutions. Track:
+- False-positive rate (accepted broken ranges that don't produce training signal)
+- False-negative rate (rejected valid ranges unnecessarily)
+- Log every validation decision for post-hoc auditing
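One way to track these rates is to log each decision together with a post-hoc label of whether the range was actually valid. The `ValidationAudit` class below is a hypothetical sketch; how the ground-truth label is obtained (manual review, later agent behaviour) is left open.

```python
from dataclasses import dataclass, field

@dataclass
class ValidationAudit:
    # Each entry: (range_id, accepted_by_validator, actually_valid)
    decisions: list = field(default_factory=list)

    def record(self, range_id, accepted, actually_valid):
        self.decisions.append((range_id, accepted, actually_valid))

    def rates(self):
        broken = [d for d in self.decisions if not d[2]]
        valid = [d for d in self.decisions if d[2]]
        # False positive: a broken range that the validator accepted.
        fp = sum(1 for d in broken if d[1]) / len(broken) if broken else 0.0
        # False negative: a valid range that the validator rejected.
        fn = sum(1 for d in valid if not d[1]) / len(valid) if valid else 0.0
        return {"false_positive_rate": fp, "false_negative_rate": fn}
```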

docs/openenv-compliance.md ADDED

	@@ -0,0 +1,65 @@

+# OpenEnv Compliance Guide
+OpenRange implements the OpenEnv 0.2.x environment contract. This doc maps every requirement.
+## Checklist
+| Requirement | Status | Implementation |
+|-------------|--------|----------------|
+| `Environment` subclass | Required | `RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState])` |
+| `reset()` returns `ObsT` | Required | Returns `RangeObservation` |
+| `step()` returns `ObsT` | Required | Returns `RangeObservation` |
+| `state` property returns `StateT` | Required | Returns `RangeState` |
+| `Action` subclass (Pydantic, extra=forbid) | Required | `RangeAction(Action)` with `command`, `mode` |
+| `Observation` subclass (Pydantic, extra=forbid) | Required | `RangeObservation(Observation)` — inherits `done`, `reward` from base |
+| `State` subclass (Pydantic, extra=allow) | Required | `RangeState(State)` — inherits `episode_id`, `step_count` from base |
+| `create_app(Class, ActionType, ObsType)` | Required | Pass CLASS not instance |
+| `EnvClient` subclass | Required | `OpenRangeEnv(EnvClient[...])` |
+| `_step_payload()` | Required | Serializes `RangeAction` to dict |
+| `_parse_result()` | Required | Parses server response to `StepResult[RangeObservation]` |
+| `_parse_state()` | Required | Parses server response to `RangeState` |
+| `/health` endpoint | Auto | Provided by `create_app` |
+| `/ws` WebSocket | Auto | Provided by `create_app` |
+| `/reset`, `/step`, `/state` HTTP | Auto | Provided by `create_app` |
+| `Rubric` for rewards | Optional | `CompositeRedReward`, `CompositeBlueReward` as Rubric subclasses |
+| `openenv.yaml` manifest | Required | Environment metadata for HF Spaces |
+| `Dockerfile` | Required | For container deployment |
+## Common Mistakes to Avoid
+1. **Don't redeclare `done` or `reward` on Observation.** The base class already has them.
+2. **Don't redeclare `episode_id` or `step_count` on State.** The base class already has them.
+3. **Pass the CLASS to `create_app()`, not an instance.** Each WebSocket session gets its own instance.
+4. **Action uses `extra="forbid"`.** Unknown fields cause validation errors. Keep actions minimal.
+5. **State uses `extra="allow"`.** You can add any fields you want.
+6. **`reset()` returns ObsT (server-side), `StepResult[ObsT]` (client-side).** The server wraps it.
+## API Signatures (Exact)
+```python
+from typing import Optional
+# Server-side
+class RangeEnvironment(Environment[RangeAction, RangeObservation, RangeState]):
+    def reset(self, seed: Optional[int] = None,
+              episode_id: Optional[str] = None, **kwargs) -> RangeObservation: ...
+    def step(self, action: RangeAction,
+             timeout_s: Optional[float] = None, **kwargs) -> RangeObservation: ...
+    @property
+    def state(self) -> RangeState: ...
+# Client-side
+class OpenRangeEnv(EnvClient[RangeAction, RangeObservation, RangeState]):
+    def _step_payload(self, action: RangeAction) -> dict: ...
+    def _parse_result(self, payload: dict) -> StepResult[RangeObservation]: ...
+    def _parse_state(self, payload: dict) -> RangeState: ...
+# App factory
+app = create_app(RangeEnvironment, RangeAction, RangeObservation, env_name="open_range")
+```
+## Reference Implementations
+Study these OpenEnv environments as patterns:
+- **`envs/coding_env/`** — closest analog (execute code, get stdout/stderr). Uses `Environment` base.
+- **`envs/echo_env/`** — simplest possible environment. Uses `MCPEnvironment` base.
+- **`envs/finqa_env/`** — MCP tool-based with complex rewards. Uses `MCPEnvironment` base.