jtowarek Claude Sonnet 4.6 committed on
Commit 5d2f027 · 0 Parent(s)

Add KantBench HF Space code to spaces/kant/

Pulls the live HF Space (jtowarek/KantBench) into the monorepo so it's
tracked in version control. The space is a self-contained standalone
deployment of the KantBench game-theory environment using openenv-core.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Dockerfile ADDED
@@ -0,0 +1,81 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+
+WORKDIR /app
+
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=KantBench
+
+# Copy environment code (always at root of build context)
+COPY . /app/env
+
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+
+# Final runtime stage
+FROM ${BASE_IMAGE}
+
+WORKDIR /app
+
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
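The `HEALTHCHECK` above probes `http://localhost:8000/health` with curl. A script that starts the container itself may want an equivalent readiness check before connecting. The following is a minimal sketch in Python, assuming the server answers `/health` with HTTP 200 once it is ready; `http_probe`, `wait_for_health`, and the `probe` parameter are hypothetical names for illustration and are not part of openenv.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_health(
    url: str = "http://localhost:8000/health",
    retries: int = 3,
    interval: float = 30.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll `url` up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

The `probe` argument is injectable so the retry loop can be exercised without a running server; the defaults mirror the `--retries` and `--interval` values in the Dockerfile.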
README.md ADDED
@@ -0,0 +1,255 @@
+---
+title: Kantbench Environment Server
+emoji: 🕹️
+colorFrom: green
+colorTo: yellow
+sdk: docker
+pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
+---
+
+# Kantbench Environment
+
+A game-theory environment for OpenEnv. Each episode is one repeated 2-player game (e.g. Prisoner's Dilemma) against a fixed-strategy opponent: the agent submits a move each round and receives the payoffs as a structured observation.
+
+## Quick Start
+
+The simplest way to use the Kantbench environment is through the `KantbenchEnv` class:
+
+```python
+from KantBench import KantBenchAction, KantbenchEnv
+
+# Create environment from Docker image
+env = KantbenchEnv.from_docker_image("kantbench-env:latest")
+try:
+    # Reset: a game and an opponent strategy are chosen at random
+    result = env.reset()
+    obs = result.observation
+    print(f"Game: {obs.game_name} vs {obs.opponent_strategy}")
+    print(f"Moves: {obs.available_moves}")
+
+    # Play the first available move until the episode ends
+    while not result.done:
+        result = env.step(KantBenchAction(move=obs.available_moves[0]))
+        obs = result.observation
+        print(f"You: '{obs.your_move}', opponent: '{obs.opponent_move}'")
+        print(f"  → Reward: {result.reward}")
+        print(f"  → Score: {obs.cumulative_score}")
+
+finally:
+    # Always clean up
+    env.close()
+```
+
+That's it! The `KantbenchEnv.from_docker_image()` method handles:
+- Starting the Docker container
+- Waiting for the server to be ready
+- Connecting to the environment
+- Container cleanup when you call `close()`
+
+## Building the Docker Image
+
+Before using the environment, you need to build the Docker image:
+
+```bash
+# From project root (Docker image names must be lowercase)
+docker build -t kantbench-env:latest .
+```
+
+## Deploying to Hugging Face Spaces
+
+You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+
+```bash
+# From the environment directory (where openenv.yaml is located)
+openenv push
+
+# Or specify options
+openenv push --namespace my-org --private
+```
+
+The `openenv push` command will:
+1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+2. Prepare a custom build for Hugging Face Docker space (enables web interface)
+3. Upload to Hugging Face (ensuring you're logged in)
+
+### Prerequisites
+
+- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
+
+### Options
+
+- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
+- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
+- `--private`: Deploy the space as private (default: public)
+
+### Examples
+
+```bash
+# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+openenv push
+
+# Push to a specific repository
+openenv push --repo-id my-org/my-env
+
+# Push with a custom base image
+openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+
+# Push as a private space
+openenv push --private
+
+# Combine options
+openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+```
+
+After deployment, your space will be available at:
+`https://huggingface.co/spaces/<repo-id>`
+
+The deployed space includes:
+- **Web Interface** at `/web` - Interactive UI for exploring the environment
+- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+- **Health Check** at `/health` - Container health monitoring
+- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
+
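+## Environment Details
+
+### Action
+**KantBenchAction**: Contains a single field
+- `move` (str) - Your move (e.g. `cooperate`, `defect`, `hawk`, `dove`)
+
+A move that is not in `available_moves` is replaced by the first available move.
+
+### Observation
+**KantBenchObservation**: Contains the round result and game metadata
+- `game_name` (str) - Name of the current game
+- `game_description` (str) - Description of the game
+- `available_moves` (list[str]) - Valid moves for this game
+- `your_move` (str) - Your move this round
+- `opponent_move` (str) - Opponent's move this round
+- `your_payoff` (float) - Your payoff this round
+- `opponent_payoff` (float) - Opponent's payoff this round
+- `cumulative_score` (float) - Your total score so far
+- `round_number` (int) - Current round number
+- `max_rounds` (int) - Total rounds in this episode
+- `opponent_strategy` (str) - Opponent's strategy name
+- `history` (list[dict]) - Round history
+- `message` (str) - Status message
+- `reward` (float) - Your payoff for the round (0.0 after `reset()`)
+- `done` (bool) - True once `round_number` reaches `max_rounds`
+
+### Reward
+The reward for each step is your payoff for that round. In Prisoner's Dilemma:
+- Both cooperate → reward: 3.0
+- You defect, opponent cooperates → reward: 5.0
+- You cooperate, opponent defects → reward: 0.0
+- Both defect → reward: 1.0
+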
+## Advanced Usage
+
+### Connecting to an Existing Server
+
+If you already have a Kantbench environment server running, you can connect directly:
+
+```python
+from KantBench import KantBenchAction, KantbenchEnv
+
+# Connect to existing server
+env = KantbenchEnv(base_url="<ENV_HTTP_URL_HERE>")
+
+# Use as normal
+result = env.reset()
+result = env.step(KantBenchAction(move=result.observation.available_moves[0]))
+```
+
+Note: When connecting to an existing server, `env.close()` will NOT stop the server.
+
+### Using the Context Manager
+
+The client supports context manager usage for automatic connection management:
+
+```python
+from KantBench import KantBenchAction, KantbenchEnv
+
+# Connect with context manager (auto-connects and closes)
+with KantbenchEnv(base_url="http://localhost:8000") as env:
+    result = env.reset()
+    print(f"Reset: {result.observation.message}")
+    # Multiple steps with low latency
+    moves = result.observation.available_moves
+    for move in [moves[0], moves[-1], moves[0]]:
+        result = env.step(KantBenchAction(move=move))
+        print(f"Opponent played: {result.observation.opponent_move}")
+```
+
+The client uses WebSocket connections for:
+- **Lower latency**: No HTTP connection overhead per request
+- **Persistent session**: Server maintains your environment state
+- **Efficient for episodes**: Better for many sequential steps
+
+### Concurrent WebSocket Sessions
+
+The server supports multiple concurrent WebSocket connections. `server/app.py` already
+passes the environment class (factory mode); to allow more sessions, raise `max_concurrent_envs`:
+
+```python
+# In server/app.py - factory mode for concurrent sessions
+app = create_app(
+    KantbenchEnvironment,  # Pass class, not instance
+    KantBenchAction,
+    KantBenchObservation,
+    env_name="KantBench",
+    max_concurrent_envs=4,  # Allow 4 concurrent sessions
+)
+```
+
+Then multiple clients can connect simultaneously:
+
+```python
+from KantBench import KantBenchAction, KantbenchEnv
+from concurrent.futures import ThreadPoolExecutor
+
+def run_episode(client_id: int):
+    with KantbenchEnv(base_url="http://localhost:8000") as env:
+        result = env.reset()
+        move = result.observation.available_moves[0]
+        for i in range(10):
+            result = env.step(KantBenchAction(move=move))
+        return client_id, result.observation.cumulative_score
+
+# Run 4 episodes concurrently
+with ThreadPoolExecutor(max_workers=4) as executor:
+    results = list(executor.map(run_episode, range(4)))
+```
+
+## Development & Testing
+
+### Direct Environment Testing
+
+Test the environment logic directly without starting the HTTP server:
+
+```bash
+# From the project root
+python3 server/KantBench_environment.py
+```
+
+This verifies that:
+- Environment resets correctly
+- Step executes actions properly
+- State tracking works
+- Rewards are calculated correctly
+
+### Running Locally
+
+Run the server locally for development:
+
+```bash
+uvicorn server.app:app --reload
+```
+
+## Project Structure
+
+```
+KantBench/
+├── .dockerignore         # Docker build exclusions
+├── Dockerfile            # Container image definition
+├── __init__.py           # Module exports
+├── README.md             # This file
+├── openenv.yaml          # OpenEnv manifest
+├── pyproject.toml        # Project metadata and dependencies
+├── uv.lock               # Locked dependencies (generated)
+├── client.py             # KantbenchEnv client
+├── models.py             # Action and Observation models
+└── server/
+    ├── __init__.py       # Server module exports
+    ├── KantBench_environment.py  # Core environment logic
+    ├── app.py            # FastAPI application (HTTP + WebSocket endpoints)
+    └── requirements.txt  # Server dependencies
+```
__init__.py ADDED
@@ -0,0 +1,16 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Kantbench Environment."""
+
+from .client import KantbenchEnv
+from .models import KantBenchAction, KantBenchObservation
+
+__all__ = [
+    "KantBenchAction",
+    "KantBenchObservation",
+    "KantbenchEnv",
+]
client.py ADDED
@@ -0,0 +1,99 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Kantbench Environment Client."""
+
+from typing import Dict
+
+from openenv.core.client_types import StepResult
+from openenv.core.env_server.types import State
+from openenv.core import EnvClient
+
+# models.py defines KantBenchAction / KantBenchObservation (capital "B")
+from .models import KantBenchAction, KantBenchObservation
+
+
+class KantbenchEnv(
+    EnvClient[KantBenchAction, KantBenchObservation]
+):
+    """
+    Client for the Kantbench Environment.
+
+    This client maintains a persistent WebSocket connection to the environment server,
+    enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+
+    Example:
+        >>> # Connect to a running server
+        >>> with KantbenchEnv(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.game_name)
+        ...
+        ...     result = client.step(KantBenchAction(move="cooperate"))
+        ...     print(result.observation.opponent_move)
+
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = KantbenchEnv.from_docker_image("kantbench-env:latest")
+        >>> try:
+        ...     result = client.reset()
+        ...     result = client.step(KantBenchAction(move="cooperate"))
+        ... finally:
+        ...     client.close()
+    """
+
+    def _step_payload(self, action: KantBenchAction) -> Dict:
+        """
+        Convert KantBenchAction to JSON payload for step message.
+
+        Args:
+            action: KantBenchAction instance
+
+        Returns:
+            Dictionary representation suitable for JSON encoding
+        """
+        return {
+            "move": action.move,
+        }
+
+    def _parse_result(self, payload: Dict) -> StepResult[KantBenchObservation]:
+        """
+        Parse server response into StepResult[KantBenchObservation].
+
+        Args:
+            payload: JSON response data from server
+
+        Returns:
+            StepResult with KantBenchObservation
+        """
+        obs_data = payload.get("observation", {})
+        # Field names and defaults mirror KantBenchObservation in models.py
+        observation = KantBenchObservation(
+            game_name=obs_data.get("game_name", ""),
+            game_description=obs_data.get("game_description", ""),
+            available_moves=obs_data.get("available_moves", []),
+            your_move=obs_data.get("your_move", ""),
+            opponent_move=obs_data.get("opponent_move", ""),
+            your_payoff=obs_data.get("your_payoff", 0.0),
+            opponent_payoff=obs_data.get("opponent_payoff", 0.0),
+            cumulative_score=obs_data.get("cumulative_score", 0.0),
+            round_number=obs_data.get("round_number", 0),
+            max_rounds=obs_data.get("max_rounds", 10),
+            opponent_strategy=obs_data.get("opponent_strategy", ""),
+            history=obs_data.get("history", []),
+            message=obs_data.get("message", ""),
+            done=payload.get("done", False),
+            reward=payload.get("reward"),
+            metadata=obs_data.get("metadata", {}),
+        )
+
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse server response into State object.
+
+        Args:
+            payload: JSON response from state request
+
+        Returns:
+            State object with episode_id and step_count
+        """
+        return State(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
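The two conversion hooks in the client are plain dictionary transformations, so they can be tried out without openenv installed. The following is a standalone sketch for the `move`-based action defined in models.py, using plain dicts in place of the openenv types. The function names `step_payload` and `parse_result` are hypothetical, and the payload shape (an `observation` object plus top-level `reward` and `done`) is assumed from `_parse_result` above.

```python
from typing import Any, Dict


def step_payload(move: str) -> Dict[str, Any]:
    """Build the JSON body for a step request from a move string."""
    return {"move": move}


def parse_result(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a step response, applying defaults for missing keys."""
    obs = payload.get("observation", {})
    return {
        "your_move": obs.get("your_move", ""),
        "opponent_move": obs.get("opponent_move", ""),
        "cumulative_score": obs.get("cumulative_score", 0.0),
        "reward": payload.get("reward"),
        "done": payload.get("done", False),
    }


# Hypothetical response for one Prisoner's Dilemma round.
response = {
    "observation": {"your_move": "cooperate", "opponent_move": "defect"},
    "reward": 0.0,
    "done": False,
}
parsed = parse_result(response)
print(step_payload("cooperate"), parsed["opponent_move"], parsed["cumulative_score"])
```

Using `.get()` with defaults keeps the parser tolerant of responses that omit fields, which is the same choice the client makes.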
models.py ADDED
@@ -0,0 +1,29 @@
+"""Data models for the KantBench game theory environment."""
+
+from typing import Any
+from pydantic import Field
+from openenv.core.env_server.types import Action, Observation
+
+
+class KantBenchAction(Action):
+    """Action for the KantBench environment — a move in a 2-player game."""
+
+    move: str = Field(..., description="Your move (e.g. 'cooperate', 'defect', 'hawk', 'dove')")
+
+
+class KantBenchObservation(Observation):
+    """Observation from the KantBench environment after one round."""
+
+    game_name: str = Field(default="", description="Name of the current game")
+    game_description: str = Field(default="", description="Description of the game")
+    available_moves: list[str] = Field(default_factory=list, description="Valid moves for this game")
+    your_move: str = Field(default="", description="Your move this round")
+    opponent_move: str = Field(default="", description="Opponent's move this round")
+    your_payoff: float = Field(default=0.0, description="Your payoff this round")
+    opponent_payoff: float = Field(default=0.0, description="Opponent's payoff this round")
+    cumulative_score: float = Field(default=0.0, description="Your total score so far")
+    round_number: int = Field(default=0, description="Current round number")
+    max_rounds: int = Field(default=10, description="Total rounds in this episode")
+    opponent_strategy: str = Field(default="", description="Opponent's strategy name")
+    history: list[dict[str, Any]] = Field(default_factory=list, description="Round history")
+    message: str = Field(default="", description="Status message")
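The `history` field carries one dict per round. A consumer will often want aggregates rather than the raw list, so here is a minimal sketch of such a summary. It assumes each record has the keys the environment writes in `step()` (`round`, `your_move`, `opponent_move`, `your_payoff`, `opponent_payoff`); `summarize_history` is a hypothetical helper, not part of this commit.

```python
from collections import Counter
from typing import Any


def summarize_history(history: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate per-round records into totals and move counts."""
    return {
        "rounds": len(history),
        "your_total": sum(r["your_payoff"] for r in history),
        "opponent_total": sum(r["opponent_payoff"] for r in history),
        "your_moves": dict(Counter(r["your_move"] for r in history)),
    }


# Two hypothetical Prisoner's Dilemma rounds.
history = [
    {"round": 1, "your_move": "cooperate", "opponent_move": "cooperate",
     "your_payoff": 3.0, "opponent_payoff": 3.0},
    {"round": 2, "your_move": "defect", "opponent_move": "cooperate",
     "your_payoff": 5.0, "opponent_payoff": 0.0},
]
summary = summarize_history(history)
print(summary)
```

For a complete episode, `your_total` should match the `cumulative_score` reported in the final observation.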
openenv.yaml ADDED
@@ -0,0 +1,7 @@
+spec_version: 1
+name: KantBench
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+
pyproject.toml ADDED
@@ -0,0 +1,45 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "openenv-KantBench"
+version = "0.1.0"
+description = "Kantbench environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    # install from github
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.0",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m KantBench.server.app
+server = "KantBench.server.app:main"
+
+[tool.setuptools]
+include-package-data = true
+packages = ["KantBench", "KantBench.server"]
+package-dir = { "KantBench" = ".", "KantBench.server" = "server" }
server/KantBench_environment.py ADDED
@@ -0,0 +1,289 @@
+"""KantBench: a game theory RL environment for OpenEnv.
+
+Each episode is one repeated game (e.g. Prisoner's Dilemma) against a
+fixed strategy opponent. The agent chooses a move each round; the
+environment computes payoffs and returns a structured observation.
+
+Supported games: Prisoner's Dilemma, Stag Hunt, Hawk-Dove,
+                 Battle of the Sexes, Chicken, Matching Pennies,
+                 Rock-Paper-Scissors.
+Opponent strategies: random, always_first, always_last, tit_for_tat,
+                     grim_trigger, pavlov.
+"""
+
+from __future__ import annotations
+
+import random
+from typing import Any
+from uuid import uuid4
+
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+
+from models import KantBenchAction, KantBenchObservation
+
+# ---------------------------------------------------------------------------
+# Game definitions (self-contained payoff matrices)
+# ---------------------------------------------------------------------------
+
+def _matrix(m: dict[tuple[str, str], tuple[float, float]]):
+    """Return a payoff function from a matrix dict."""
+    def fn(a: str, b: str) -> tuple[float, float]:
+        return m[(a, b)]
+    return fn
+
+
+GAMES: dict[str, dict[str, Any]] = {
+    "prisoners_dilemma": {
+        "name": "Prisoner's Dilemma",
+        "description": (
+            "Two players choose to cooperate or defect simultaneously. "
+            "Mutual cooperation is best collectively; defection is individually tempting."
+        ),
+        "actions": ["cooperate", "defect"],
+        "rounds": 10,
+        "payoff_fn": _matrix({
+            ("cooperate", "cooperate"): (3.0, 3.0),
+            ("cooperate", "defect"): (0.0, 5.0),
+            ("defect", "cooperate"): (5.0, 0.0),
+            ("defect", "defect"): (1.0, 1.0),
+        }),
+    },
+    "stag_hunt": {
+        "name": "Stag Hunt",
+        "description": (
+            "Two hunters choose to hunt stag (requires coordination) or hare "
+            "(safe alone). Mutual cooperation yields the best outcome."
+        ),
+        "actions": ["stag", "hare"],
+        "rounds": 10,
+        "payoff_fn": _matrix({
+            ("stag", "stag"): (4.0, 4.0),
+            ("stag", "hare"): (0.0, 2.0),
+            ("hare", "stag"): (2.0, 0.0),
+            ("hare", "hare"): (2.0, 2.0),
+        }),
+    },
+    "hawk_dove": {
+        "name": "Hawk-Dove",
+        "description": (
+            "Two players compete over a resource. Hawk is aggressive; Dove is passive. "
+            "Two hawks fight and both lose; two doves share."
+        ),
+        "actions": ["hawk", "dove"],
+        "rounds": 10,
+        "payoff_fn": _matrix({
+            ("hawk", "hawk"): (-1.0, -1.0),
+            ("hawk", "dove"): (4.0, 0.0),
+            ("dove", "hawk"): (0.0, 4.0),
+            ("dove", "dove"): (2.0, 2.0),
+        }),
+    },
+    "battle_of_sexes": {
+        "name": "Battle of the Sexes",
+        "description": (
+            "Two players want to coordinate but prefer different options. "
+            "Player 1 prefers opera; Player 2 prefers football. "
+            "Both prefer to be together over going alone."
+        ),
+        "actions": ["opera", "football"],
+        "rounds": 10,
+        "payoff_fn": _matrix({
+            ("opera", "opera"): (3.0, 1.0),
+            ("opera", "football"): (0.0, 0.0),
+            ("football", "opera"): (0.0, 0.0),
+            ("football", "football"): (1.0, 3.0),
+        }),
+    },
+    "chicken": {
+        "name": "Chicken (Snowdrift)",
+        "description": (
+            "Two drivers head toward each other. Swerving is safe but cowardly; "
+            "going straight is bold but catastrophic if both do it."
+        ),
+        "actions": ["straight", "swerve"],
+        "rounds": 10,
+        "payoff_fn": _matrix({
+            ("straight", "straight"): (-10.0, -10.0),
+            ("straight", "swerve"): (5.0, -1.0),
+            ("swerve", "straight"): (-1.0, 5.0),
+            ("swerve", "swerve"): (0.0, 0.0),
+        }),
+    },
+    "matching_pennies": {
+        "name": "Matching Pennies",
+        "description": (
+            "Player 1 wins if both show the same side; Player 2 wins if they differ. "
+            "Pure zero-sum game with no stable pure-strategy Nash equilibrium."
+        ),
+        "actions": ["heads", "tails"],
+        "rounds": 20,
+        "payoff_fn": _matrix({
+            ("heads", "heads"): (1.0, -1.0),
+            ("heads", "tails"): (-1.0, 1.0),
+            ("tails", "heads"): (-1.0, 1.0),
+            ("tails", "tails"): (1.0, -1.0),
+        }),
+    },
+    "rock_paper_scissors": {
+        "name": "Rock-Paper-Scissors",
+        "description": (
+            "Classic zero-sum game. Rock beats Scissors, Scissors beats Paper, "
+            "Paper beats Rock. Ties yield 0."
+        ),
+        "actions": ["rock", "paper", "scissors"],
+        "rounds": 20,
+        "payoff_fn": _matrix({
+            ("rock", "rock"): (0.0, 0.0),
+            ("rock", "paper"): (-1.0, 1.0),
+            ("rock", "scissors"): (1.0, -1.0),
+            ("paper", "rock"): (1.0, -1.0),
+            ("paper", "paper"): (0.0, 0.0),
+            ("paper", "scissors"): (-1.0, 1.0),
+            ("scissors", "rock"): (-1.0, 1.0),
+            ("scissors", "paper"): (1.0, -1.0),
+            ("scissors", "scissors"): (0.0, 0.0),
+        }),
+    },
+}
+
+STRATEGIES = ["random", "always_first", "always_last", "tit_for_tat", "grim_trigger", "pavlov"]
+
+
+def _opponent_move(strategy: str, actions: list[str], history: list[dict]) -> str:
+    """Compute opponent's move given strategy and history."""
+    if strategy == "random":
+        return random.choice(actions)
+    if strategy == "always_first":
+        return actions[0]
+    if strategy == "always_last":
+        return actions[-1]
+    if not history:
+        return actions[0]  # default opening for reactive strategies
+    last_agent_move = history[-1]["your_move"]
+    last_opp_move = history[-1]["opponent_move"]
+    if strategy == "tit_for_tat":
+        return last_agent_move if last_agent_move in actions else actions[0]
+    if strategy == "grim_trigger":
+        # Defect forever once agent defects; cooperate otherwise
+        ever_defected = any(r["your_move"] == actions[-1] for r in history)
+        return actions[-1] if ever_defected else actions[0]
+    if strategy == "pavlov":
+        # Win-stay, lose-shift: repeat own last move if it paid well
+        # (i.e. the agent cooperated), else switch
+        if last_agent_move == actions[0]:
+            return last_opp_move  # stay
+        return actions[0] if last_opp_move != actions[0] else actions[-1]  # shift
+    return random.choice(actions)
+
+
+# ---------------------------------------------------------------------------
+# Environment
+# ---------------------------------------------------------------------------
+
+class KantbenchEnvironment(Environment):
+    """Game theory environment for benchmarking LLM strategic reasoning.
+
+    Each episode is a repeated 2-player game against one of six opponent
+    strategies. The agent submits a move each round and receives the payoff
+    result as a structured observation.
+
+    Example::
+
+        env = KantbenchEnvironment()
+        obs = env.reset()
+        # obs.game_name is chosen at random, e.g. "Prisoner's Dilemma"
+        # obs.available_moves lists the valid moves, e.g. ["cooperate", "defect"]
+
+        obs = env.step(KantBenchAction(move=obs.available_moves[0]))
+        # obs.your_payoff, obs.opponent_move, obs.cumulative_score, ...
+    """
+
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+    def __init__(self):
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._game_key: str = "prisoners_dilemma"
+        self._strategy: str = "random"
+        self._history: list[dict] = []
+        self._cumulative_score: float = 0.0
+
+    def reset(self, **kwargs) -> KantBenchObservation:
+        """Start a new episode with a randomly chosen game and opponent strategy."""
+        self._game_key = random.choice(list(GAMES.keys()))
+        self._strategy = random.choice(STRATEGIES)
+        self._history = []
+        self._cumulative_score = 0.0
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+
+        game = GAMES[self._game_key]
+        return KantBenchObservation(
+            game_name=game["name"],
+            game_description=game["description"],
+            available_moves=game["actions"],
+            your_move="",
+            opponent_move="",
+            your_payoff=0.0,
+            opponent_payoff=0.0,
+            cumulative_score=0.0,
+            round_number=0,
+            max_rounds=game["rounds"],
+            opponent_strategy=self._strategy,
+            history=[],
+            done=False,
+            reward=0.0,
+            message=(
+                f"New episode: {game['name']} vs {self._strategy}. "
+                f"Choose one of: {game['actions']}"
+            ),
+        )
+
+    def step(self, action: KantBenchAction, **kwargs) -> KantBenchObservation:  # type: ignore[override]
+        """Play one round of the current game."""
+        game = GAMES[self._game_key]
+        actions = game["actions"]
+        max_rounds = game["rounds"]
+
+        # Validate move; an unrecognized move falls back to the first action
+        move = action.move.lower().strip()
+        if move not in actions:
+            closest = actions[0]
+            move = closest
+
+        opp_move = _opponent_move(self._strategy, actions, self._history)
+        your_pay, opp_pay = game["payoff_fn"](move, opp_move)
+
+        self._state.step_count += 1
+        self._cumulative_score += your_pay
+
+        round_record = {
+            "round": self._state.step_count,
+            "your_move": move,
+            "opponent_move": opp_move,
+            "your_payoff": your_pay,
+            "opponent_payoff": opp_pay,
+        }
+        self._history.append(round_record)
+
+        done = self._state.step_count >= max_rounds
+
+        return KantBenchObservation(
+            game_name=game["name"],
+            game_description=game["description"],
+            available_moves=actions,
+            your_move=move,
+            opponent_move=opp_move,
+            your_payoff=your_pay,
+            opponent_payoff=opp_pay,
+            cumulative_score=self._cumulative_score,
+            round_number=self._state.step_count,
+            max_rounds=max_rounds,
+            opponent_strategy=self._strategy,
+            history=self._history,
+            done=done,
+            reward=your_pay,
+            message="Game over — call reset() to start a new episode." if done else "",
+        )
+
+    @property
+    def state(self) -> State:
+        return self._state
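To see how the payoff matrix and a reactive opponent combine over an episode, the round loop can be reproduced in a few lines without openenv. This sketch copies the Prisoner's Dilemma payoffs from `GAMES` and plays fixed agent policies against tit_for_tat; the names `PD`, `tit_for_tat`, and `play` are hypothetical and only illustrate the mechanics of `step()`.

```python
# Prisoner's Dilemma payoffs as (agent, opponent), copied from GAMES above.
PD = {
    ("cooperate", "cooperate"): (3.0, 3.0),
    ("cooperate", "defect"): (0.0, 5.0),
    ("defect", "cooperate"): (5.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}


def tit_for_tat(history: list[tuple[str, str]]) -> str:
    """Open with cooperate, then copy the agent's previous move."""
    return history[-1][0] if history else "cooperate"


def play(agent_moves: list[str]) -> float:
    """Return the agent's cumulative payoff against tit_for_tat."""
    history: list[tuple[str, str]] = []  # (agent_move, opponent_move) per round
    score = 0.0
    for move in agent_moves:
        opp = tit_for_tat(history)
        score += PD[(move, opp)][0]
        history.append((move, opp))
    return score


print(play(["cooperate"] * 10))  # mutual cooperation every round
print(play(["defect"] * 10))     # one exploited round, then mutual defection
```

Over 10 rounds, always cooperating scores 30.0, while always defecting scores 14.0 (5.0 in the first round, then 1.0 in each of the remaining nine), which is why reactive opponents make the repeated game differ from the one-shot case.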
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Kantbench environment server components."""
+
+from .KantBench_environment import KantbenchEnvironment
+
+__all__ = ["KantbenchEnvironment"]
server/app.py ADDED
@@ -0,0 +1,81 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""
+FastAPI application for the Kantbench Environment.
+
+This module creates an HTTP server that exposes the KantbenchEnvironment
+over HTTP and WebSocket endpoints, compatible with EnvClient.
+
+Endpoints:
+    - POST /reset: Reset the environment
+    - POST /step: Execute an action
+    - GET /state: Get current environment state
+    - GET /schema: Get action/observation schemas
+    - WS /ws: WebSocket endpoint for persistent sessions
+
+Usage:
+    # Development (with auto-reload):
+    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+
+    # Production:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+
+    # Or run directly:
+    python -m server.app
+"""
+
+try:
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:  # pragma: no cover
+    raise ImportError(
+        "openenv is required for the web interface. Install dependencies with '\n    uv sync\n'"
+    ) from e
+
+# Import from local models.py (PYTHONPATH includes /app/env in Docker)
+from models import KantBenchAction, KantBenchObservation
+from .KantBench_environment import KantbenchEnvironment
+
+
+# Create the app with web interface and README integration
+app = create_app(
+    KantbenchEnvironment,
+    KantBenchAction,
+    KantBenchObservation,
+    env_name="KantBench",
+    max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
+)
+
+
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """
+    Entry point for direct execution via uv run or python -m.
+
+    This function enables running the server without Docker:
+        uv run --project . server
+        uv run --project . server --port 8001
+        python -m KantBench.server.app
+
+    Args:
+        host: Host address to bind to (default: "0.0.0.0")
+        port: Port number to listen on (default: 8000)
+
+    For production deployments, consider using uvicorn directly with
+    multiple workers:
+        uvicorn KantBench.server.app:app --workers 4
+    """
+    import uvicorn
+
+    uvicorn.run(app, host=host, port=port)
+
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+    main(port=args.port)
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+openenv[core]>=0.2.0
+fastapi>=0.115.0
+uvicorn>=0.24.0
+
+
+