---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Coding Environment
A real-world **PR triage and code review** environment with three graded tasks
(easy/medium/hard). Each episode presents pull request metadata and a unified
diff, then asks the agent to submit a structured review.
## Quick Start
The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:
```python
import asyncio

from coding_env import CodeAction, CodingEnv


async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset
        result = await client.reset()
        print(f"Reset complete: exit_code={result.observation.exit_code}")

        # Execute Python code
        code_samples = [
            "print('Hello, World!')",
            "x = 5 + 3\nprint(f'Result: {x}')",
            "import math\nprint(math.pi)",
        ]
        for code in code_samples:
            result = await client.step(CodeAction(code=code))
            print(f"Code: {code}")
            print(f"  → stdout: {result.observation.stdout.strip()}")
            print(f"  → exit_code: {result.observation.exit_code}")


asyncio.run(main())
```
For **synchronous usage**, use the `.sync()` wrapper:
```python
from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
    print(result.observation.stdout)
```
The `CodingEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits
## Building the Docker Image
Before using the environment, build the Docker image:
```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```
## Environment Details
### Action
**CodeAction** fields:
- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - `logic|security|performance|maintainability`
- `severity` (str) - `low|medium|high|critical`
- `bug_type` (str) - `syntax|logic|security|none`
- `line_number` (int) - Suspected faulty line
- `confidence` (float) - Confidence score in `[0.0, 1.0]`
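As a rough illustration of the fields above, the sketch below mirrors them in a standalone dataclass and checks the enumerated values and the confidence range. `ReviewPayload` and `validate` are hypothetical names used only for this example, and the sample values are made up; the real `CodeAction` model in `models.py` may validate its inputs differently.

```python
from dataclasses import dataclass

# Allowed values, copied from the field list above.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewPayload:
    """Hypothetical stand-in that mirrors the documented CodeAction fields."""

    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def validate(self) -> list:
        """Return a list of problems; an empty list means nothing was flagged."""
        problems = []
        if self.issue_type not in ISSUE_TYPES:
            problems.append(f"unknown issue_type: {self.issue_type!r}")
        if self.severity not in SEVERITIES:
            problems.append(f"unknown severity: {self.severity!r}")
        if self.bug_type not in BUG_TYPES:
            problems.append(f"unknown bug_type: {self.bug_type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append(f"confidence out of range: {self.confidence}")
        return problems


# Illustrative values only; they do not correspond to a shipped task.
payload = ReviewPayload(
    review="The loop bound skips the last element of the page.",
    file_path="src/pagination.py",
    issue_type="logic",
    severity="medium",
    bug_type="logic",
    line_number=42,
    confidence=0.8,
)
print(payload.validate())
```

Checking a review locally before calling `step()` avoids spending an episode step on a value the grader cannot match.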
### Observation
**CodeObservation** fields:
- `task_id` (str) - Current task id
- `difficulty` (str) - Task difficulty (`easy|medium|hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context + unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - Changed file list
- `previous_feedback` (str) - Grader feedback from latest step
- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
- `done` (bool) - Episode termination flag
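One way an agent might consume these fields is to flatten them into a single text prompt. The helper below is a minimal sketch that works on a plain dict keyed by the documented field names; `build_review_prompt` is a hypothetical helper, not part of `coding_env`, and the sample observation values are invented for illustration.

```python
def build_review_prompt(obs: dict) -> str:
    """Assemble a plain-text prompt from the documented observation fields."""
    sections = [
        f"Task ({obs['difficulty']}): {obs['task_description']}",
        f"PR title: {obs['pr_title']}",
        f"PR description: {obs['pr_description']}",
        f"Changed files: {obs['changed_files']}",
        "Diff:\n" + obs["code_snippet"],
    ]
    # Include grader feedback only when a previous step produced some.
    if obs.get("previous_feedback"):
        sections.append(f"Previous feedback: {obs['previous_feedback']}")
    return "\n\n".join(sections)


# Invented sample data shaped like a CodeObservation.
sample_obs = {
    "task_id": "task_easy_1",
    "difficulty": "easy",
    "task_description": "Review the diff and flag the most serious issue.",
    "code_snippet": "--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n-x = 1\n+x = 2",
    "pr_title": "Fix pagination",
    "pr_description": "Adjusts the page offset.",
    "changed_files": "app.py",
    "previous_feedback": "",
}
prompt = build_review_prompt(sample_obs)
print(prompt)
```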
### State
**CodeState**: Tracks execution state
- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task id
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score
## Built-in Tasks and Graders
The server exposes:
- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=<id>&episode_id=<id>` to read the final normalized score.
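The two endpoints can be queried with the standard library alone. The sketch below assumes the server is reachable at its root URL (for example `http://localhost:8000`) and that both endpoints return JSON; the response schema is not documented here, so the code only decodes the body. `tasks_url`, `grader_url`, and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tasks_url(base_url: str) -> str:
    """URL for listing all benchmark tasks."""
    return f"{base_url.rstrip('/')}/tasks"


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """URL for reading the final normalized score of one episode."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(tasks_url("http://localhost:8000"))
print(grader_url("http://localhost:8000", "task_easy_1", "<episode-id>"))
```

With a server running, `get_json(tasks_url("http://localhost:8000"))` would return whatever the task listing contains.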
Shipped tasks:
- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)
Rewards fall strictly within `(0, 1)` and give partial credit for:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text
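To make the idea of partial credit concrete, the sketch below combines per-criterion scores into one value and clamps it into the open interval `(0, 1)`. The weights, the `EPS` margin, and the `combine` function are all assumptions made for illustration; the shipped graders may weight or clamp the criteria differently.

```python
# Hypothetical weights for the five criteria listed above; they sum to 1.0.
WEIGHTS = {
    "file_path": 0.25,
    "issue_type": 0.25,
    "severity": 0.15,
    "line_number": 0.20,
    "evidence": 0.15,
}
EPS = 0.01  # Assumed margin that keeps the score away from exactly 0 or 1.


def combine(components: dict) -> float:
    """Weighted sum of per-criterion scores in [0, 1], clamped into (0, 1)."""
    raw = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    return min(1.0 - EPS, max(EPS, raw))


# A review that found the right file and issue type but nothing else.
partial = combine({"file_path": 1.0, "issue_type": 1.0})
print(partial)
```

Under these assumed weights, a review that misses every criterion still scores `EPS` rather than `0`, and a perfect review scores `1 - EPS` rather than `1`.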
## Advanced Usage
### Connecting to an Existing Server
If you already have a Coding environment server running, you can connect directly:
```python
import asyncio

from coding_env import CodeAction, CodingEnv


# Async usage (async with must run inside a coroutine)
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(CodeAction(code="print('Hello!')"))


asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
```
Note: When connecting to an existing server, closing the client will NOT stop the server.
## Development & Testing
### Running Tests
Install the coding_env package with dev dependencies and run the tests from the repo root:
```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"
# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v
# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```
### Running the Full Example
Run the complete example that demonstrates the full workflow:
```bash
python3 envs/coding_env/client/example_usage.py
```
This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`
## Project Structure
```
coding_env/
├── README.md                    # This file
├── models.py                    # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py     # CodingEnv client implementation
│   └── example_usage.py         # Usage examples
└── server/
    ├── python_codeact_env.py    # Core environment logic
    ├── app.py                   # FastAPI application
    ├── transforms.py            # Observation transforms
    ├── Dockerfile               # Container image definition
    └── README.md                # Server-specific documentation
```