---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Coding Environment
A real-world **PR triage and code review** environment with three graded tasks
(easy/medium/hard). Each episode presents pull request metadata and a unified
diff, then asks the agent to submit a structured review.
## Quick Start
The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:
```python
import asyncio

from coding_env import CodeAction, CodingEnv


async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")
    async with client:
        # Reset: receive PR metadata and the unified diff for one task
        result = await client.reset()
        obs = result.observation
        print(f"Task: {obs.task_id} ({obs.difficulty})")
        print(f"PR: {obs.pr_title}")
        print(obs.code_snippet)

        # Submit a structured review; the values below are illustrative placeholders
        action = CodeAction(
            review="Describe the defect and cite the evidence from the diff.",
            file_path="path/to/changed_file.py",  # pick one of obs.changed_files
            issue_type="logic",
            severity="medium",
            bug_type="logic",
            line_number=42,
            confidence=0.8,
        )
        result = await client.step(action)
        print(f"Reward: {result.observation.reward}")
        print(f"Feedback: {result.observation.previous_feedback}")
        print(f"Done: {result.observation.done}")


asyncio.run(main())
```
For **synchronous usage**, use the `.sync()` wrapper:
```python
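from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    print(result.observation.task_description)
    # Placeholder values; a real agent derives them from the observation
    result = client.step(CodeAction(
        review="Describe the defect and cite the diff.", file_path="path/to/changed_file.py",
        issue_type="security", severity="high", bug_type="security", line_number=10, confidence=0.7,
    ))
    print(result.observation.previous_feedback)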
```
The `CodingEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits
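Conceptually, the readiness wait is a polling loop: keep probing until the server responds or a deadline passes. The sketch below illustrates the idea only; `wait_until_ready` and its timeout defaults are hypothetical and are not the code inside `CodingEnv.from_docker_image()`. The `is_ready` probe is supplied by the caller, for example a function that attempts an HTTP request to the server and returns `True` on success.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` elapses.

    Hypothetical helper illustrating the readiness wait; not CodingEnv's code.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_ready():
                return True
        except OSError:
            # Connection refused/reset while the container is still booting
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Using a monotonic clock keeps the deadline correct even if the system clock changes while the container starts.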
## Building the Docker Image
Before using the environment, you need to build the Docker image:
```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```
## Environment Details
### Action
**CodeAction** fields:
- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - One of `logic|security|performance|maintainability`
- `severity` (str) - One of `low|medium|high|critical`
- `bug_type` (str) - One of `syntax|logic|security|none`
- `line_number` (int) - Suspected faulty line
- `confidence` (float) - Confidence score in `[0.0, 1.0]`
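Before calling `step()`, an agent may want to check that its action uses only the documented values. A minimal client-side sketch, assuming the allowed values listed above are exhaustive and that line numbers are 1-based; `ReviewFields` and `validate_review` are hypothetical names, not part of `coding_env`:

```python
from dataclasses import dataclass

# Allowed values as documented for CodeAction (assumed exhaustive)
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewFields:
    """Hypothetical local mirror of the CodeAction review fields."""
    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float


def validate_review(fields: ReviewFields) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    if not fields.review.strip():
        problems.append("review must not be empty")
    if not fields.file_path:
        problems.append("file_path must not be empty")
    if fields.issue_type not in ISSUE_TYPES:
        problems.append(f"unknown issue_type: {fields.issue_type!r}")
    if fields.severity not in SEVERITIES:
        problems.append(f"unknown severity: {fields.severity!r}")
    if fields.bug_type not in BUG_TYPES:
        problems.append(f"unknown bug_type: {fields.bug_type!r}")
    if fields.line_number < 1:
        # Assumes 1-based line numbers, as in unified diffs
        problems.append("line_number must be >= 1")
    if not 0.0 <= fields.confidence <= 1.0:
        problems.append("confidence must be in [0.0, 1.0]")
    return problems
```

Collecting every problem instead of raising on the first one lets an agent repair its whole action in a single retry.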
### Observation
**CodeObservation** fields:
- `task_id` (str) - Current task id
- `difficulty` (str) - Task difficulty (`easy|medium|hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context + unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - List of changed files
- `previous_feedback` (str) - Grader feedback from the latest step
- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
- `done` (bool) - Episode termination flag
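An agent typically flattens these fields into a single prompt. A minimal sketch, assuming a text-only LLM reviewer; `ObservationView` is a hypothetical stand-in for `CodeObservation` so the snippet runs without the package. Because `build_review_prompt` only reads attributes, it should also accept a real `CodeObservation` with the same field names.

```python
from dataclasses import dataclass


@dataclass
class ObservationView:
    """Hypothetical stand-in holding a subset of CodeObservation's fields."""
    task_id: str
    difficulty: str
    task_description: str
    pr_title: str
    pr_description: str
    changed_files: str
    code_snippet: str
    previous_feedback: str = ""


def build_review_prompt(obs: ObservationView) -> str:
    """Assemble the observation's text fields into one prompt for an LLM reviewer."""
    parts = [
        f"Task {obs.task_id} ({obs.difficulty}): {obs.task_description}",
        f"PR title: {obs.pr_title}",
        f"PR description: {obs.pr_description}",
        f"Changed files: {obs.changed_files}",
        "Diff:",
        obs.code_snippet,
    ]
    if obs.previous_feedback:
        # Feed the grader's feedback from the previous step back to the agent
        parts.append(f"Feedback on your last review: {obs.previous_feedback}")
    return "\n".join(parts)
```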
### State
**CodeState**: Tracks the state of the current episode
- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task id
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score
## Built-in Tasks and Graders
The server exposes:
- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=<id>&episode_id=<id>` to read the final normalized score for an episode.
Shipped tasks:
- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)
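Both endpoints can be called with plain HTTP. A minimal standard-library sketch, assuming the server listens on `http://localhost:8000` and both endpoints return JSON (the response schemas are not documented here, so the helpers return the decoded body unchanged); `grader_url`, `get_json`, and `list_tasks` are hypothetical helper names:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """Build the GET /grader URL with properly encoded query parameters."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def get_json(url: str, timeout_s: float = 10.0):
    """GET `url` and decode the JSON body (assumes a JSON response)."""
    with urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_tasks(base_url: str):
    """Fetch the task listing from GET /tasks."""
    return get_json(f"{base_url.rstrip('/')}/tasks")
```

With a server running, `list_tasks("http://localhost:8000")` returns the task listing, and `get_json(grader_url(base, "task_easy_1", episode_id))` reads the score for an episode whose `episode_id` comes from `CodeState`. `urlencode` escapes IDs that contain reserved characters.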
Rewards lie strictly inside the open interval `(0, 1)` and give partial credit for:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text
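The grader's weights for these components are not documented here. Purely to illustrate how partial credit can be kept strictly inside `(0, 1)`, the sketch below assumes each component is scored in `[0, 1]`, combines the components with hypothetical equal weights, and clamps the total by an assumed margin `EPS`. It is not the shipped grader.

```python
# Hypothetical component names and equal weights; the real grader may differ.
WEIGHTS = {
    "file_path": 0.2,
    "issue_and_bug_type": 0.2,
    "severity": 0.2,
    "line_number": 0.2,
    "evidence": 0.2,
}
EPS = 0.01  # assumed margin that keeps the score strictly inside (0, 1)


def combine_scores(components: dict[str, float]) -> float:
    """Weighted average of per-component scores, clamped into [EPS, 1 - EPS]."""
    total = sum(
        # Missing components count as 0; out-of-range inputs are clipped to [0, 1]
        WEIGHTS[name] * min(max(components.get(name, 0.0), 0.0), 1.0)
        for name in WEIGHTS
    )
    return min(max(total, EPS), 1.0 - EPS)
```

Clamping means a completely wrong review still earns `EPS` and a perfect one earns `1 - EPS`, so the score never reaches the interval's endpoints.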
## Advanced Usage
### Connecting to an Existing Server
If you already have a Coding environment server running, you can connect directly:
```python
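import asyncio
from coding_env import CodeAction, CodingEnv

# Placeholder review values; a real agent derives them from the observation
action = CodeAction(
    review="Describe the defect and cite the diff.", file_path="path/to/changed_file.py",
    issue_type="logic", severity="medium", bug_type="logic", line_number=10, confidence=0.7,
)

# Async usage (`async with` must run inside a coroutine)
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(action)

asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(action)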
```
Note: When connecting to an existing server, closing the client will NOT stop the server.
## Development & Testing
### Running Tests
Install the `coding_env` package with its dev dependencies, then run the tests from the repo root:
```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"
# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v
# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```
### Running the Full Example
Run the complete example that demonstrates the full workflow:
```bash
python3 envs/coding_env/client/example_usage.py
```
This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`
## Project Structure
```
coding_env/
├── README.md                  # This file
├── models.py                  # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py   # CodingEnv client implementation
│   └── example_usage.py       # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation
```