---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Coding Environment

A real-world **PR triage and code review** environment with three graded tasks
(easy/medium/hard). Each episode presents pull request metadata and a unified
diff, then asks the agent to submit a structured review.

## Quick Start

The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:

```python
import asyncio
from coding_env import CodeAction, CodingEnv

async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset
        result = await client.reset()
        print(f"Reset complete: exit_code={result.observation.exit_code}")

        # Execute Python code
        code_samples = [
            "print('Hello, World!')",
            "x = 5 + 3\nprint(f'Result: {x}')",
            "import math\nprint(math.pi)"
        ]

        for code in code_samples:
            result = await client.step(CodeAction(code=code))
            print(f"Code: {code}")
            print(f"  → stdout: {result.observation.stdout.strip()}")
            print(f"  → exit_code: {result.observation.exit_code}")

asyncio.run(main())
```

For **synchronous usage**, use the `.sync()` wrapper:

```python
from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
    print(result.observation.stdout)
```

The `CodingEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits

## Building the Docker Image

Before using the environment, you need to build the Docker image:

```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```

## Environment Details

### Action
**CodeAction** fields:
- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - `logic|security|performance|maintainability`
- `severity` (str) - `low|medium|high|critical`
- `bug_type` (str) - One of `syntax | logic | security | none`
- `line_number` (int) - Suspected faulty line
- `confidence` (float) - Confidence score in `[0.0, 1.0]`

### Observation
**CodeObservation** fields:
- `task_id` (str) - Current task id
- `difficulty` (str) - Task difficulty (`easy|medium|hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context + unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - Changed file list
- `previous_feedback` (str) - Grader feedback from latest step
- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
- `done` (bool) - Episode termination flag

### State
**CodeState**: Tracks execution state
- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task id
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score

## Built-in Tasks and Graders

The server exposes:
- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=<id>&episode_id=<id>` to read final normalized score.

Shipped tasks:
- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)

Rewards are strict `(0, 1)` with partial progress:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text

## Advanced Usage

### Connecting to an Existing Server

If you already have a Coding environment server running, you can connect directly:

```python
from coding_env import CodeAction, CodingEnv

# Async usage
async with CodingEnv(base_url="http://localhost:8000") as client:
    result = await client.reset()
    result = await client.step(CodeAction(code="print('Hello!')"))

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
```

Note: When connecting to an existing server, closing the client will NOT stop the server.

## Development & Testing

### Running Tests

Install the coding_env package with dev dependencies and run the tests from the repo root:

```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```

### Running the Full Example

Run the complete example that demonstrates the full workflow:

```bash
python3 envs/coding_env/client/example_usage.py
```

This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`

## Project Structure

```
coding_env/
├── README.md              # This file
├── models.py              # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py  # CodingEnv client implementation
│   └── example_usage.py      # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation
```