---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Coding Environment

A real-world **PR triage and code review** environment with three graded tasks
(easy/medium/hard). Each episode presents pull request metadata and a unified
diff, then asks the agent to submit a structured review.

## Quick Start

The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:

```python
import asyncio
from coding_env import CodeAction, CodingEnv

async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset to receive the first pull request to review
        result = await client.reset()
        obs = result.observation
        print(f"Task: {obs.task_id} ({obs.difficulty})")
        print(obs.code_snippet)

        # Submit a structured review. The values below are placeholders;
        # derive real ones from the diff in obs.code_snippet.
        action = CodeAction(
            review="Loop bound skips the last element.",
            file_path="src/example.py",
            issue_type="logic",
            severity="medium",
            bug_type="logic",
            line_number=1,
            confidence=0.5,
        )
        result = await client.step(action)
        print(f"  → feedback: {result.observation.previous_feedback}")
        print(f"  → reward: {result.observation.reward}")

asyncio.run(main())
```

For **synchronous usage**, use the `.sync()` wrapper:

```python
from coding_env import CodeAction, CodingEnv

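with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    # Placeholder review values; derive real ones from the observation's diff
    action = CodeAction(
        review="Loop bound skips the last element.", file_path="src/example.py",
        issue_type="logic", severity="medium", bug_type="logic",
        line_number=1, confidence=0.5,
    )
    result = client.step(action)
    print(result.observation.previous_feedback)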
```

The `CodingEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits

## Building the Docker Image

Before using the environment, you need to build the Docker image:

```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```

## Environment Details

### Action
**CodeAction** fields:
- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - `logic|security|performance|maintainability`
- `severity` (str) - `low|medium|high|critical`
- `bug_type` (str) - `syntax|logic|security|none`
- `line_number` (int) - Suspected faulty line
- `confidence` (float) - Confidence score in `[0.0, 1.0]`
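
As a rough illustration of the constraints listed above, the sketch below checks a review payload before it is sent. `ReviewPayload` and `validate` are hypothetical names for this example and are not part of `coding_env`; only the field names and allowed values come from the list above.

```python
from dataclasses import dataclass

# Allowed values copied from the field list above.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewPayload:
    """Hypothetical stand-in that mirrors the CodeAction fields."""

    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        if self.issue_type not in ISSUE_TYPES:
            problems.append(f"unknown issue_type: {self.issue_type!r}")
        if self.severity not in SEVERITIES:
            problems.append(f"unknown severity: {self.severity!r}")
        if self.bug_type not in BUG_TYPES:
            problems.append(f"unknown bug_type: {self.bug_type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append(f"confidence out of range: {self.confidence}")
        return problems
```

Running a check like this on the client side can catch a misspelled enum value before a step is spent on it. Whether the server rejects or merely penalizes such values is not stated here.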

### Observation
**CodeObservation** fields:
- `task_id` (str) - Current task id
- `difficulty` (str) - Task difficulty (`easy|medium|hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context + unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - Changed file list
- `previous_feedback` (str) - Grader feedback from latest step
- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
- `done` (bool) - Episode termination flag
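
One way an agent might turn these fields into a model prompt is sketched below. `build_prompt` is a hypothetical helper, and it takes a plain `dict` keyed by the field names above rather than a real `CodeObservation`, so treat it as an illustration only.

```python
def build_prompt(obs: dict) -> str:
    """Assemble a review prompt from observation fields (hypothetical helper)."""
    parts = [
        f"Task {obs['task_id']} ({obs['difficulty']}): {obs['task_description']}",
        f"PR title: {obs['pr_title']}",
        f"PR description: {obs['pr_description']}",
        f"Changed files: {obs['changed_files']}",
        "Diff:",
        obs["code_snippet"],
    ]
    # Include grader feedback from the previous step when there is any.
    if obs.get("previous_feedback"):
        parts.append(f"Previous feedback: {obs['previous_feedback']}")
    return "\n".join(parts)
```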

### State
**CodeState**: Tracks execution state
- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task id
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score

## Built-in Tasks and Graders

The server exposes:
- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=<id>&episode_id=<id>` to read the final normalized score.
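
A minimal sketch of building the two endpoint URLs with the standard library follows. The helper names are hypothetical; only the paths and query parameter names come from the list above, and the response format is not documented here.

```python
from urllib.parse import urlencode, urljoin


def tasks_url(base_url: str) -> str:
    """URL for listing all benchmark tasks."""
    return urljoin(base_url.rstrip("/") + "/", "tasks")


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """URL for reading the final normalized score of one episode."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return urljoin(base_url.rstrip("/") + "/", "grader") + "?" + query
```

Against a running server, the resulting URL can be fetched with `urllib.request.urlopen(...)` or any HTTP client.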

Shipped tasks:
- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)

Rewards lie strictly within `(0, 1)` and give partial credit for:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text
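
To make the idea of partial credit concrete, here is a sketch of a weighted score clamped to stay strictly inside `(0, 1)`. The weights, the `EPS` margin, and the `partial_score` function are all assumptions made for illustration; the actual grader's weighting is not documented here.

```python
# Hypothetical weights; the real grader's weighting is not documented here.
WEIGHTS = {
    "file_path": 0.25,
    "issue_type": 0.25,
    "severity": 0.15,
    "line_number": 0.20,
    "evidence": 0.15,
}
EPS = 0.01  # assumed margin that keeps the score strictly inside (0, 1)


def partial_score(matches: dict[str, bool]) -> float:
    """Sum the weights of the matched components, then clamp into (0, 1)."""
    raw = sum(w for name, w in WEIGHTS.items() if matches.get(name, False))
    return min(1.0 - EPS, max(EPS, raw))
```

With these assumed weights, a review that names the right file and issue type but misses everything else would score about 0.5.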

## Advanced Usage

### Connecting to an Existing Server

If you already have a Coding environment server running, you can connect directly:

```python
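import asyncio
from coding_env import CodeAction, CodingEnv

# Placeholder review values; derive real ones from the observation's diff
action = CodeAction(
    review="Loop bound skips the last element.", file_path="src/example.py",
    issue_type="logic", severity="medium", bug_type="logic",
    line_number=1, confidence=0.5,
)

# Async usage
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(action)

asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(action)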
```

Note: When connecting to an existing server, closing the client will NOT stop the server.

## Development & Testing

### Running Tests

Install the coding_env package with dev dependencies and run the tests from the repo root:

```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```

### Running the Full Example

Run the complete example that demonstrates the full workflow:

```bash
python3 envs/coding_env/client/example_usage.py
```

This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`

## Project Structure

```
coding_env/
├── README.md              # This file
├── models.py              # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py  # CodingEnv client implementation
│   └── example_usage.py      # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation
```