---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Coding Environment

A real-world **PR triage and code review** environment with three graded tasks (easy/medium/hard). Each episode presents pull request metadata and a unified diff, then asks the agent to submit a structured review.

## Quick Start

The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:

```python
import asyncio
from coding_env import CodeAction, CodingEnv

async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset
        result = await client.reset()
        print(f"Reset complete: exit_code={result.observation.exit_code}")

        # Execute Python code
        code_samples = [
            "print('Hello, World!')",
            "x = 5 + 3\nprint(f'Result: {x}')",
            "import math\nprint(math.pi)"
        ]

        for code in code_samples:
            result = await client.step(CodeAction(code=code))
            print(f"Code: {code}")
            print(f"  → stdout: {result.observation.stdout.strip()}")
            print(f"  → exit_code: {result.observation.exit_code}")

asyncio.run(main())
```

For **synchronous usage**, use the `.sync()` wrapper:

```python
from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
    print(result.observation.stdout)
```

The `CodingEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits

## Building the Docker Image

Before using the environment, you need to build the Docker image:

```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```

## Environment Details

### Action

**CodeAction** fields:
- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - `logic|security|performance|maintainability`
- `severity` (str) - `low|medium|high|critical`
- `bug_type` (str) - One of `syntax | logic | security | none`
- `line_number` (int) - Suspected faulty line
- `confidence` (float) - Confidence score in `[0.0, 1.0]`

### Observation

**CodeObservation** fields:
- `task_id` (str) - Current task id
- `difficulty` (str) - Task difficulty (`easy|medium|hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context + unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - Changed file list
- `previous_feedback` (str) - Grader feedback from latest step
- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
- `done` (bool) - Episode termination flag

### State

**CodeState**: Tracks execution state
- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task id
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score

## Built-in Tasks and Graders

The server exposes:
- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=&episode_id=` to read final normalized score.
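
The two endpoints above can be queried with nothing but the standard library. The following is a minimal sketch, not part of the shipped client: the helper names (`tasks_url`, `grader_url`, `get_json`) are hypothetical, the base URL assumes a server listening on `http://localhost:8000` as in the Quick Start, and the responses are assumed to be JSON (their exact shape is not documented here).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server from the Quick Start is reachable on port 8000.
BASE_URL = "http://localhost:8000"


def tasks_url(base_url: str = BASE_URL) -> str:
    """Build the URL for GET /tasks."""
    return f"{base_url.rstrip('/')}/tasks"


def grader_url(task_id: str, episode_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /grader with URL-encoded query parameters."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls against a running server (left commented out so that
# importing this snippet never touches the network):
# tasks = get_json(tasks_url())
# score = get_json(grader_url("task_easy_1", "<episode_id from CodeState>"))
```

The `episode_id` passed to the grader would be the one reported in `CodeState` for the finished episode; URL-encoding it keeps the request valid even if the id contains characters such as spaces or slashes.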

Shipped tasks:
- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)

Rewards lie strictly inside `(0, 1)`, with partial credit for:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text

## Advanced Usage

### Connecting to an Existing Server

If you already have a Coding environment server running, you can connect directly:

```python
import asyncio
from coding_env import CodeAction, CodingEnv

# Async usage (must run inside a coroutine)
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(CodeAction(code="print('Hello!')"))

asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(CodeAction(code="print('Hello!')"))
```

Note: When connecting to an existing server, closing the client will NOT stop the server.

## Development & Testing

### Running Tests

Install the coding_env package with dev dependencies and run the tests from the repo root:

```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```

### Running the Full Example

Run the complete example that demonstrates the full workflow:

```bash
python3 envs/coding_env/client/example_usage.py
```

This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`

## Project Structure

```
coding_env/
├── README.md                  # This file
├── models.py                  # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py   # CodingEnv client implementation
│   └── example_usage.py       # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation
```