title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv

Coding Environment

A real-world PR triage and code review environment with three graded tasks (easy/medium/hard). Each episode presents pull request metadata and a unified diff, then asks the agent to submit a structured review.

Quick Start

The simplest way to use the Coding environment is through the CodingEnv class. The client is async by default:

import asyncio
from coding_env import CodeAction, CodingEnv

async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset to receive the first pull request
        result = await client.reset()
        obs = result.observation
        print(f"Task {obs.task_id} ({obs.difficulty}): {obs.pr_title}")
        print(obs.code_snippet)

        # Submit a structured review (values below are illustrative placeholders)
        action = CodeAction(
            review="Loop bound skips the last element.",
            file_path="src/example.py",
            issue_type="logic",
            severity="medium",
            bug_type="logic",
            line_number=42,
            confidence=0.8,
        )
        result = await client.step(action)
        print(f"  → feedback: {result.observation.previous_feedback}")
        print(f"  → reward: {result.observation.reward}")
        print(f"  → done: {result.observation.done}")

asyncio.run(main())

For synchronous usage, use the .sync() wrapper:

from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
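    result = client.reset()
    # Illustrative placeholder values; derive real ones from result.observation
    action = CodeAction(review="Loop bound skips the last element.", file_path="src/example.py",
                        issue_type="logic", severity="medium", bug_type="logic",
                        line_number=42, confidence=0.8)
    result = client.step(action)
    print(result.observation.previous_feedback)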

The CodingEnv.from_docker_image() method handles:

  • Starting the Docker container
  • Waiting for the server to be ready
  • Connecting to the environment
  • Container cleanup when the context manager exits

Building the Docker Image

Before using the environment, you need to build the Docker image:

# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .

Environment Details

Action

CodeAction fields:

  • review (str) - Human-readable review summary
  • file_path (str) - Changed file being flagged
  • issue_type (str) - One of logic | security | performance | maintainability
  • severity (str) - One of low | medium | high | critical
  • bug_type (str) - One of syntax | logic | security | none
  • line_number (int) - Suspected faulty line
  • confidence (float) - Confidence score in [0.0, 1.0]
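
As a rough sketch of how the fields above fit together, the stand-alone dataclass below mirrors the documented CodeAction fields and checks the enumerated values before a review is submitted. ReviewDraft and validate() are hypothetical names used only for illustration, and the 1-based line_number check is an assumption; the real CodeAction model in models.py may validate differently or not at all.

```python
from dataclasses import dataclass

# Allowed values copied from the field list above.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewDraft:  # hypothetical stand-in, not the real CodeAction
    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks well-formed."""
        problems = []
        if self.issue_type not in ISSUE_TYPES:
            problems.append(f"unknown issue_type: {self.issue_type!r}")
        if self.severity not in SEVERITIES:
            problems.append(f"unknown severity: {self.severity!r}")
        if self.bug_type not in BUG_TYPES:
            problems.append(f"unknown bug_type: {self.bug_type!r}")
        if self.line_number < 1:  # assumption: line numbers are 1-based
            problems.append("line_number must be >= 1")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append("confidence must be in [0.0, 1.0]")
        return problems


draft = ReviewDraft(
    review="Loop bound skips the last element.",
    file_path="src/example.py",  # illustrative path
    issue_type="logic",
    severity="medium",
    bug_type="logic",
    line_number=42,
    confidence=0.8,
)
print(draft.validate())  # → []
```

Checking a draft locally like this catches typos in the enumerated fields before a step is spent on them.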

Observation

CodeObservation fields:

  • task_id (str) - Current task id
  • difficulty (str) - Task difficulty (easy|medium|hard)
  • task_description (str) - Review instructions
  • code_snippet (str) - PR context + unified diff
  • pr_title (str) - Pull request title
  • pr_description (str) - Pull request summary
  • changed_files (str) - Changed file list
  • previous_feedback (str) - Grader feedback from latest step
  • reward (float) - Normalized score contribution [0.0, 1.0]
  • done (bool) - Episode termination flag
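
Because code_snippet carries a unified diff, an agent will usually want to map added lines back to file paths and line numbers before filling in file_path and line_number. The helper below is a minimal sketch that assumes standard "+++ b/path" file headers and "@@ -a,b +c,d @@" hunk headers; added_lines is a hypothetical name, and the exact layout of the PR context around the diff is not specified here, so text outside hunks is simply ignored.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,4 +10,5 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str) -> dict[str, list[int]]:
    """Map each changed file to the new-file line numbers of its added lines."""
    result: dict[str, list[int]] = {}
    current_file = None
    new_lineno = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section begins
            continue
        if line.startswith("+++ "):
            # "+++ b/path" names the post-change file; strip the "b/" prefix if present.
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
            result.setdefault(current_file, [])
            in_hunk = False
            continue
        match = HUNK_RE.match(line)
        if match:
            new_lineno = int(match.group(1))
            in_hunk = True
            continue
        if not in_hunk or current_file is None:
            continue  # PR context or file headers outside a hunk
        if line.startswith("+"):
            result[current_file].append(new_lineno)
            new_lineno += 1
        elif line.startswith("-"):
            continue  # removed lines do not exist in the new file
        else:
            new_lineno += 1  # unchanged context line


    return result


sample = """\
--- a/app/utils.py
+++ b/app/utils.py
@@ -1,2 +1,3 @@
 def total(items):
-    return sum(items[1:])
+    # include the first element
+    return sum(items)
"""
print(added_lines(sample))  # → {'app/utils.py': [2, 3]}
```

This is deliberately simple: it does not handle renames, binary files, or "\ No newline at end of file" markers.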

State

CodeState fields (tracks episode state):

  • episode_id (str) - Unique identifier for the episode
  • step_count (int) - Number of steps taken
  • task_id (str) - Active task id
  • difficulty (str) - Active task difficulty
  • last_score (float) - Last normalized score

Built-in Tasks and Graders

The server exposes:

  • GET /tasks to list all benchmark tasks.
  • GET /grader?task_id=<id>&episode_id=<id> to read the final normalized score for an episode.
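
The grader endpoint takes both identifiers as query parameters, so they should be URL-encoded. The sketch below only builds the request URL with the standard library; grader_url is a hypothetical helper, the episode id is an illustrative value, and the response schema is not documented in this README, so parsing the reply is left out.

```python
from urllib.parse import urlencode, urljoin


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """Build the GET /grader URL with properly escaped query parameters."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return urljoin(base_url.rstrip("/") + "/", "grader") + "?" + query


print(grader_url("http://localhost:8000", "task_easy_1", "ep-123"))
# → http://localhost:8000/grader?task_id=task_easy_1&episode_id=ep-123
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen.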

Shipped tasks:

  • task_easy_1 (logic)
  • task_medium_1 (security)
  • task_hard_1 (logic/performance-concurrency)

Rewards lie strictly inside the open interval (0, 1) and give partial credit for:

  • file path localization
  • issue type / bug type correctness
  • severity calibration
  • line-level precision
  • evidence quality in review text
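
To make the idea of partial credit concrete, here is a minimal sketch of a weighted scorer over the five components listed above, clamped so the result stays strictly inside (0, 1). The weights, the EPSILON margin, and the partial_score name are all assumptions made for illustration; the shipped graders may weight and combine these signals quite differently.

```python
# Hypothetical weights; the real grader's weighting is not documented here.
WEIGHTS = {
    "file_path": 0.25,
    "issue_type": 0.25,
    "severity": 0.15,
    "line_number": 0.20,
    "evidence": 0.15,
}
EPSILON = 0.01  # assumed margin that keeps the score strictly inside (0, 1)


def partial_score(hits: dict[str, bool]) -> float:
    """Sum the weights of satisfied components, then clamp into (0, 1)."""
    raw = sum(weight for name, weight in WEIGHTS.items() if hits.get(name, False))
    return min(max(raw, EPSILON), 1.0 - EPSILON)


# A review that names the right file and issue type but misses everything else.
print(partial_score({"file_path": True, "issue_type": True}))  # → 0.5
```

Under this scheme a review with no correct component scores EPSILON rather than 0, and a fully correct review scores 1 - EPSILON rather than 1.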

Advanced Usage

Connecting to an Existing Server

If you already have a Coding environment server running, you can connect directly:

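import asyncio
from coding_env import CodeAction, CodingEnv

# Illustrative placeholder review; derive real values from the observation
action = CodeAction(review="Loop bound skips the last element.", file_path="src/example.py",
                    issue_type="logic", severity="medium", bug_type="logic",
                    line_number=42, confidence=0.8)

# Async usage (async with must run inside a coroutine)
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(action)

asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(action)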

Note: When connecting to an existing server, closing the client will NOT stop the server.

Development & Testing

Running Tests

Install the coding_env package with dev dependencies and run the tests from the repo root:

# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v

Running the Full Example

Run the complete example that demonstrates the full workflow:

python3 envs/coding_env/client/example_usage.py

This example shows:

  • Creating an environment from a Docker image
  • Resetting and executing code through the environment
  • Automatic cleanup with close()

Project Structure

coding_env/
├── README.md              # This file
├── models.py              # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py  # CodingEnv client implementation
│   └── example_usage.py      # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation