
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv

Coding Environment

A real-world PR triage and code review environment with three graded tasks (easy/medium/hard). Each episode presents pull request metadata and a unified diff, then asks the agent to submit a structured review.

Quick Start

The simplest way to use the Coding environment is through the CodingEnv class. The client is async by default:

import asyncio
from coding_env import CodeAction, CodingEnv

async def main():
    # Create environment from Docker image
    client = await CodingEnv.from_docker_image("coding-env:latest")

    async with client:
        # Reset to start an episode and receive the pull request to review
        result = await client.reset()
        obs = result.observation
        print(f"Task {obs.task_id} ({obs.difficulty}): {obs.pr_title}")
        print(obs.code_snippet)

        # Submit a structured review (all values below are placeholders)
        action = CodeAction(
            review="Off-by-one in the loop bound skips the last element.",
            file_path="src/example.py",
            issue_type="logic",
            severity="medium",
            bug_type="logic",
            line_number=42,
            confidence=0.8,
        )
        result = await client.step(action)
        print(f"  → reward: {result.observation.reward}")
        print(f"  → feedback: {result.observation.previous_feedback}")
        print(f"  → done: {result.observation.done}")

asyncio.run(main())

For synchronous usage, use the .sync() wrapper:

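from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    # Placeholder values; a real review is based on result.observation.code_snippet
    result = client.step(CodeAction(
        review="Missing None check before use.", file_path="src/example.py",
        issue_type="logic", severity="medium", bug_type="logic",
        line_number=10, confidence=0.7,
    ))
    print(result.observation.previous_feedback)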

The CodingEnv.from_docker_image() method handles:

Starting the Docker container
Waiting for the server to be ready
Connecting to the environment
Container cleanup when the context manager exits

Building the Docker Image

Before using the environment, you need to build the Docker image:

# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .

Environment Details

Action

CodeAction fields:

review (str) - Human-readable review summary
file_path (str) - Changed file being flagged
issue_type (str) - One of logic | security | performance | maintainability
severity (str) - One of low | medium | high | critical
bug_type (str) - One of syntax | logic | security | none
line_number (int) - Suspected faulty line
confidence (float) - Confidence score in [0.0, 1.0]
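
The field list above can be checked locally before a review is submitted. The following is a minimal sketch, not part of coding_env: `ReviewDraft` and `problems()` are hypothetical names, the allowed values are copied from the list above, and the rule that line numbers start at 1 is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Allowed values copied from the CodeAction field list.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewDraft:
    """Hypothetical local mirror of the CodeAction fields."""

    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def problems(self) -> List[str]:
        """Return validation problems; an empty list means the draft looks valid."""
        found = []
        if not self.review.strip():
            found.append("review is empty")
        if self.issue_type not in ISSUE_TYPES:
            found.append(f"unknown issue_type {self.issue_type!r}")
        if self.severity not in SEVERITIES:
            found.append(f"unknown severity {self.severity!r}")
        if self.bug_type not in BUG_TYPES:
            found.append(f"unknown bug_type {self.bug_type!r}")
        if self.line_number < 1:  # assumption: line numbers are 1-based
            found.append("line_number must be >= 1")
        if not 0.0 <= self.confidence <= 1.0:
            found.append("confidence must be in [0.0, 1.0]")
        return found


# Placeholder values for illustration only.
draft = ReviewDraft(
    review="Off-by-one in the loop bound skips the last element.",
    file_path="src/example.py",
    issue_type="logic",
    severity="medium",
    bug_type="logic",
    line_number=42,
    confidence=0.8,
)
print(draft.problems())
```

If `problems()` returns an empty list, the same values can be passed to CodeAction, assuming it accepts the documented fields as keyword arguments.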

Observation

CodeObservation fields:

task_id (str) - Current task id
difficulty (str) - Task difficulty (easy|medium|hard)
task_description (str) - Review instructions
code_snippet (str) - PR context + unified diff
pr_title (str) - Pull request title
pr_description (str) - Pull request summary
changed_files (str) - Changed file list
previous_feedback (str) - Grader feedback from latest step
reward (float) - Normalized score contribution [0.0, 1.0]
done (bool) - Episode termination flag
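
As one way to use these fields, an agent might assemble them into a single prompt for a reviewer model. This is a sketch only: `build_prompt` is a hypothetical helper, and the `SimpleNamespace` object stands in for a real CodeObservation with made-up values.

```python
from types import SimpleNamespace


def build_prompt(obs) -> str:
    """Assemble a review prompt from the documented observation fields."""
    parts = [
        f"Task {obs.task_id} ({obs.difficulty})",
        f"Instructions: {obs.task_description}",
        f"PR title: {obs.pr_title}",
        f"PR description: {obs.pr_description}",
        f"Changed files: {obs.changed_files}",
        "Diff:",
        obs.code_snippet,
    ]
    # Include grader feedback only when a previous step produced some.
    if obs.previous_feedback:
        parts.append(f"Previous feedback: {obs.previous_feedback}")
    return "\n".join(parts)


# Stand-in observation with made-up values; a real one comes from reset() or step().
sample = SimpleNamespace(
    task_id="task_easy_1",
    difficulty="easy",
    task_description="Find the bug introduced by this PR.",
    code_snippet="--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n-x = 1\n+x = 2",
    pr_title="Example PR",
    pr_description="Example description",
    changed_files="app.py",
    previous_feedback="",
)
prompt = build_prompt(sample)
print(prompt)
```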

State

CodeState tracks episode state:

episode_id (str) - Unique identifier for the episode
step_count (int) - Number of steps taken
task_id (str) - Active task id
difficulty (str) - Active task difficulty
last_score (float) - Last normalized score

Built-in Tasks and Graders

The server exposes:

GET /tasks to list all benchmark tasks.
GET /grader?task_id=<id>&episode_id=<id> to read the final normalized score.
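
The two endpoints can be called with the standard library alone. The sketch below builds the URLs and fetches them; the helper names are hypothetical, the base URL assumes a server on localhost:8000, and `get_json` assumes the responses are JSON, which this README does not state.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tasks_url(base_url: str) -> str:
    """URL of the task listing endpoint."""
    return f"{base_url.rstrip('/')}/tasks"


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """URL of the grader endpoint with both query parameters encoded."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The episode id here is a placeholder; a real one comes from the environment state.
print(grader_url("http://localhost:8000", "task_easy_1", "abc123"))
```

With a server running, `get_json(tasks_url("http://localhost:8000"))` would fetch the task list.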

Shipped tasks:

task_easy_1 (logic)
task_medium_1 (security)
task_hard_1 (logic/performance-concurrency)

Rewards fall strictly within (0, 1), with partial credit awarded for:

file path localization
issue type / bug type correctness
severity calibration
line-level precision
evidence quality in review text
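
To illustrate how partial credit over these five components could combine into one score, here is a rough sketch. It is not the shipped grader: the weights, the 5-line tolerance, the keyword-based evidence check, and the `EPS` clamp are all assumptions chosen for illustration.

```python
# Hypothetical weights, chosen only so that they sum to 1.0.
WEIGHTS = {"file_path": 0.25, "type": 0.25, "severity": 0.15, "line": 0.20, "evidence": 0.15}
SEVERITY_ORDER = ["low", "medium", "high", "critical"]
EPS = 0.01  # assumed margin that keeps the score strictly inside (0, 1)


def score(action: dict, truth: dict) -> float:
    """Combine per-component partial credit into one clamped score."""
    severity_gap = abs(
        SEVERITY_ORDER.index(action["severity"]) - SEVERITY_ORDER.index(truth["severity"])
    )
    keywords = truth["keywords"]
    hits = sum(word in action["review"].lower() for word in keywords)
    parts = {
        "file_path": float(action["file_path"] == truth["file_path"]),
        # Half credit each for the issue type and the bug type.
        "type": 0.5 * (action["issue_type"] == truth["issue_type"])
        + 0.5 * (action["bug_type"] == truth["bug_type"]),
        # Severity is ordinal: lose a third of the credit per level of distance.
        "severity": 1.0 - severity_gap / (len(SEVERITY_ORDER) - 1),
        # Full credit on the exact line, decaying linearly to zero at 5 lines away.
        "line": max(0.0, 1.0 - abs(action["line_number"] - truth["line_number"]) / 5),
        # Crude evidence proxy: fraction of expected keywords found in the review.
        "evidence": hits / max(1, len(keywords)),
    }
    raw = sum(WEIGHTS[name] * value for name, value in parts.items())
    return min(1.0 - EPS, max(EPS, raw))


# Made-up ground truth and review for illustration.
truth = {
    "file_path": "src/example.py", "issue_type": "logic", "bug_type": "logic",
    "severity": "high", "line_number": 42, "keywords": ["off-by-one", "loop"],
}
review = dict(truth, severity="medium", line_number=44,
              review="Off-by-one error in the loop bound.")
print(score(review, truth))
```

Clamping with `EPS` is one simple way to keep a weighted sum away from exactly 0 or 1; the real grader may normalize differently.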

Advanced Usage

Connecting to an Existing Server

If you already have a Coding environment server running, you can connect directly:

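import asyncio
from coding_env import CodeAction, CodingEnv

# Placeholder review; a real one is based on the observation's code_snippet
action = CodeAction(
    review="Missing None check before use.", file_path="src/example.py",
    issue_type="logic", severity="medium", bug_type="logic",
    line_number=10, confidence=0.7,
)

# Async usage (async with must run inside a coroutine)
async def main():
    async with CodingEnv(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(action)

asyncio.run(main())

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(action)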

Note: When connecting to an existing server, closing the client will NOT stop the server.

Development & Testing

Running Tests

Install the coding_env package with dev dependencies and run the tests from the repo root:

# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v

Running the Full Example

Run the complete example that demonstrates the full workflow:

python3 envs/coding_env/client/example_usage.py

This example shows:

Creating an environment from a Docker image
Resetting and executing code through the environment
Automatic cleanup with close()

Project Structure

coding_env/
├── README.md              # This file
├── models.py              # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py  # CodingEnv client implementation
│   └── example_usage.py      # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation