---
title: Coding Environment Server
emoji: 💻
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Coding Environment

A real-world **PR triage and code review** environment with three graded tasks (easy/medium/hard). Each episode presents pull request metadata and a unified diff, then asks the agent to submit a structured review.

## Quick Start

The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:

```python
import asyncio

from coding_env import CodeAction, CodingEnv


async def main():
    # Start a container from the Docker image and wait for the server
    client = await CodingEnv.from_docker_image("coding-env:latest")
    async with client:
        # Reset: receive pull request metadata and a unified diff
        result = await client.reset()
        obs = result.observation
        print(f"[{obs.difficulty}] {obs.task_id}: {obs.pr_title}")
        print(obs.code_snippet)

        # Submit a structured review (placeholder values; take file_path
        # from obs.changed_files and line_number from the diff)
        action = CodeAction(
            review="Loop bound skips the last element of the list.",
            file_path="src/example.py",
            issue_type="logic",
            severity="medium",
            bug_type="logic",
            line_number=12,
            confidence=0.7,
        )
        result = await client.step(action)
        print(f"reward={result.observation.reward} done={result.observation.done}")
        print(f"feedback: {result.observation.previous_feedback}")


asyncio.run(main())
```
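
For **synchronous usage**, use the `.sync()` wrapper:

```python
from coding_env import CodeAction, CodingEnv

with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    print(result.observation.pr_title)
    # Placeholder review values
    result = client.step(CodeAction(
        review="User input reaches the SQL query unescaped.",
        file_path="app/db.py", issue_type="security", severity="high",
        bug_type="security", line_number=42, confidence=0.8,
    ))
    print(result.observation.previous_feedback)
```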

The `CodingEnv.from_docker_image()` method handles:

- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits

## Building the Docker Image

Before using the environment, build the Docker image:

```bash
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
```

## Environment Details

### Action

**CodeAction** fields:

- `review` (str) - Human-readable review summary
- `file_path` (str) - Changed file being flagged
- `issue_type` (str) - One of `logic`, `security`, `performance`, `maintainability`
- `severity` (str) - One of `low`, `medium`, `high`, `critical`
- `bug_type` (str) - One of `syntax`, `logic`, `security`, `none`
- `line_number` (int) - Line number of the suspected fault
- `confidence` (float) - Confidence score in `[0.0, 1.0]`
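
An invalid enum string or an out-of-range confidence is easy to send by accident, so it can help to check a review before calling `step()`. The sketch below is hypothetical: `ReviewDraft` is not part of `coding_env`, the allowed values are copied from the list above rather than imported from `models.py`, and it assumes `line_number` is 1-based.

```python
from dataclasses import dataclass

# Allowed values copied from the CodeAction field list; kept local so this
# sketch runs without the coding_env package installed.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewDraft:
    """Hypothetical pre-flight container mirroring CodeAction's fields."""

    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks well-formed."""
        problems = []
        if self.issue_type not in ISSUE_TYPES:
            problems.append(f"issue_type {self.issue_type!r} not in {sorted(ISSUE_TYPES)}")
        if self.severity not in SEVERITIES:
            problems.append(f"severity {self.severity!r} not in {sorted(SEVERITIES)}")
        if self.bug_type not in BUG_TYPES:
            problems.append(f"bug_type {self.bug_type!r} not in {sorted(BUG_TYPES)}")
        # Assumption: line numbers are 1-based, so 0 or negatives are rejected
        if self.line_number < 1:
            problems.append("line_number should be a positive 1-based line")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append("confidence must lie in [0.0, 1.0]")
        if not self.file_path:
            problems.append("file_path is empty")
        return problems
```

If `CodeAction` accepts these fields as keyword arguments, as the field list suggests, a validated draft could be forwarded with `CodeAction(**dataclasses.asdict(draft))`; that pattern is an assumption, not a documented API.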

### Observation

**CodeObservation** fields:

- `task_id` (str) - Current task ID
- `difficulty` (str) - Task difficulty (`easy`, `medium`, or `hard`)
- `task_description` (str) - Review instructions
- `code_snippet` (str) - PR context plus unified diff
- `pr_title` (str) - Pull request title
- `pr_description` (str) - Pull request summary
- `changed_files` (str) - List of changed files
- `previous_feedback` (str) - Grader feedback from the latest step
- `reward` (float) - Normalized score contribution in `[0.0, 1.0]`
- `done` (bool) - Episode termination flag
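
Because `code_snippet` bundles PR context with a unified diff, the natural candidates for `file_path` and `line_number` are the lines the diff adds. A minimal sketch, assuming the diff uses standard `--- a/...` / `+++ b/...` file headers and `@@ -a,b +c,d @@` hunk headers, and that the grader counts lines in the post-change file (neither is confirmed here); `added_lines` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Matches hunk headers such as "@@ -10,3 +10,4 @@"; group 1 is the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str) -> dict[str, list[int]]:
    """Map each post-change file path to the new-file line numbers of added lines."""
    result: dict[str, list[int]] = defaultdict(list)
    current_file = None  # None until a "+++" header is seen, or for deleted files
    new_line = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            # Strip the conventional "b/" prefix; "/dev/null" marks a deleted file
            current_file = None if path == "/dev/null" else path.removeprefix("b/")
            continue
        if line.startswith("--- "):
            continue  # old-file header carries no new-file line numbers
        match = HUNK_RE.match(line)
        if match:
            new_line = int(match.group(1))
            continue
        if current_file is None:
            continue  # PR context text before the diff, or a deleted file
        if line.startswith("+"):
            result[current_file].append(new_line)
            new_line += 1
        elif line.startswith("-"):
            continue  # removed lines do not exist in the new file
        elif line.startswith(" ") or line == "":
            new_line += 1  # context lines advance the new-file counter
    return dict(result)
```

Lines that are added are where a newly introduced bug can live, so restricting `line_number` guesses to these positions is a cheap way to improve line-level precision.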

### State

**CodeState**: Tracks per-episode state

- `episode_id` (str) - Unique identifier for the episode
- `step_count` (int) - Number of steps taken
- `task_id` (str) - Active task ID
- `difficulty` (str) - Active task difficulty
- `last_score` (float) - Last normalized score

## Built-in Tasks and Graders

The server exposes:

- `GET /tasks` to list all benchmark tasks.
- `GET /grader?task_id=<id>&episode_id=<id>` to read the final normalized score for an episode.
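
Both endpoints can be called with nothing but the Python standard library. This sketch assumes the server is reachable at `http://localhost:8000` (the `app_port` from the Space config) and that both endpoints return JSON; their response shapes are not documented here, so the helpers return the decoded JSON unchanged. `grader_url` and `get_json` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default app_port from the Space configuration
BASE_URL = "http://localhost:8000"


def grader_url(task_id: str, episode_id: str, base_url: str = BASE_URL) -> str:
    """Build the /grader query URL with properly escaped parameters."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `get_json(f"{BASE_URL}/tasks")` to list tasks and `get_json(grader_url("task_easy_1", episode_id))` after an episode, with `episode_id` taken from `CodeState`. Building the query with `urlencode` keeps IDs containing spaces or `&` intact.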

Shipped tasks:

- `task_easy_1` (logic)
- `task_medium_1` (security)
- `task_hard_1` (logic/performance-concurrency)

Rewards lie strictly within `(0, 1)`, with partial credit for:

- File path localization
- Issue type and bug type correctness
- Severity calibration
- Line-level precision
- Evidence quality in the review text
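
The grader's actual weights and matching rules are not documented here. As a purely illustrative sketch, with hypothetical weights in `WEIGHTS`, a hypothetical linear line-distance rule in `line_credit`, and an assumed margin `EPSILON`, per-criterion credits could be combined into a score that stays strictly inside `(0, 1)` like this:

```python
# Hypothetical weights, one per criterion listed above; they sum to 1.0.
WEIGHTS = {
    "file_path": 0.25,
    "issue_and_bug_type": 0.25,
    "severity": 0.15,
    "line_number": 0.20,
    "evidence": 0.15,
}
EPSILON = 0.01  # assumed margin that keeps scores strictly inside (0, 1)


def line_credit(predicted: int, actual: int, window: int = 3) -> float:
    """Full credit for the exact line, linearly less up to `window` lines away."""
    distance = abs(predicted - actual)
    return max(0.0, 1.0 - distance / (window + 1))


def combine(components: dict[str, float]) -> float:
    """Weighted sum of per-criterion credits in [0, 1], squeezed into (0, 1)."""
    raw = sum(
        weight * min(max(components.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return min(max(raw, EPSILON), 1.0 - EPSILON)
```

Clamping to `[EPSILON, 1 - EPSILON]` is one simple way to satisfy an open-interval requirement; a real grader might instead rescale the raw score.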

## Advanced Usage

### Connecting to an Existing Server

If a Coding environment server is already running, connect to it directly:

```python
from coding_env import CodeAction, CodingEnv

# Placeholder review values; see the CodeAction fields above
action = CodeAction(
    review="Off-by-one in the loop bound.", file_path="src/example.py",
    issue_type="logic", severity="medium", bug_type="logic",
    line_number=12, confidence=0.6,
)

# Async usage (inside an `async def`, or a notebook with top-level await)
async with CodingEnv(base_url="http://localhost:8000") as client:
    result = await client.reset()
    result = await client.step(action)

# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(action)
```

Note: When connecting to an existing server, closing the client will NOT stop the server.
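
A server started separately, such as a container launched by hand, may take a few seconds to accept connections. A minimal readiness check, assuming that a 2xx response from the documented `GET /tasks` endpoint means the server is ready, could poll it with the standard library before creating the client; `wait_for_server` is a hypothetical helper, not part of `coding_env`.

```python
import http.client
import time
from urllib.request import urlopen


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 2xx or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlopen(url, timeout=interval) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError subclass OSError: refused, timed out, or 4xx/5xx;
            # HTTPException covers malformed replies from a half-started server
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_server("http://localhost:8000/tasks")` returns `True` once the endpoint answers, and `False` if the server never comes up within the timeout, which lets a script fail fast with a clear error.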

## Development & Testing

### Running Tests

Install the `coding_env` package with dev dependencies and run the tests from the repo root:

```bash
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"

# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
```

### Running the Full Example

Run the complete example that demonstrates the full workflow:

```bash
python3 envs/coding_env/client/example_usage.py
```

This example shows:

- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with `close()`

## Project Structure

```
coding_env/
├── README.md                  # This file
├── models.py                  # Action, Observation, and State models
├── client/
│   ├── coding_env_client.py   # CodingEnv client implementation
│   └── example_usage.py       # Usage examples
└── server/
    ├── python_codeact_env.py  # Core environment logic
    ├── app.py                 # FastAPI application
    ├── transforms.py          # Observation transforms
    ├── Dockerfile             # Container image definition
    └── README.md              # Server-specific documentation
```