	---
	title: Coding Environment Server
	emoji: 💻
	colorFrom: blue
	colorTo: blue
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# Coding Environment

	A real-world PR triage and code review environment with three graded tasks
	(easy/medium/hard). Each episode presents pull request metadata and a unified
	diff, then asks the agent to submit a structured review.

	## Quick Start

	The simplest way to use the Coding environment is through the `CodingEnv` class. The client is async by default:

	```python
	import asyncio
	from coding_env import CodeAction, CodingEnv

	async def main():
	    # Create the environment from the Docker image
	    client = await CodingEnv.from_docker_image("coding-env:latest")

	    async with client:
	        # Reset to receive a task: PR metadata plus a unified diff
	        result = await client.reset()
	        obs = result.observation
	        print(f"Task {obs.task_id} ({obs.difficulty}): {obs.pr_title}")

	        # Submit a structured review using the fields listed under "Action" below.
	        # The values here are illustrative placeholders, not a real finding.
	        action = CodeAction(
	            review="The loop bound skips the last element.",
	            file_path="src/example.py",
	            issue_type="logic",
	            severity="medium",
	            bug_type="logic",
	            line_number=1,
	            confidence=0.5,
	        )
	        result = await client.step(action)
	        print(f"  → reward: {result.observation.reward}")
	        print(f"  → feedback: {result.observation.previous_feedback}")
	        print(f"  → done: {result.observation.done}")

	asyncio.run(main())
	```

	For synchronous usage, use the `.sync()` wrapper:

	```python
	from coding_env import CodeAction, CodingEnv

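	with CodingEnv(base_url="http://localhost:8000").sync() as client:
	    result = client.reset()
	    # Illustrative placeholder values; see "Action" below for the fields
	    action = CodeAction(review="Off-by-one in loop bound.", file_path="src/example.py",
	                        issue_type="logic", severity="medium", bug_type="logic",
	                        line_number=1, confidence=0.5)
	    result = client.step(action)
	    print(result.observation.previous_feedback)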
	```

	The `CodingEnv.from_docker_image()` method handles:
	- Starting the Docker container
	- Waiting for the server to be ready
	- Connecting to the environment
	- Container cleanup when the context manager exits

	## Building the Docker Image

	Before using the environment, you need to build the Docker image:

	```bash
	# From project root
	docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
	```

	## Environment Details

	### Action
	`CodeAction` fields:
	- `review` (str) - Human-readable review summary
	- `file_path` (str) - Changed file being flagged
	- `issue_type` (str) - One of `logic | security | performance | maintainability`
	- `severity` (str) - One of `low | medium | high | critical`
	- `bug_type` (str) - One of `syntax | logic | security | none`
	- `line_number` (int) - Suspected faulty line
	- `confidence` (float) - Confidence score in `[0.0, 1.0]`
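
The field constraints above can be checked before an action is sent. The following is a minimal sketch, not the package's own validation: `ReviewActionSketch` is a hypothetical stand-in that mirrors the documented `CodeAction` fields, and the 1-based `line_number` check is an assumption.

```python
from dataclasses import dataclass

# Allowed values copied from the field list above.
ISSUE_TYPES = {"logic", "security", "performance", "maintainability"}
SEVERITIES = {"low", "medium", "high", "critical"}
BUG_TYPES = {"syntax", "logic", "security", "none"}


@dataclass
class ReviewActionSketch:
    """Hypothetical stand-in mirroring the documented CodeAction fields."""

    review: str
    file_path: str
    issue_type: str
    severity: str
    bug_type: str
    line_number: int
    confidence: float

    def problems(self) -> list:
        """Return human-readable validation problems (empty list if valid)."""
        found = []
        if self.issue_type not in ISSUE_TYPES:
            found.append(f"issue_type must be one of {sorted(ISSUE_TYPES)}")
        if self.severity not in SEVERITIES:
            found.append(f"severity must be one of {sorted(SEVERITIES)}")
        if self.bug_type not in BUG_TYPES:
            found.append(f"bug_type must be one of {sorted(BUG_TYPES)}")
        if self.line_number < 1:  # assumption: line numbers are 1-based
            found.append("line_number must be >= 1")
        if not 0.0 <= self.confidence <= 1.0:
            found.append("confidence must be in [0.0, 1.0]")
        return found
```

Calling `problems()` on a candidate review before `client.step(...)` catches typos such as `issue_type="style"` locally instead of spending an environment step on them.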

	### Observation
	`CodeObservation` fields:
	- `task_id` (str) - Current task id
	- `difficulty` (str) - Task difficulty (`easy | medium | hard`)
	- `task_description` (str) - Review instructions
	- `code_snippet` (str) - PR context + unified diff
	- `pr_title` (str) - Pull request title
	- `pr_description` (str) - Pull request summary
	- `changed_files` (str) - Changed file list
	- `previous_feedback` (str) - Grader feedback from latest step
	- `reward` (float) - Normalized score contribution `[0.0, 1.0]`
	- `done` (bool) - Episode termination flag
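
An agent usually needs these fields flattened into a single prompt. The sketch below shows one way to do that, assuming the observation has been converted to a plain `dict` keyed by the field names above; `build_prompt` is a hypothetical helper, not part of `coding_env`.

```python
def build_prompt(obs: dict) -> str:
    """Flatten the documented observation fields into one review prompt."""
    parts = [
        f"Task {obs['task_id']} ({obs['difficulty']})",
        obs["task_description"],
        f"PR title: {obs['pr_title']}",
        f"PR description: {obs['pr_description']}",
        f"Changed files: {obs['changed_files']}",
        "Diff:",
        obs["code_snippet"],
    ]
    # Grader feedback is only present after at least one step.
    if obs.get("previous_feedback"):
        parts.append(f"Grader feedback from the last step: {obs['previous_feedback']}")
    return "\n".join(parts)
```

Including `previous_feedback` only when it is non-empty keeps the first-step prompt free of an empty feedback line.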

	### State
	`CodeState` fields (tracks episode progress):
	- `episode_id` (str) - Unique identifier for the episode
	- `step_count` (int) - Number of steps taken
	- `task_id` (str) - Active task id
	- `difficulty` (str) - Active task difficulty
	- `last_score` (float) - Last normalized score

	## Built-in Tasks and Graders

	The server exposes:
	- `GET /tasks` to list all benchmark tasks.
	- `GET /grader?task_id=<id>&episode_id=<id>` to read the final normalized score.
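
The two endpoints above can be called with the standard library alone. This is a minimal sketch: the helper names are hypothetical, and treating the response bodies as JSON is an assumption (the server is a FastAPI application, but the response schema is not documented here).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tasks_url(base_url: str) -> str:
    """URL for listing all benchmark tasks."""
    return f"{base_url.rstrip('/')}/tasks"


def grader_url(base_url: str, task_id: str, episode_id: str) -> str:
    """URL for reading the final normalized score of one episode."""
    query = urlencode({"task_id": task_id, "episode_id": episode_id})
    return f"{base_url.rstrip('/')}/grader?{query}"


def fetch_json(url: str, timeout_s: float = 10.0):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout_s) as response:
        return json.load(response)
```

With a server listening on `http://localhost:8000`, `fetch_json(tasks_url("http://localhost:8000"))` would retrieve the task list, and `grader_url(...)` takes the `episode_id` reported in the environment state.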

	Shipped tasks:
	- `task_easy_1` (logic)
	- `task_medium_1` (security)
	- `task_hard_1` (logic/performance-concurrency)

	Rewards fall strictly inside `(0, 1)`, with partial credit for:
	- file path localization
	- issue type / bug type correctness
	- severity calibration
	- line-level precision
	- evidence quality in review text
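
To make the partial-credit idea concrete, here is a sketch of one way such components could be combined into a score that stays strictly inside `(0, 1)`. The weights and the clipping margin are invented for illustration and are not the shipped graders' values.

```python
# Hypothetical weights, one per criterion listed above; they sum to 1.0.
WEIGHTS = {
    "file_path": 0.25,
    "issue_type": 0.20,
    "severity": 0.15,
    "line_number": 0.20,
    "evidence": 0.20,
}
EPSILON = 0.01  # hypothetical margin that keeps scores off the endpoints


def combine(components: dict) -> float:
    """Weighted sum of per-criterion scores in [0, 1], clipped into (0, 1)."""
    raw = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    return min(1.0 - EPSILON, max(EPSILON, raw))
```

Under these assumed weights, a review that only names the right file earns partial credit, while even a perfect review is clipped just below 1.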

	## Advanced Usage

	### Connecting to an Existing Server

	If you already have a Coding environment server running, you can connect directly:

	```python
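	import asyncio
	from coding_env import CodeAction, CodingEnv

	# Illustrative placeholder values; see "Action" above for the fields
	action = CodeAction(review="Off-by-one in loop bound.", file_path="src/example.py",
	                    issue_type="logic", severity="medium", bug_type="logic",
	                    line_number=1, confidence=0.5)

	# Async usage (`async with` must run inside a coroutine)
	async def main():
	    async with CodingEnv(base_url="http://localhost:8000") as client:
	        result = await client.reset()
	        result = await client.step(action)

	asyncio.run(main())

	# Sync usage
	with CodingEnv(base_url="http://localhost:8000").sync() as client:
	    result = client.reset()
	    result = client.step(action)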
	```

	Note: When connecting to an existing server, closing the client will NOT stop the server.

	## Development & Testing

	### Running Tests

	Install the coding_env package with dev dependencies and run the tests from the repo root:

	```bash
	# Install coding_env with dev dependencies (includes smolagents and pytest)
	uv pip install -e "envs/coding_env[dev]"

	# Run unit tests (no Docker required)
	uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v

	# Run integration tests (requires Docker image to be built)
	docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
	SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
	```

	### Running the Full Example

	Run the complete example that demonstrates the full workflow:

	```bash
	python3 envs/coding_env/client/example_usage.py
	```

	This example shows:
	- Creating an environment from a Docker image
	- Resetting and executing code through the environment
	- Automatic cleanup with `close()`

	## Project Structure

	```
	coding_env/
	├── README.md                     # This file
	├── models.py                     # Action, Observation, and State models
	├── client/
	│   ├── coding_env_client.py      # CodingEnv client implementation
	│   └── example_usage.py          # Usage examples
	└── server/
	    ├── python_codeact_env.py     # Core environment logic
	    ├── app.py                    # FastAPI application
	    ├── transforms.py             # Observation transforms
	    ├── Dockerfile                # Container image definition
	    └── README.md                 # Server-specific documentation
	```