---
title: Code Debug Env
emoji: 🐞
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
base_path: /web
---
# code-debug-env
An OpenEnv environment for training AI agents to repair buggy Python code.
The agent receives a broken function and must iteratively submit patches until
all unit tests pass.
## Quick Start
```python
from code_debug_env import CodeDebugEnv, Action
async with CodeDebugEnv(base_url="https://luciferai-devil-code-debug-env.hf.space") as env:
obs = await env.reset(task_id="task_easy")
print(obs.buggy_code) # The broken function
result = await env.step(Action(
patch="def find_max_subarray_sum(nums):\n ...",
task_id="task_easy",
think="The off-by-one error is in range(1, len(nums)-1)"
))
print(result.observation.score) # 0.0–1.0
```
## Action Space
| Field | Type | Required | Description |
|---|---|---|---|
| `patch` | str | Yes | Full Python source replacement for the function |
| `task_id` | str | Yes | Which task to target |
| `think` | str | No | Chain-of-thought reasoning (earns +0.2 reward bonus) |
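On the wire, an action is just these three fields. A small illustrative helper (hypothetical; the real `Action` dataclass lives in `code_debug_env`) shows how a payload follows from the table:

```python
def make_action(patch, task_id, think=None):
    """Build a step payload from the action fields above.

    Illustrative sketch only -- the environment ships its own
    Action dataclass; field names here follow the table.
    """
    if not patch or not task_id:
        raise ValueError("patch and task_id are required")
    action = {"patch": patch, "task_id": task_id}
    if think is not None:
        action["think"] = think  # optional reasoning, earns +0.2 bonus
    return action
```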
## Observation Space
| Field | Type | Description |
|---|---|---|
| `buggy_code` | str | Current version of the code |
| `test_results` | list | Per-test pass/fail with error messages |
| `passed` / `total` | int | Tests passing out of total |
| `score` | float | Composite reward for this step (0.0–1.0) |
| `done` | bool | True when all tests pass or max_steps reached |
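A typical episode polls `done` and feeds `test_results` back into the next patch. A minimal driver sketch, assuming a synchronous client whose observations are dicts shaped like the table above (the real client is async, as in Quick Start):

```python
def run_episode(env, propose_patch, max_steps=5):
    """Loop until all tests pass or the step budget is exhausted.

    `env` is assumed to expose synchronous reset()/step() returning
    dicts with the observation fields listed above; `propose_patch`
    maps (buggy_code, test_results) to a new patch string.
    """
    obs = env.reset()
    steps = 0
    while not obs["done"] and steps < max_steps:
        patch = propose_patch(obs["buggy_code"], obs["test_results"])
        obs = env.step(patch)
        steps += 1
    return obs
```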
## Reward Function
```
r = 0.5 × (tests_passed / tests_total)    # correctness
  + 0.2 × (1 if valid syntax else 0)      # format
  + 0.2 × (1 if <think> provided else 0)  # chain-of-thought bonus
  + 0.1 × (steps_remaining / max_steps)   # efficiency
  − 0.3 × (1 if timeout/crash else 0)     # penalty
```
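The formula can be sketched as a plain Python function (illustrative only; the server's grader is the source of truth and may clamp or weight terms differently):

```python
def compute_reward(tests_passed, tests_total, valid_syntax,
                   think_provided, steps_remaining, max_steps,
                   crashed=False):
    """Composite step reward mirroring the formula above."""
    r = 0.5 * (tests_passed / tests_total)    # correctness
    r += 0.2 * (1 if valid_syntax else 0)     # format
    r += 0.2 * (1 if think_provided else 0)   # chain-of-thought bonus
    r += 0.1 * (steps_remaining / max_steps)  # efficiency
    r -= 0.3 * (1 if crashed else 0)          # penalty
    return r

# A one-shot fix (5/5 tests, valid syntax, CoT, 9/10 steps left):
# 0.5 + 0.2 + 0.2 + 0.09 = 0.99
```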
## Tasks
| ID | Difficulty | Description | Variants |
|---|---|---|---|
| `task_easy` | Easy | Single off-by-one error | 6+ |
| `task_medium` | Medium | Two independent bugs | 6+ |
| `task_hard` | Hard | 3+ subtle bugs in recursive function | 7+ |
*Total: 19 procedurally generated tasks via `task_generator.py`.*
## Setup
```bash
pip install openenv-core
pip install git+https://huggingface.co/spaces/luciferai-devil/code-debug-env
```
## Docker
```bash
docker pull luciferai-devil/code-debug-env:latest
docker run -p 8000:8000 luciferai-devil/code-debug-env
```
## Baseline Results (via OpenAI API)
Evaluated with the `gpt-4o-mini` and `gpt-oss-120b` reasoning models.
| Task | Agent | Score | Notes |
|---|---|---|---|
| task_easy | LLM | 0.99 | One-shot fix with CoT |
| task_medium | LLM | 0.74 | Iterative refinement |
| task_hard | LLM | 0.59 | Struggles with complex recursion depth |
*Average Score: 0.77*
## Training with GRPO
See `baseline/run_baseline.py` for the inference client.
Compatible with TRL's `GRPOTrainer`: pass a `reward_fn` that calls `/grader`.
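A reward function adapter might look like the sketch below (stdlib only; the `/grader` query-parameter names `patch` and `task_id` are assumptions here, so check the server's actual schema before use):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_grader_url(base_url, patch, task_id):
    """Compose a GET /grader request URL.

    Query-parameter names are assumptions; verify against the server.
    """
    return f"{base_url}/grader?" + urlencode({"patch": patch, "task_id": task_id})

def grader_reward_fn(completions, base_url, task_id="task_easy"):
    """TRL-style reward_fn sketch: score each completion via /grader."""
    rewards = []
    for patch in completions:
        with urlopen(build_grader_url(base_url, patch, task_id)) as resp:
            rewards.append(json.load(resp).get("score", 0.0))
    return rewards
```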
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/reset` | POST | Start a new episode |
| `/step` | POST | Submit action, get observation |
| `/state` | GET | Get current episode state |
| `/tasks` | GET | List all available tasks |
| `/grader` | GET | Grade a submission directly |
| `/baseline` | GET | Run baseline agent on all tasks |
## Local Development
```bash
# Run server locally
uvicorn code_debug_env.server.app:app --reload --port 8000
# Build Docker
docker build -t code-debug-env -f server/Dockerfile .
# Run Docker
docker run -p 8000:8000 code-debug-env
# Smoke test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'
curl http://localhost:8000/tasks
```