Spaces:
Sleeping
Sleeping
| title: Code Debug Env | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| app_port: 8000 | |
| base_path: /web | |
| # code-debug-env | |
| An OpenEnv environment for training AI agents to repair buggy Python code. | |
| The agent receives a broken function and must iteratively submit patches until | |
| all unit tests pass. | |
| ## Quick Start | |
| ```python | |
| from code_debug_env import CodeDebugEnv, Action | |
| async with CodeDebugEnv(base_url="https://luciferai-devil-code-debug-env.hf.space") as env: | |
| obs = await env.reset(task_id="task_easy") | |
| print(obs.buggy_code) # The broken function | |
| result = await env.step(Action( | |
| patch="def find_max_subarray_sum(nums):\n ...", | |
| task_id="task_easy", | |
| think="The off-by-one error is in range(1, len(nums)-1)" | |
| )) | |
| print(result.observation.score) # 0.0β1.0 | |
| ``` | |
| ## Action Space | |
| | Field | Type | Required | Description | | |
| |---|---|---|---| | |
| | `patch` | str | Yes | Full Python source replacement for the function | | |
| | `task_id` | str | Yes | Which task to target | | |
| | `think` | str | No | Chain-of-thought reasoning (earns +0.2 reward bonus) | | |
| ## Observation Space | |
| | Field | Type | Description | | |
| |---|---|---| | |
| | `buggy_code` | str | Current version of the code | | |
| | `test_results` | list | Per-test pass/fail with error messages | | |
| | `passed` / `total` | int | Tests passing out of total | | |
| | `score` | float | Composite reward for this step (0.0β1.0) | | |
| | `done` | bool | True when all tests pass or max_steps reached | | |
| ## Reward Function | |
| ``` | |
| r = 0.5 Γ (tests_passed / tests_total) # correctness | |
| + 0.2 Γ (1 if valid syntax else 0) # format | |
| + 0.2 Γ (1 if <think> provided else 0) # chain-of-thought bonus | |
| + 0.1 Γ (steps_remaining / max_steps) # efficiency | |
| β 0.3 Γ (1 if timeout/crash else 0) # penalty | |
| ``` | |
| ## Tasks | |
| | ID | Difficulty | Description | Variants | | |
| |---|---|---|---| | |
| | `task_easy` | Easy | Single off-by-one error | 6+ | | |
| | `task_medium` | Medium | Two independent bugs | 6+ | | |
| | `task_hard` | Hard | 3+ subtle bugs in recursive function | 7+ | | |
| *Total: 19 procedurally generated tasks via `task_generator.py`.* | |
| ## Setup | |
| ```bash | |
| pip install openenv-core | |
| pip install git+https://huggingface.co/spaces/luciferai-devil/code-debug-env | |
| ``` | |
| ## Docker | |
| ```bash | |
| docker pull luciferai-devil/code-debug-env:latest | |
| docker run -p 8000:8000 luciferai-devil/code-debug-env | |
| ``` | |
| ## Baseline Results (via OpenAI API) | |
| Evaluated using `gpt-4o-mini` / `gpt-oss-120b` reasoning models. | |
| | Task | Agent | Score | Notes | | |
| |---|---|---|---| | |
| | task_easy | LLM | 0.99 | One-shot fix with CoT | | |
| | task_medium | LLM | 0.74 | Iterative refinement | | |
| | task_hard | LLM | 0.59 | Struggles with complex recursion depth | | |
| *Average Score: 0.77* | |
| ## Training with GRPO | |
| See `baseline/run_baseline.py` for the inference client. | |
| Compatible with TRL's `GRPOTrainer` β pass `reward_fn` that calls `/grader`. | |
| ## API Endpoints | |
| | Endpoint | Method | Description | | |
| |---|---|---| | |
| | `/health` | GET | Health check | | |
| | `/reset` | POST | Start a new episode | | |
| | `/step` | POST | Submit action, get observation | | |
| | `/state` | GET | Get current episode state | | |
| | `/tasks` | GET | List all available tasks | | |
| | `/grader` | GET | Grade a submission directly | | |
| | `/baseline` | GET | Run baseline agent on all tasks | | |
| ## Local Development | |
| ```bash | |
| # Run server locally | |
| uvicorn code_debug_env.server.app:app --reload --port 8000 | |
| # Build Docker | |
| docker build -t code-debug-env -f server/Dockerfile . | |
| # Run Docker | |
| docker run -p 8000:8000 code-debug-env | |
| # Smoke test | |
| curl http://localhost:8000/health | |
| curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}' | |
| curl http://localhost:8000/tasks | |
| ``` | |