---
title: Code Debug Env
emoji: 🐞
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
base_path: /web
---

# code-debug-env

An OpenEnv environment for training AI agents to repair buggy Python code. The agent receives a broken function and must iteratively submit patches until all unit tests pass.

## Quick Start

```python
from code_debug_env import CodeDebugEnv, Action

async with CodeDebugEnv(base_url="https://luciferai-devil-code-debug-env.hf.space") as env:
    obs = await env.reset(task_id="task_easy")
    print(obs.buggy_code)  # The broken function

    result = await env.step(Action(
        patch="def find_max_subarray_sum(nums):\n    ...",
        task_id="task_easy",
        think="The off-by-one error is in range(1, len(nums)-1)",
    ))
    print(result.observation.score)  # 0.0–1.0
```

## Action Space

| Field | Type | Required | Description |
|---|---|---|---|
| `patch` | str | Yes | Full Python source replacement for the function |
| `task_id` | str | Yes | Which task to target |
| `think` | str | No | Chain-of-thought reasoning (earns +0.2 reward bonus) |

## Observation Space

| Field | Type | Description |
|---|---|---|
| `buggy_code` | str | Current version of the code |
| `test_results` | list | Per-test pass/fail with error messages |
| `passed` / `total` | int | Tests passing out of total |
| `score` | float | Composite reward for this step (0.0–1.0) |
| `done` | bool | True when all tests pass or max_steps reached |

## Reward Function

```
r = 0.5 × (tests_passed / tests_total)   # correctness
  + 0.2 × (1 if valid syntax else 0)     # format
  + 0.2 × (1 if provided else 0)         # chain-of-thought bonus
  + 0.1 × (steps_remaining / max_steps)  # efficiency
  − 0.3 × (1 if timeout/crash else 0)    # penalty
```

## Tasks

| ID | Difficulty | Description | Variants |
|---|---|---|---|
| `task_easy` | Easy | Single off-by-one error | 6+ |
| `task_medium` | Medium | Two independent bugs | 6+ |
| `task_hard` | Hard | 3+ subtle bugs in a recursive function | 7+ |

*Total: 19 procedurally generated tasks via `task_generator.py`.*

## Setup

```bash
pip install openenv-core
pip install git+https://huggingface.co/spaces/luciferai-devil/code-debug-env
```

## Docker

```bash
docker pull luciferai-devil/code-debug-env:latest
docker run -p 8000:8000 luciferai-devil/code-debug-env
```

## Baseline Results (via OpenAI API)

Evaluated using `gpt-4o-mini` / `gpt-oss-120b` reasoning models.

| Task | Agent | Score | Notes |
|---|---|---|---|
| task_easy | LLM | 0.99 | One-shot fix with CoT |
| task_medium | LLM | 0.74 | Iterative refinement |
| task_hard | LLM | 0.59 | Struggles with deeply recursive cases |

*Average Score: 0.77*

## Training with GRPO

See `baseline/run_baseline.py` for the inference client. Compatible with TRL's `GRPOTrainer`: pass a `reward_fn` that calls `/grader`.

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/reset` | POST | Start a new episode |
| `/step` | POST | Submit action, get observation |
| `/state` | GET | Get current episode state |
| `/tasks` | GET | List all available tasks |
| `/grader` | GET | Grade a submission directly |
| `/baseline` | GET | Run baseline agent on all tasks |

## Local Development

```bash
# Run server locally
uvicorn code_debug_env.server.app:app --reload --port 8000

# Build Docker
docker build -t code-debug-env -f server/Dockerfile .

# Run Docker
docker run -p 8000:8000 code-debug-env

# Smoke test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'
curl http://localhost:8000/tasks
```
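
## Appendix: Reward Formula in Python

The reward formula in the Reward Function section can be expressed directly in Python. This is a sketch for reference only: the argument names are illustrative, and the server's actual implementation may differ in details such as rounding or clamping.

```python
def compute_reward(tests_passed: int, tests_total: int,
                   valid_syntax: bool, provided_cot: bool,
                   steps_remaining: int, max_steps: int,
                   crashed: bool = False) -> float:
    """Composite per-step reward mirroring the documented formula.

    With all components maxed out, the terms sum to
    0.5 + 0.2 + 0.2 + 0.1 = 1.0, matching the 0.0-1.0 score range.
    """
    r = 0.5 * (tests_passed / tests_total)          # correctness
    r += 0.2 * (1 if valid_syntax else 0)           # format
    r += 0.2 * (1 if provided_cot else 0)           # chain-of-thought bonus
    r += 0.1 * (steps_remaining / max_steps)        # efficiency
    r -= 0.3 * (1 if crashed else 0)                # timeout/crash penalty
    return r
```

Note that a crash can push the reward below zero (e.g., zero tests passing plus the −0.3 penalty), which the server may or may not clamp to 0.0.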
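
## Appendix: What Grading Looks Like Locally

The `/grader` endpoint's internals aren't documented here, but conceptually it checks the patch for valid syntax and then runs the unit tests against it. The sketch below is a rough local approximation of that idea, not the server's actual code; the `grade_patch` function, its signature, and the test-callable convention are all illustrative assumptions. (The real grader also sandboxes execution, which this sketch deliberately omits.)

```python
import ast

def grade_patch(patch: str, tests: list) -> dict:
    """Illustrative local grader: syntax check, then run each test.

    `tests` is assumed to be a list of callables that take the patched
    module namespace and raise on failure -- a convention invented for
    this sketch, not part of the environment's API.
    """
    # Format check: does the patch even parse?
    try:
        ast.parse(patch)
    except SyntaxError as e:
        return {"valid_syntax": False, "passed": 0,
                "total": len(tests), "error": str(e)}

    # WARNING: exec() of untrusted code; the real grader sandboxes this.
    namespace: dict = {}
    exec(patch, namespace)

    passed = 0
    results = []
    for test in tests:
        try:
            test(namespace)
            passed += 1
            results.append({"ok": True})
        except Exception as e:
            results.append({"ok": False, "error": str(e)})
    return {"valid_syntax": True, "passed": passed,
            "total": len(tests), "results": results}
```

The `valid_syntax`, `passed`, and `total` fields map onto the format and correctness terms of the reward function above.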