--- title: code-debug-env emoji: "๐Ÿงช" colorFrom: blue colorTo: green sdk: docker app_port: 7860 pinned: false --- # Code Debug Environment An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels. --- ## Overview | Property | Value | |---|---| | Domain | Real-world Python code debugging | | Tasks | 45 total (15 easy + 15 medium + 15 hard) | | Difficulties | easy โ†’ medium โ†’ hard | | Reward Range | 0.0 โ€“ 1.0 (partial, proportional) | | Max Steps/Episode | 3 | | API | OpenEnv standard: `/reset`, `/step`, `/state` | --- ## Environment Description The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms. - **Easy**: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate. - **Medium**: Two bugs (logic bug + edge case). Reward proportional to test pass rate. - **Hard**: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 ร— test score + 0.3 ร— explanation quality. --- ## Action Space ```json { "fixed_code": "string โ€” the corrected Python function (required)", "explanation": "string โ€” explanation of what was wrong (required for hard tasks)" } ``` | Field | Type | Required | Description | |---|---|---|---| | `fixed_code` | `str` | Always | Complete corrected Python function as a string | | `explanation` | `str` | Hard tasks | Describe the bug and why your fix is correct | --- ## Observation Space Returned by `/reset` and `/step`: ```json { "task_id": "easy_003", "difficulty": "easy", "buggy_code": "def find_max(nums):\n ...", "instructions": "The function has exactly one bug. Fix it.", "test_cases_description": "Finds max value in a list without IndexError", "reward": 0.67, "passed_tests": 2, "total_tests": 3, "feedback": "Test 1: โœ… ...\nTest 2: โœ… ...\nTest 3: โŒ ...", "done": false } ``` | Field | Type | Description | |---|---|---| | `task_id` | `str` | Unique task identifier | | `difficulty` | `str` | `easy` / `medium` / `hard` | | `buggy_code` | `str` | Buggy Python function to fix | | `instructions` | `str` | Task instructions | | `test_cases_description` | `str` | What the test cases check | | `reward` | `float\|null` | Score from last step (null on reset) | | `passed_tests` | `int\|null` | Tests passed (null on reset) | | `total_tests` | `int` | Total number of test cases | | `feedback` | `str\|null` | Detailed per-test feedback | | `done` | `bool` | True when episode is complete | --- ## Reward Function ### Easy & Medium ``` reward = passed_tests / total_tests ``` - 3/3 tests โ†’ 1.0 - 2/3 tests โ†’ 0.67 - 1/3 tests โ†’ 0.33 - 0/3 tests โ†’ 0.0 ### Hard ``` reward = 0.7 ร— test_score + 0.3 ร— explanation_score ``` Explanation is scored by matching key algorithmic concepts. Partial credit is given. --- ## Setup & Local Run ### Prerequisites - Python 3.10+ - Docker - Hugging Face CLI ### Install ```bash git clone https://github.com/YOUR_USERNAME/code-debug-env cd code-debug-env pip install -e . # Also clone OpenEnv for PYTHONPATH git clone https://github.com/meta-pytorch/OpenEnv.git export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:. ``` ### Run locally ```bash uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload ``` ### Run with Docker ```bash docker build -f server/Dockerfile -t code-debug-env . docker run -p 7860:7860 code-debug-env ``` ### Test the API ```bash # Health check curl http://localhost:7860/health # Reset (easy task) curl -X POST http://localhost:7860/reset \ -H "Content-Type: application/json" \ -d '{"difficulty": "easy"}' # Submit a fix curl -X POST http://localhost:7860/step \ -H "Content-Type: application/json" \ -d '{"fixed_code": "def find_max(nums):\n return max(nums)"}' # Check state curl http://localhost:7860/state ``` --- ## Run Baseline Inference ```bash export API_BASE_URL="https://api.openai.com/v1" export MODEL_NAME="gpt-4o-mini" export HF_TOKEN="your-api-key" # Run all 3 difficulties python inference.py --url http://localhost:7860 # Run specific difficulty python inference.py --url http://localhost:7860 --difficulty hard ``` --- ## Pre-Submission Validation Run before submitting to catch any disqualifying issues: ```bash # Start the environment first, then: python validator/pre_submit_check.py --url http://localhost:7860 # Or against your HF Space: python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space ``` --- ## Deploy to Hugging Face Spaces ```bash # Login huggingface-cli login # Create space and push huggingface-cli repo create code-debug-env --type space --space_sdk docker cd code-debug-env git init git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env git add . git commit -m "Initial commit" git push origin main ``` --- ## Project Structure ``` code-debug-env/ โ”œโ”€โ”€ openenv.yaml โ† OpenEnv manifest โ”œโ”€โ”€ inference.py โ† Baseline agent (root, required) โ”œโ”€โ”€ pyproject.toml โ† Dependencies โ”œโ”€โ”€ README.md โ”œโ”€โ”€ models.py โ† Pydantic Action/Observation/State โ”œโ”€โ”€ client.py โ† EnvClient for training loops โ”œโ”€โ”€ __init__.py โ”œโ”€โ”€ server/ โ”‚ โ”œโ”€โ”€ app.py โ† FastAPI: /reset /step /state /health โ”‚ โ”œโ”€โ”€ environment.py โ† Core episode logic โ”‚ โ”œโ”€โ”€ tasks/ โ”‚ โ”‚ โ”œโ”€โ”€ task_easy.py โ† 15 single-bug tasks โ”‚ โ”‚ โ”œโ”€โ”€ task_medium.pyโ† 15 two-bug tasks โ”‚ โ”‚ โ””โ”€โ”€ task_hard.py โ† 15 algorithmic tasks โ”‚ โ”œโ”€โ”€ graders/ โ”‚ โ”‚ โ”œโ”€โ”€ grader_easy.py โ”‚ โ”‚ โ”œโ”€โ”€ grader_medium.py โ”‚ โ”‚ โ””โ”€โ”€ grader_hard.py โ”‚ โ”œโ”€โ”€ requirements.txt โ”‚ โ””โ”€โ”€ Dockerfile โ””โ”€โ”€ validator/ โ””โ”€โ”€ pre_submit_check.py ```