---
title: code-debug-env
emoji: "🧪"
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
# Code Debug Environment
An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.
---
## Overview
| Property | Value |
|---|---|
| Domain | Real-world Python code debugging |
| Tasks | 45 total (15 easy + 15 medium + 15 hard) |
| Difficulties | easy → medium → hard |
| Reward Range | 0.0 – 1.0 (partial, proportional) |
| Max Steps/Episode | 3 |
| API | OpenEnv standard: `/reset`, `/step`, `/state` |
---
## Environment Description
The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.
- **Easy**: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
- **Medium**: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
- **Hard**: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.
---
## Action Space
```json
{
  "fixed_code": "string - the corrected Python function (required)",
  "explanation": "string - explanation of what was wrong (required for hard tasks)"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `fixed_code` | `str` | Always | Complete corrected Python function as a string |
| `explanation` | `str` | Hard tasks | Describe the bug and why your fix is correct |
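
The field rules above can be sketched as a small payload builder. This is a minimal illustration, not code from this repository: `build_action` is a hypothetical helper name, and it assumes the caller already knows the task difficulty from the last observation.

```python
import json
from typing import Optional


def build_action(fixed_code: str, difficulty: str,
                 explanation: Optional[str] = None) -> dict:
    """Hypothetical helper: assemble a /step payload per the table above."""
    if not fixed_code.strip():
        raise ValueError("fixed_code is always required")
    if difficulty == "hard" and not explanation:
        raise ValueError("explanation is required for hard tasks")
    action = {"fixed_code": fixed_code}
    if explanation:
        # Optional for easy/medium tasks; included only when provided.
        action["explanation"] = explanation
    return action


payload = build_action("def find_max(nums):\n    return max(nums)", "easy")
print(json.dumps(payload))
```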
---
## Observation Space
Returned by `/reset` and `/step`:
```json
{
  "task_id": "easy_003",
  "difficulty": "easy",
  "buggy_code": "def find_max(nums):\n    ...",
  "instructions": "The function has exactly one bug. Fix it.",
  "test_cases_description": "Finds max value in a list without IndexError",
  "reward": 0.67,
  "passed_tests": 2,
  "total_tests": 3,
  "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
  "done": false
}
```
| Field | Type | Description |
|---|---|---|
| `task_id` | `str` | Unique task identifier |
| `difficulty` | `str` | `easy` / `medium` / `hard` |
| `buggy_code` | `str` | Buggy Python function to fix |
| `instructions` | `str` | Task instructions |
| `test_cases_description` | `str` | What the test cases check |
| `reward` | `float\|null` | Score from last step (null on reset) |
| `passed_tests` | `int\|null` | Tests passed (null on reset) |
| `total_tests` | `int` | Total number of test cases |
| `feedback` | `str\|null` | Detailed per-test feedback |
| `done` | `bool` | True when episode is complete |
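
As a rough illustration of consuming this schema on the client side, the sketch below parses an observation into a dataclass. It is a stand-in for illustration only: the repository's own models live in `models.py` (Pydantic), and the `Observation` class and `parse_observation` function here are hypothetical names. Unknown keys are ignored on the assumption that the server may return extra fields.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Observation:
    """Illustrative mirror of the observation table (not the repo's model)."""
    task_id: str
    difficulty: str
    buggy_code: str
    instructions: str
    test_cases_description: str
    total_tests: int
    done: bool
    reward: Optional[float] = None      # null on reset
    passed_tests: Optional[int] = None  # null on reset
    feedback: Optional[str] = None


def parse_observation(raw: str) -> Observation:
    """Decode a JSON observation, dropping any keys not in the table."""
    data = json.loads(raw)
    known = {f.name for f in fields(Observation)}
    return Observation(**{k: v for k, v in data.items() if k in known})


raw = json.dumps({
    "task_id": "easy_003", "difficulty": "easy",
    "buggy_code": "def find_max(nums):\n    ...",
    "instructions": "The function has exactly one bug. Fix it.",
    "test_cases_description": "Finds max value in a list without IndexError",
    "reward": None, "passed_tests": None, "total_tests": 3,
    "feedback": None, "done": False,
})
obs = parse_observation(raw)
print(obs.task_id, obs.total_tests, obs.reward)
```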
---
## Reward Function
### Easy & Medium
```
reward = passed_tests / total_tests
```
- 3/3 tests → 1.0
- 2/3 tests → 0.67
- 1/3 tests → 0.33
- 0/3 tests → 0.0
### Hard
```
reward = 0.7 × test_score + 0.3 × explanation_score
```
The explanation is scored by matching key algorithmic concepts, and partial matches earn partial credit.
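
The two formulas can be sketched in a few lines of Python. The weighting comes from this section; the explanation scorer is an assumption made for illustration (a case-insensitive substring match against a hypothetical list of key concepts) and may differ from what `grader_hard.py` actually does.

```python
def pass_rate(passed_tests: int, total_tests: int) -> float:
    """Easy/medium reward: fraction of test cases passed."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return passed_tests / total_tests


def explanation_score(explanation: str, key_concepts: list[str]) -> float:
    """ASSUMPTION: fraction of key concepts mentioned in the explanation."""
    if not key_concepts:
        return 0.0
    text = explanation.lower()
    hits = sum(1 for concept in key_concepts if concept.lower() in text)
    return hits / len(key_concepts)


def hard_reward(passed_tests: int, total_tests: int,
                explanation: str, key_concepts: list[str]) -> float:
    """Hard reward: 0.7 x test score + 0.3 x explanation score."""
    return (0.7 * pass_rate(passed_tests, total_tests)
            + 0.3 * explanation_score(explanation, key_concepts))


print(round(pass_rate(2, 3), 2))
print(round(hard_reward(2, 3, "The visited set was never updated",
                        ["visited", "cycle"]), 2))
```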
---
## Setup & Local Run
### Prerequisites
- Python 3.10+
- Docker
- Hugging Face CLI
### Install
```bash
git clone https://github.com/YOUR_USERNAME/code-debug-env
cd code-debug-env
pip install -e .
# Also clone OpenEnv for PYTHONPATH
git clone https://github.com/meta-pytorch/OpenEnv.git
export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
```
### Run locally
```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
```
### Run with Docker
```bash
docker build -f server/Dockerfile -t code-debug-env .
docker run -p 7860:7860 code-debug-env
```
### Test the API
```bash
# Health check
curl http://localhost:7860/health
# Reset (easy task)
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "easy"}'
# Submit a fix
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"fixed_code": "def find_max(nums):\n    return max(nums)"}'
# Check state
curl http://localhost:7860/state
```
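
The same calls can be driven from Python. The sketch below runs one episode using only the standard library, under two assumptions: the request bodies are the flat JSON objects shown in the curl commands, and each response body is the observation itself. `http_post` and `run_episode` are hypothetical names, not part of `client.py`, and the transport is injected so the loop can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable

MAX_STEPS = 3  # "Max Steps/Episode" from the Overview table


def http_post(base_url: str) -> Callable[[str, dict], dict]:
    """Return a function that POSTs JSON to base_url + path and decodes the reply."""
    def post(path: str, payload: dict) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return post


def run_episode(post: Callable[[str, dict], dict],
                propose_fix: Callable[[dict], dict],
                difficulty: str = "easy") -> dict:
    """Reset once, then submit fixes until done or MAX_STEPS is reached."""
    obs = post("/reset", {"difficulty": difficulty})
    for _ in range(MAX_STEPS):
        if obs.get("done"):
            break
        # propose_fix maps an observation to an action dict (ASSUMED shape).
        obs = post("/step", propose_fix(obs))
    return obs


# Usage against a local server (requires the environment to be running):
# final = run_episode(http_post("http://localhost:7860"),
#                     lambda obs: {"fixed_code": obs["buggy_code"]})
```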
---
## Run Baseline Inference
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"
# Run all 3 difficulties
python inference.py --url http://localhost:7860
# Run specific difficulty
python inference.py --url http://localhost:7860 --difficulty hard
```
---
## Pre-Submission Validation
Run the validator before submitting to catch disqualifying issues:
```bash
# Start the environment first, then:
python validator/pre_submit_check.py --url http://localhost:7860
# Or against your HF Space:
python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
```
---
## Deploy to Hugging Face Spaces
```bash
# Login
huggingface-cli login
# Create space and push
huggingface-cli repo create code-debug-env --type space --space_sdk docker
cd code-debug-env
git init
git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
git add .
git commit -m "Initial commit"
git push origin main
```
---
## Project Structure
```
code-debug-env/
├── openenv.yaml            ← OpenEnv manifest
├── inference.py            ← Baseline agent (root, required)
├── pyproject.toml          ← Dependencies
├── README.md
├── models.py               ← Pydantic Action/Observation/State
├── client.py               ← EnvClient for training loops
├── __init__.py
├── server/
│   ├── app.py              ← FastAPI: /reset /step /state /health
│   ├── environment.py      ← Core episode logic
│   ├── tasks/
│   │   ├── task_easy.py    ← 15 single-bug tasks
│   │   ├── task_medium.py  ← 15 two-bug tasks
│   │   └── task_hard.py    ← 15 algorithmic tasks
│   ├── graders/
│   │   ├── grader_easy.py
│   │   ├── grader_medium.py
│   │   └── grader_hard.py
│   ├── requirements.txt
│   └── Dockerfile
└── validator/
    └── pre_submit_check.py
```