Souravdanyal committed on
Commit 48c116c · 1 Parent(s): cb43e45

readme file fixed

Files changed (1)
  1. README.md +78 -117
README.md CHANGED
@@ -1,5 +1,5 @@
1
- # Code Debug Environment
2
  ---
 
3
  title: Code Debug Environment
4
  emoji: 🐍
5
  colorFrom: blue
@@ -8,22 +8,24 @@ sdk: docker
8
  sdk_version: "1.0"
9
  app_file: server/app.py
10
  pinned: false
11
- ---
 
 
12
 
13
- An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.
14
 
15
  ---
16
 
17
  ## Overview
18
 
19
- | Property | Value |
20
- |---|---|
21
- | Domain | Real-world Python code debugging |
22
- | Tasks | 45 total (15 easy + 15 medium + 15 hard) |
23
- | Difficulties | easy → medium → hard |
24
- | Reward Range | 0.0 – 1.0 (partial, proportional) |
25
- | Max Steps/Episode | 3 |
26
- | API | OpenEnv standard: `/reset`, `/step`, `/state` |
27
 
28
  ---
29
 
@@ -31,196 +33,156 @@ An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment
31
 
32
  The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.
33
 
34
- - **Easy**: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
35
- - **Medium**: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
36
- - **Hard**: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.
37
 
38
  ---
39
 
40
  ## Action Space
41
 
42
- ```json
43
  {
44
- "fixed_code": "string - the corrected Python function (required)",
45
- "explanation": "string - explanation of what was wrong (required for hard tasks)"
46
  }
47
- ```
48
 
49
- | Field | Type | Required | Description |
50
- |---|---|---|---|
51
- | `fixed_code` | `str` | Always | Complete corrected Python function as a string |
52
- | `explanation` | `str` | Hard tasks | Describe the bug and why your fix is correct |
53
 
54
  ---
55
 
56
  ## Observation Space
57
 
58
- Returned by `/reset` and `/step`:
59
 
60
- ```json
61
  {
62
- "task_id": "easy_003",
63
- "difficulty": "easy",
64
- "buggy_code": "def find_max(nums):\n ...",
65
- "instructions": "The function has exactly one bug. Fix it.",
66
- "test_cases_description": "Finds max value in a list without IndexError",
67
- "reward": 0.67,
68
- "passed_tests": 2,
69
- "total_tests": 3,
70
- "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
71
- "done": false
72
  }
73
- ```
74
-
75
- | Field | Type | Description |
76
- |---|---|---|
77
- | `task_id` | `str` | Unique task identifier |
78
- | `difficulty` | `str` | `easy` / `medium` / `hard` |
79
- | `buggy_code` | `str` | Buggy Python function to fix |
80
- | `instructions` | `str` | Task instructions |
81
- | `test_cases_description` | `str` | What the test cases check |
82
- | `reward` | `float\|null` | Score from last step (null on reset) |
83
- | `passed_tests` | `int\|null` | Tests passed (null on reset) |
84
- | `total_tests` | `int` | Total number of test cases |
85
- | `feedback` | `str\|null` | Detailed per-test feedback |
86
- | `done` | `bool` | True when episode is complete |
87
 
88
  ---
89
 
90
  ## Reward Function
91
 
92
- ### Easy & Medium
93
- ```
94
  reward = passed_tests / total_tests
95
- ```
96
- - 3/3 tests β†’ 1.0
97
- - 2/3 tests β†’ 0.67
98
- - 1/3 tests β†’ 0.33
99
- - 0/3 tests β†’ 0.0
100
-
101
- ### Hard
102
- ```
103
  reward = 0.7 × test_score + 0.3 × explanation_score
104
- ```
105
- Explanation is scored by matching key algorithmic concepts. Partial credit is given.
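
The two reward formulas above can be sketched in Python as follows. This is a minimal illustration, not the project's grader: the function names, the `key_concepts` parameter, and the substring-based concept matching are all assumptions made for the example. The README only states that the explanation is scored by matching key algorithmic concepts with partial credit.

```python
from typing import Iterable


def pass_rate(passed_tests: int, total_tests: int) -> float:
    """Fraction of test cases that passed (the easy/medium reward)."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return passed_tests / total_tests


def explanation_score(explanation: str, key_concepts: Iterable[str]) -> float:
    """Hypothetical scorer: share of key concepts mentioned, case-insensitive."""
    concepts = [concept.lower() for concept in key_concepts]
    if not concepts:
        return 0.0
    text = explanation.lower()
    hits = sum(1 for concept in concepts if concept in text)
    return hits / len(concepts)


def compute_reward(
    difficulty: str,
    passed_tests: int,
    total_tests: int,
    explanation: str = "",
    key_concepts: Iterable[str] = (),
) -> float:
    """Apply the README formulas; always returns a value in [0.0, 1.0]."""
    test_score = pass_rate(passed_tests, total_tests)
    if difficulty in ("easy", "medium"):
        return test_score
    if difficulty == "hard":
        # 70% weight on tests, 30% on the explanation (weights from the README).
        return 0.7 * test_score + 0.3 * explanation_score(explanation, key_concepts)
    raise ValueError(f"unknown difficulty: {difficulty!r}")


print(round(compute_reward("easy", 2, 3), 2))  # 0.67
```

Under this sketch, a hard task whose fix passes every test but whose explanation mentions only half of the expected concepts would score 0.7 + 0.3 × 0.5.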
106
 
107
  ---
108
 
109
  ## Setup & Local Run
110
 
111
- ### Prerequisites
112
- - Python 3.10+
113
- - Docker
114
- - Hugging Face CLI
115
 
116
- ### Install
117
- ```bash
 
 
 
118
  git clone https://github.com/YOUR_USERNAME/code-debug-env
119
  cd code-debug-env
120
  pip install -e .
121
- # Also clone OpenEnv for PYTHONPATH
122
  git clone https://github.com/meta-pytorch/OpenEnv.git
123
  export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
124
- ```
125
 
126
- ### Run locally
127
- ```bash
128
  uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
129
- ```
130
 
131
- ### Run with Docker
132
- ```bash
133
  docker build -f server/Dockerfile -t code-debug-env .
134
  docker run -p 7860:7860 code-debug-env
135
- ```
136
 
137
- ### Test the API
138
- ```bash
139
- # Health check
 
140
  curl http://localhost:7860/health
141
 
142
- # Reset (easy task)
143
- curl -X POST http://localhost:7860/reset \
144
- -H "Content-Type: application/json" \
145
- -d '{"difficulty": "easy"}'
146
 
147
- # Submit a fix
148
- curl -X POST http://localhost:7860/step \
149
- -H "Content-Type: application/json" \
150
- -d '{"fixed_code": "def find_max(nums):\n return max(nums)"}'
151
 
152
- # Check state
153
  curl http://localhost:7860/state
154
- ```
155
 
156
  ---
157
 
158
  ## Run Baseline Inference
159
 
160
- ```bash
161
  export API_BASE_URL="https://api.openai.com/v1"
162
  export MODEL_NAME="gpt-4o-mini"
163
  export HF_TOKEN="your-api-key"
164
 
165
- # Run all 3 difficulties
166
  python inference.py --url http://localhost:7860
167
-
168
- # Run specific difficulty
169
  python inference.py --url http://localhost:7860 --difficulty hard
170
- ```
171
 
172
  ---
173
 
174
  ## Pre-Submission Validation
175
 
176
- Run before submitting to catch any disqualifying issues:
177
-
178
- ```bash
179
- # Start the environment first, then:
180
  python validator/pre_submit_check.py --url http://localhost:7860
181
-
182
- # Or against your HF Space:
183
  python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
184
- ```
185
 
186
  ---
187
 
188
  ## Deploy to Hugging Face Spaces
189
 
190
- ```bash
191
- # Login
192
  huggingface-cli login
193
 
194
- # Create space and push
195
  huggingface-cli repo create code-debug-env --type space --space_sdk docker
 
196
  cd code-debug-env
197
  git init
198
  git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
 
199
  git add .
200
  git commit -m "Initial commit"
201
  git push origin main
202
- ```
203
 
204
  ---
205
 
206
  ## Project Structure
207
 
208
- ```
209
  code-debug-env/
210
- ├── openenv.yaml ← OpenEnv manifest
211
- ├── inference.py ← Baseline agent (root, required)
212
- ├── pyproject.toml ← Dependencies
213
  ├── README.md
214
- ├── models.py ← Pydantic Action/Observation/State
215
- ├── client.py ← EnvClient for training loops
216
- ├── __init__.py
217
  ├── server/
218
- │   ├── app.py ← FastAPI: /reset /step /state /health
219
- │   ├── environment.py ← Core episode logic
220
  │   ├── tasks/
221
- │   │   ├── task_easy.py ← 15 single-bug tasks
222
- │   │   ├── task_medium.py ← 15 two-bug tasks
223
- │   │   └── task_hard.py ← 15 algorithmic tasks
224
  │   ├── graders/
225
  │   │   ├── grader_easy.py
226
  │   │   ├── grader_medium.py
@@ -228,5 +190,4 @@ code-debug-env/
228
  │   ├── requirements.txt
229
  │   └── Dockerfile
230
  └── validator/
231
-     └── pre_submit_check.py
232
- ```
 
 
1
  ---
2
+
3
  title: Code Debug Environment
4
  emoji: 🐍
5
  colorFrom: blue
 
8
  sdk_version: "1.0"
9
  app_file: server/app.py
10
  pinned: false
11
+ -------------
12
+
13
+ # Code Debug Environment
14
 
15
+ An OpenEnv-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.
16
 
17
  ---
18
 
19
  ## Overview
20
 
21
+ | Property | Value |
22
+ | ----------------- | --------------------------------------------- |
23
+ | Domain | Real-world Python code debugging |
24
+ | Tasks | 45 total (15 easy + 15 medium + 15 hard) |
25
+ | Difficulties | easy → medium → hard |
26
+ | Reward Range | 0.0 – 1.0 (partial, proportional) |
27
+ | Max Steps/Episode | 3 |
28
+ | API | OpenEnv standard: `/reset`, `/step`, `/state` |
29
 
30
  ---
31
 
 
33
 
34
  The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.
35
 
36
+ * Easy: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
37
+ * Medium: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
38
+ * Hard: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.
39
 
40
  ---
41
 
42
  ## Action Space
43
 
 
44
  {
45
+ "fixed_code": "string - the corrected Python function (required)",
46
+ "explanation": "string - explanation of what was wrong (required for hard tasks)"
47
  }
 
48
 
49
+ | Field | Type | Required | Description |
50
+ | ----------- | ---- | ---------- | ---------------------------------------------- |
51
+ | fixed_code | str | Always | Complete corrected Python function as a string |
52
+ | explanation | str | Hard tasks | Describe the bug and why your fix is correct |
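
As an illustration of the field rules in the table above, an agent could assemble its action with a small helper like this one. It is a sketch only: `build_action` is a hypothetical name, and whether the server itself rejects a hard-task action without an explanation is not stated in this README.

```python
from typing import Dict, Optional


def build_action(
    difficulty: str,
    fixed_code: str,
    explanation: Optional[str] = None,
) -> Dict[str, str]:
    """Build the JSON-serializable action dict described in the table above."""
    if not fixed_code.strip():
        raise ValueError("fixed_code is always required")
    if difficulty == "hard" and not (explanation and explanation.strip()):
        raise ValueError("explanation is required for hard tasks")

    action = {"fixed_code": fixed_code}
    if explanation:
        # Optional for easy/medium tasks; included only when provided.
        action["explanation"] = explanation
    return action


action = build_action("easy", "def find_max(nums):\n    return max(nums)")
print(sorted(action))  # ['fixed_code']
```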
53
 
54
  ---
55
 
56
  ## Observation Space
57
 
58
+ Returned by /reset and /step:
59
 
 
60
  {
61
+ "task_id": "easy_003",
62
+ "difficulty": "easy",
63
+ "buggy_code": "def find_max(nums):\n ...",
64
+ "instructions": "The function has exactly one bug. Fix it.",
65
+ "test_cases_description": "Finds max value in a list without IndexError",
66
+ "reward": 0.67,
67
+ "passed_tests": 2,
68
+ "total_tests": 3,
69
+ "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
70
+ "done": false
71
  }
72
+
73
+ | Field | Type | Description |
74
+ | ---------------------- | ---------- | ------------------------------------ |
75
+ | task_id | str | Unique task identifier |
76
+ | difficulty | str | easy / medium / hard |
77
+ | buggy_code | str | Buggy Python function to fix |
78
+ | instructions | str | Task instructions |
79
+ | test_cases_description | str | What the test cases check |
80
+ | reward | float/null | Score from last step (null on reset) |
81
+ | passed_tests | int/null | Tests passed |
82
+ | total_tests | int | Total test cases |
83
+ | feedback | str/null | Detailed feedback |
84
+ | done | bool | Episode complete |
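
For reference, the observation fields above could be parsed on the client side roughly as shown here. This sketch uses a standard-library dataclass rather than the project's Pydantic models in `models.py`, and it assumes the response body is the flat JSON object shown in the example, without any wrapping envelope.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Observation:
    task_id: str
    difficulty: str
    buggy_code: str
    instructions: str
    test_cases_description: str
    total_tests: int
    done: bool
    # These three are null right after /reset, before any step is scored.
    reward: Optional[float] = None
    passed_tests: Optional[int] = None
    feedback: Optional[str] = None


def parse_observation(raw: str) -> Observation:
    """Parse a JSON response body, ignoring any keys not listed in the table."""
    data = json.loads(raw)
    known = {f.name for f in fields(Observation)}
    return Observation(**{key: value for key, value in data.items() if key in known})


reset_body = json.dumps({
    "task_id": "easy_003",
    "difficulty": "easy",
    "buggy_code": "def find_max(nums):\n    ...",
    "instructions": "The function has exactly one bug. Fix it.",
    "test_cases_description": "Finds max value in a list without IndexError",
    "reward": None,
    "passed_tests": None,
    "total_tests": 3,
    "feedback": None,
    "done": False,
})
obs = parse_observation(reset_body)
print(obs.task_id, obs.reward)  # easy_003 None
```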
 
85
 
86
  ---
87
 
88
  ## Reward Function
89
 
90
+ Easy & Medium
 
91
  reward = passed_tests / total_tests
92
+
93
+ Hard
 
 
 
 
 
 
94
  reward = 0.7 × test_score + 0.3 × explanation_score
 
 
95
 
96
  ---
97
 
98
  ## Setup & Local Run
99
 
100
+ Prerequisites
 
 
 
101
 
102
+ * Python 3.10+
103
+ * Docker
104
+ * Hugging Face CLI
105
+
106
+ Install
107
  git clone https://github.com/YOUR_USERNAME/code-debug-env
108
  cd code-debug-env
109
  pip install -e .
110
+
111
  git clone https://github.com/meta-pytorch/OpenEnv.git
112
  export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
 
113
 
114
+ Run locally
 
115
  uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
 
116
 
117
+ Run with Docker
 
118
  docker build -f server/Dockerfile -t code-debug-env .
119
  docker run -p 7860:7860 code-debug-env
 
120
 
121
+ ---
122
+
123
+ ## Test the API
124
+
125
  curl http://localhost:7860/health
126
 
127
+ curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{"difficulty": "easy"}'
 
 
 
128
 
129
+ curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d '{"fixed_code": "def find_max(nums): return max(nums)"}'
 
 
 
130
 
 
131
  curl http://localhost:7860/state
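
The same calls can be made from Python. The sketch below mirrors the `curl` commands using only the standard library; `make_post` and `send` are hypothetical helper names, and the base URL assumes the local server started earlier in this README. The requests are built at import time but only sent when `send` is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local server address


def make_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl commands above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


reset_request = make_post("/reset", {"difficulty": "easy"})
step_request = make_post(
    "/step", {"fixed_code": "def find_max(nums):\n    return max(nums)"}
)
print(reset_request.get_method(), reset_request.full_url)
```

With the server running, `send(reset_request)` followed by `send(step_request)` would correspond to the reset and step calls shown above.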
 
132
 
133
  ---
134
 
135
  ## Run Baseline Inference
136
 
 
137
  export API_BASE_URL="https://api.openai.com/v1"
138
  export MODEL_NAME="gpt-4o-mini"
139
  export HF_TOKEN="your-api-key"
140
 
 
141
  python inference.py --url http://localhost:7860
 
 
142
  python inference.py --url http://localhost:7860 --difficulty hard
 
143
 
144
  ---
145
 
146
  ## Pre-Submission Validation
147
 
 
 
 
 
148
  python validator/pre_submit_check.py --url http://localhost:7860
 
 
149
  python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
 
150
 
151
  ---
152
 
153
  ## Deploy to Hugging Face Spaces
154
 
 
 
155
  huggingface-cli login
156
 
 
157
  huggingface-cli repo create code-debug-env --type space --space_sdk docker
158
+
159
  cd code-debug-env
160
  git init
161
  git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
162
+
163
  git add .
164
  git commit -m "Initial commit"
165
  git push origin main
 
166
 
167
  ---
168
 
169
  ## Project Structure
170
 
 
171
  code-debug-env/
172
+ ├── openenv.yaml
173
+ ├── inference.py
174
+ ├── pyproject.toml
175
  ├── README.md
176
+ ├── models.py
177
+ ├── client.py
178
+ ├── **init**.py
179
  ├── server/
180
+ │   ├── app.py
181
+ │   ├── environment.py
182
  │   ├── tasks/
183
+ │   │   ├── task_easy.py
184
+ │   │   ├── task_medium.py
185
+ │   │   └── task_hard.py
186
  │   ├── graders/
187
  │   │   ├── grader_easy.py
188
  │   │   ├── grader_medium.py

190
  │   ├── requirements.txt
191
  │   └── Dockerfile
192
  └── validator/
193
+     └── pre_submit_check.py