Spaces:
Running
Running
File size: 2,033 Bytes
8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 c01667e 8485798 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | # Code Debug Environment — API Specification
## Overview
OpenEnv-compatible RL environment where LLM agents fix buggy Python code.
- 45 tasks: 15 easy + 15 medium + 15 hard
- Partial rewards: 0.33, 0.67, 1.0 based on test cases passed
- Hard tasks: reward = 0.7×code + 0.3×explanation
## Endpoints
### GET /health
```json
{"status": "ok", "environment": "code-debug-env", "version": "1.0.0"}
```
### POST /reset
```json
// Request
{"difficulty": "easy"} // or "medium", "hard", omit for random
// Response
{
"observation": {
"task_id": "easy_003",
"difficulty": "easy",
"buggy_code": "def find_max(nums):\n return min(nums)",
"instructions": "The function has exactly one bug. Fix it.",
"test_cases_description": "Finds max value in a list",
"reward": null, "passed_tests": null, "total_tests": 3,
"feedback": null, "done": false
},
"reward": 0.0, "done": false
}
```
### POST /step
```json
// Request
{"fixed_code": "def find_max(nums):\n return max(nums)", "explanation": "optional for hard"}
// Response
{
"observation": {
"task_id": "easy_003", "reward": 1.0,
"passed_tests": 3, "total_tests": 3,
"feedback": "Test 1: ✅ Passed\n Input: [1,2,3]\n Expected: 3\n Got: 3",
"done": true
},
"reward": 1.0, "done": true
}
```
### GET /state
```json
{"episode_id": "uuid", "task_id": "easy_003", "difficulty": "easy",
"step_count": 1, "max_steps": 5, "current_reward": 1.0, "best_reward": 1.0, "done": true}
```
### GET /tasks
```json
{"easy": ["easy_001",...], "medium": ["medium_001",...], "hard": ["hard_001",...], "total": 45}
```
## Reward Design
| Task | Formula |
|------|---------|
| Easy | passed/3 |
| Medium | passed/3 |
| Hard | 0.7×code_score + 0.3×explanation_score |
## Invalid Actions
- Empty code → reward=0.0 + penalty feedback
- Infinite loop → TimeoutError → reward=0.0 + hint to add visited set
## Episode Rules
- Max 5 steps per episode
- Ends when reward=1.0 OR max steps reached
- 3 deterministic test cases per task
|