OpenEnv Compatible
Code Debug Environment
An RL environment where LLM agents diagnose and fix buggy Python code
Live and Running
Total Tasks: 45
Difficulty Levels: 3
Reward Range: 0.0 to 1.0
Live Tester
Try the environment interactively: pick a difficulty, get a task, edit the buggy code, and submit your fix.
Difficulty options:
Random difficulty
Easy: 1 bug
Medium: 2 bugs
Hard: algorithmic bug + explanation
An explanation is required for hard tasks.
API Endpoints
GET   /health   Health check
POST  /reset    Start a new episode; pass difficulty: easy | medium | hard
POST  /step     Submit fixed code; returns reward (0.0 to 1.0) and feedback
GET   /state    Current episode state
GET   /tasks    List all 45 task IDs
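The endpoints above can be exercised with a small client. The following is a minimal sketch using only the Python standard library. The base URL http://localhost:7860, the JSON field names "code" and "explanation" in the /step payload, sending "difficulty" as a JSON body, and the helper names (build_request, send, reset_request, step_request) are all assumptions for illustration; check the running service for the actual request schema.

```python
import json
import urllib.request

# Assumed base URL; replace with the address of the running environment.
BASE_URL = "http://localhost:7860"

VALID_DIFFICULTIES = ("easy", "medium", "hard")


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the endpoints."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request, timeout=10):
    """Send a request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def reset_request(difficulty="easy"):
    """POST /reset: start a new episode at the given difficulty."""
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"difficulty must be one of {VALID_DIFFICULTIES}")
    return build_request("POST", "/reset", {"difficulty": difficulty})


def step_request(code, explanation=None):
    """POST /step: submit fixed code (payload field names are assumed)."""
    payload = {"code": code}
    if explanation is not None:
        # Hard tasks also require an explanation.
        payload["explanation"] = explanation
    return build_request("POST", "/step", payload)
```

With the service running, an episode would look like send(reset_request("easy")) followed by send(step_request(fixed_code)). Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.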
Difficulty Levels
Easy: 15 tasks, 1 bug per task, reward = tests passed / 3
Medium: 15 tasks, 2 bugs per task, reward = tests passed / 3
Hard: 15 tasks, algorithmic bug, reward = 0.7 x code + 0.3 x explanation
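The reward formulas above can be written out as a short sketch. The function names are hypothetical, and the assumption that the code score and explanation score each lie between 0.0 and 1.0 is mine; how the environment derives the code score for hard tasks, or grades an explanation, is not stated here.

```python
TESTS_PER_TASK = 3  # from "reward = tests passed / 3"

CODE_WEIGHT = 0.7         # weight of the code score on hard tasks
EXPLANATION_WEIGHT = 0.3  # weight of the explanation score on hard tasks


def pass_rate_reward(tests_passed, total_tests=TESTS_PER_TASK):
    """Easy and medium tasks: fraction of tests passed."""
    if not 0 <= tests_passed <= total_tests:
        raise ValueError("tests_passed must be between 0 and total_tests")
    return tests_passed / total_tests


def hard_reward(code_score, explanation_score):
    """Hard tasks: weighted sum of code and explanation scores.

    Both scores are assumed to be in the range 0.0 to 1.0.
    """
    for score in (code_score, explanation_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0.0 and 1.0")
    return CODE_WEIGHT * code_score + EXPLANATION_WEIGHT * explanation_score
```

For example, passing 2 of 3 tests on an easy or medium task gives a reward of 2/3, while a hard task with a perfect code score and a half-credit explanation gives 0.7 x 1.0 + 0.3 x 0.5 = 0.85. Because the weights sum to 1.0, the hard-task reward stays inside the same 0.0 to 1.0 range as the other levels.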