OpenEnv Compatible

Code Debug Environment

An RL environment where LLM agents diagnose and fix buggy Python code

Live and Running
Total Tasks: 45
Difficulty Levels: 3
Reward Range: 0.0 to 1.0
Live Tester
Try the environment interactively
API Endpoints
GET   /health   Health check
POST  /reset    Start a new episode; pass difficulty: easy | medium | hard
POST  /step     Submit fixed code; returns reward (0.0 to 1.0) and feedback
GET   /state    Current episode state
GET   /tasks    List all 45 task IDs
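
The endpoints above could be driven from a small Python client along these lines. This is only a sketch under stated assumptions: the page confirms the paths, the HTTP methods, and the difficulty parameter, but the base URL, the use of JSON request and response bodies, and the "code" field name for /step are hypothetical and should be checked against the running service.

```python
import json
import urllib.request

# Assumption: the real host is not given on the page; replace as needed.
BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the listed endpoints."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: the service accepts a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, base_url=BASE_URL):
    """Send the request and decode the body, assuming it is JSON."""
    req = build_request(method, path, payload, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(fixed_code, difficulty="easy", base_url=BASE_URL):
    """Reset to a new task, then submit one candidate fix."""
    call("POST", "/reset", {"difficulty": difficulty}, base_url)
    # Assumption: "code" is a hypothetical field name for the submitted fix.
    return call("POST", "/step", {"code": fixed_code}, base_url)
```

Keeping build_request separate from call makes it possible to inspect the request offline before pointing the client at a live server.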
Difficulty Levels
Easy: 15 tasks, 1 bug per task, reward = tests passed / 3
Medium: 15 tasks, 2 bugs per task, reward = tests passed / 3
Hard: 15 tasks, algorithmic bug, reward = 0.7 x code + 0.3 x explanation
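
The reward formulas above can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that the code and explanation components of the hard-level reward are each scored in the range 0.0 to 1.0; the page states only the weights and the overall reward range, not how either component is graded.

```python
def pass_rate_reward(passed, total=3):
    """Easy and medium levels: fraction of tests passed (3 tests per task)."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


def hard_reward(code_score, explanation_score):
    """Hard level: weighted sum of code and explanation scores.

    Assumption: both inputs are already normalized to [0.0, 1.0].
    """
    for score in (code_score, explanation_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be in [0.0, 1.0]")
    return 0.7 * code_score + 0.3 * explanation_score
```

For example, passing 2 of 3 tests on an easy task gives a reward of 2/3, and under the stated assumption a hard task with a fully correct fix but a half-credit explanation scores 0.7 x 1.0 + 0.3 x 0.5 = 0.85.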