
Souravdanyal

fixed readme file

3985d80 about 2 months ago


metadata

title: code-debug-env
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false

Code Debug Environment

An OpenEnv-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.

Overview

| Property | Value |
| --- | --- |
| Domain | Real-world Python code debugging |
| Tasks | 45 total (15 easy + 15 medium + 15 hard) |
| Difficulties | easy → medium → hard |
| Reward Range | 0.0 – 1.0 (partial, proportional) |
| Max Steps/Episode | 3 |
| API | OpenEnv standard: `/reset`, `/step`, `/state` |

Environment Description

The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.

Easy: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
Medium: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
Hard: One algorithmic bug, and the agent must also explain what was wrong. Reward = 0.7 × test pass rate + 0.3 × explanation quality.

Action Space

{
  "fixed_code": "string — the corrected Python function (required)",
  "explanation": "string — explanation of what was wrong (required for hard tasks)"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `fixed_code` | `str` | Always | Complete corrected Python function as a string |
| `explanation` | `str` | Hard tasks | Describe the bug and why your fix is correct |
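As an illustration of the action schema above, the sketch below builds the JSON payload for an easy task and for a hard task. `DebugAction` and `to_payload` are hypothetical names used here as a stand-in; the project's real model lives in `models.py` and may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DebugAction:
    """Hypothetical stand-in for the Action model defined in models.py."""

    fixed_code: str
    explanation: Optional[str] = None  # only needed for hard tasks

    def to_payload(self) -> dict:
        # Drop unset optional fields so easy/medium payloads carry only fixed_code.
        return {k: v for k, v in asdict(self).items() if v is not None}


easy_action = DebugAction(fixed_code="def find_max(nums):\n    return max(nums)")
hard_action = DebugAction(
    fixed_code="def find_max(nums):\n    return max(nums)",
    explanation="The loop started at the wrong index, so the first element was skipped.",
)

# json.dumps escapes the newline inside fixed_code, giving a valid request body.
print(json.dumps(easy_action.to_payload()))
print(json.dumps(hard_action.to_payload()))
```

Omitting `explanation` when it is unset is a choice made for this sketch; the server may equally accept an explicit `null`.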

Observation Space

Both `/reset` and `/step` return an observation of this form:

{
  "task_id": "easy_003",
  "difficulty": "easy",
  "buggy_code": "def find_max(nums):\n    ...",
  "instructions": "The function has exactly one bug. Fix it.",
  "test_cases_description": "Finds max value in a list without IndexError",
  "reward": 0.67,
  "passed_tests": 2,
  "total_tests": 3,
  "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
  "done": false
}

| Field | Type | Description |
| --- | --- | --- |
| `task_id` | `str` | Unique task identifier |
| `difficulty` | `str` | `easy` / `medium` / `hard` |
| `buggy_code` | `str` | Buggy Python function to fix |
| `instructions` | `str` | Task instructions |
| `test_cases_description` | `str` | What the test cases check |
| `reward` | `float\|null` | Score from last step (null on reset) |
| `passed_tests` | `int\|null` | Tests passed (null on reset) |
| `total_tests` | `int` | Total number of test cases |
| `feedback` | `str\|null` | Detailed per-test feedback |
| `done` | `bool` | True when episode is complete |
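One way a client might consume this observation is sketched below. The `Observation` dataclass, `parse_observation`, and `failing_tests` are hypothetical helpers, not part of the project. The ✅/❌ markers are assumed from the sample feedback string above, and the real feedback format may differ.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical client-side mirror of the observation fields listed above."""

    task_id: str
    difficulty: str
    buggy_code: str
    instructions: str
    test_cases_description: str
    total_tests: int
    done: bool
    reward: Optional[float] = None       # null on reset
    passed_tests: Optional[int] = None   # null on reset
    feedback: Optional[str] = None       # null on reset


def parse_observation(raw: dict) -> Observation:
    # Ignore keys this sketch does not know about, so extra server fields are harmless.
    known = {f.name for f in fields(Observation)}
    return Observation(**{k: v for k, v in raw.items() if k in known})


def failing_tests(obs: Observation) -> List[str]:
    # Assumes one feedback line per test, with a cross mark on failures.
    if obs.feedback is None:
        return []
    return [line for line in obs.feedback.splitlines() if "❌" in line]


obs = parse_observation({
    "task_id": "easy_003",
    "difficulty": "easy",
    "buggy_code": "def find_max(nums):\n    ...",
    "instructions": "The function has exactly one bug. Fix it.",
    "test_cases_description": "Finds max value in a list without IndexError",
    "reward": 0.67,
    "passed_tests": 2,
    "total_tests": 3,
    "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
    "done": False,
})
print(failing_tests(obs))
```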

Reward Function

Easy & Medium

reward = passed_tests / total_tests

3/3 tests → 1.0
2/3 tests → 0.67
1/3 tests → 0.33
0/3 tests → 0.0

Hard

reward = 0.7 × test_score + 0.3 × explanation_score

Explanation is scored by matching key algorithmic concepts. Partial credit is given.

Setup & Local Run

Prerequisites

Python 3.10+
Docker
Hugging Face CLI

Install

git clone https://github.com/YOUR_USERNAME/code-debug-env
cd code-debug-env
pip install -e .
# Also clone OpenEnv for PYTHONPATH
git clone https://github.com/meta-pytorch/OpenEnv.git
export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.

Run locally

uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload

Run with Docker

docker build -f server/Dockerfile -t code-debug-env .
docker run -p 7860:7860 code-debug-env

Test the API

# Health check
curl http://localhost:7860/health

# Reset (easy task)
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"difficulty": "easy"}'

# Submit a fix
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"fixed_code": "def find_max(nums):\n    return max(nums)"}'

# Check state
curl http://localhost:7860/state
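The same reset/step sequence can be driven from Python. The sketch below uses only the standard library and assumes that `/reset` and `/step` return the observation JSON at the top level, as in the Observation Space example. `post_json`, `run_episode`, and the `propose_fix` callback are hypothetical names, not part of `client.py`.

```python
import json
import urllib.request
from typing import Callable, Optional

MAX_STEPS = 3  # matches "Max Steps/Episode" in the overview


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(
    base_url: str,
    difficulty: str,
    propose_fix: Callable[[dict], dict],
    post: Callable[[str, dict], dict] = post_json,
) -> Optional[float]:
    """Reset the environment, then submit fixes until done or the step limit is hit."""
    obs = post(f"{base_url}/reset", {"difficulty": difficulty})
    reward = obs.get("reward")  # null right after reset
    for _ in range(MAX_STEPS):
        if obs.get("done"):
            break
        # propose_fix maps the latest observation to an action dict.
        obs = post(f"{base_url}/step", propose_fix(obs))
        reward = obs.get("reward")
    return reward
```

With the server running, a call such as `run_episode("http://localhost:7860", "easy", lambda obs: {"fixed_code": "def find_max(nums):\n    return max(nums)"})` would return the last reward. The `post` parameter exists so the loop can be exercised without a live server.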

Run Baseline Inference

export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"

# Run all 3 difficulties
python inference.py --url http://localhost:7860

# Run specific difficulty
python inference.py --url http://localhost:7860 --difficulty hard
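For orientation, here is a sketch of how an agent script could read the variables exported above and turn an observation into a prompt. It is not the contents of `inference.py`: `build_prompt` and the prompt wording are assumptions, and only the environment variable names and observation fields come from this README.

```python
import os

# Same variable names as the export lines above; the defaults are placeholders.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
API_KEY = os.environ.get("HF_TOKEN", "")


def build_prompt(obs: dict) -> str:
    """Assemble a plain-text prompt from an observation (hypothetical format)."""
    parts = [
        obs["instructions"],
        "",
        "Buggy code:",
        obs["buggy_code"],
        "",
        f"The tests check: {obs['test_cases_description']}",
    ]
    # After a failed attempt, pass the per-test feedback back to the model.
    if obs.get("feedback"):
        parts += ["", "Feedback from the previous attempt:", obs["feedback"]]
    # Hard tasks also score the explanation, so ask for one explicitly.
    if obs["difficulty"] == "hard":
        parts += ["", "Also explain what the bug was and why the fix is correct."]
    return "\n".join(parts)
```

Including the previous feedback in the prompt lets the agent use its remaining steps to correct a partially passing fix.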

Pre-Submission Validation

Run the validator before submitting to catch disqualifying issues:

# Start the environment first, then:
python validator/pre_submit_check.py --url http://localhost:7860

# Or against your HF Space:
python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space

Deploy to Hugging Face Spaces

# Login
huggingface-cli login

# Create space and push
huggingface-cli repo create code-debug-env --type space --space_sdk docker
cd code-debug-env
git init
git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
git add .
git commit -m "Initial commit"
# Make sure the local branch is named main (git init may have created master)
git branch -M main
git push origin main

Project Structure

code-debug-env/
├── openenv.yaml          ← OpenEnv manifest
├── inference.py          ← Baseline agent (root, required)
├── pyproject.toml        ← Dependencies
├── README.md
├── models.py             ← Pydantic Action/Observation/State
├── client.py             ← EnvClient for training loops
├── __init__.py
├── server/
│   ├── app.py            ← FastAPI: /reset /step /state /health
│   ├── environment.py    ← Core episode logic
│   ├── tasks/
│   │   ├── task_easy.py  ← 15 single-bug tasks
│   │   ├── task_medium.py← 15 two-bug tasks
│   │   └── task_hard.py  ← 15 algorithmic tasks
│   ├── graders/
│   │   ├── grader_easy.py
│   │   ├── grader_medium.py
│   │   └── grader_hard.py
│   ├── requirements.txt
│   └── Dockerfile
└── validator/
    └── pre_submit_check.py