Souravdanyal committed on Apr 5

Commit

48c116c

1 Parent(s): cb43e45

readme file fixed

Files changed (1)

README.md +78 -117

@@ -1,5 +1,5 @@
-# Code Debug Environment
 ---
 title: Code Debug Environment
 emoji: 🐍
 colorFrom: blue
@@ -8,22 +8,24 @@ sdk: docker
 sdk_version: "1.0"
 app_file: server/app.py
 pinned: false
----
-An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.
 ---
 ## Overview
-| Property | Value |
-|---|---|
-| Domain | Real-world Python code debugging |
-| Tasks | 45 total (15 easy + 15 medium + 15 hard) |
-| Difficulties | easy → medium → hard |
-| Reward Range | 0.0 – 1.0 (partial, proportional) |
-| Max Steps/Episode | 3 |
-| API | OpenEnv standard: `/reset`, `/step`, `/state` |
 ---
@@ -31,196 +33,156 @@ An [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compatible RL environment
 The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.
-- **Easy**: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
-- **Medium**: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
-- **Hard**: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.
 ---
 ## Action Space
-```json
 {
-  "fixed_code": "string — the corrected Python function (required)",
-  "explanation": "string — explanation of what was wrong (required for hard tasks)"
 }
-```
-| Field | Type | Required | Description |
-|---|---|---|---|
-| `fixed_code` | `str` | Always | Complete corrected Python function as a string |
-| `explanation` | `str` | Hard tasks | Describe the bug and why your fix is correct |
 ---
 ## Observation Space
-Returned by `/reset` and `/step`:
-```json
 {
-  "task_id": "easy_003",
-  "difficulty": "easy",
-  "buggy_code": "def find_max(nums):\n    ...",
-  "instructions": "The function has exactly one bug. Fix it.",
-  "test_cases_description": "Finds max value in a list without IndexError",
-  "reward": 0.67,
-  "passed_tests": 2,
-  "total_tests": 3,
-  "feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
-  "done": false
 }
-```
-| Field | Type | Description |
-|---|---|---|
-| `task_id` | `str` | Unique task identifier |
-| `difficulty` | `str` | `easy` / `medium` / `hard` |
-| `buggy_code` | `str` | Buggy Python function to fix |
-| `instructions` | `str` | Task instructions |
-| `test_cases_description` | `str` | What the test cases check |
-| `reward` | `float\|null` | Score from last step (null on reset) |
-| `passed_tests` | `int\|null` | Tests passed (null on reset) |
-| `total_tests` | `int` | Total number of test cases |
-| `feedback` | `str\|null` | Detailed per-test feedback |
-| `done` | `bool` | True when episode is complete |
 ---
 ## Reward Function
-### Easy & Medium
-```
 reward = passed_tests / total_tests
-```
-- 3/3 tests → 1.0
-- 2/3 tests → 0.67
-- 1/3 tests → 0.33
-- 0/3 tests → 0.0
-### Hard
-```
 reward = 0.7 × test_score + 0.3 × explanation_score
-```
-Explanation is scored by matching key algorithmic concepts. Partial credit is given.
 ---
 ## Setup & Local Run
-### Prerequisites
-- Python 3.10+
-- Docker
-- Hugging Face CLI
-### Install
-```bash
 git clone https://github.com/YOUR_USERNAME/code-debug-env
 cd code-debug-env
 pip install -e .
-# Also clone OpenEnv for PYTHONPATH
 git clone https://github.com/meta-pytorch/OpenEnv.git
 export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
-```
-### Run locally
-```bash
 uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
-```
-### Run with Docker
-```bash
 docker build -f server/Dockerfile -t code-debug-env .
 docker run -p 7860:7860 code-debug-env
-```
-### Test the API
-```bash
-# Health check
 curl http://localhost:7860/health
-# Reset (easy task)
-curl -X POST http://localhost:7860/reset \
-  -H "Content-Type: application/json" \
-  -d '{"difficulty": "easy"}'
-# Submit a fix
-curl -X POST http://localhost:7860/step \
-  -H "Content-Type: application/json" \
-  -d '{"fixed_code": "def find_max(nums):\n    return max(nums)"}'
-# Check state
 curl http://localhost:7860/state
-```
 ---
 ## Run Baseline Inference
-```bash
 export API_BASE_URL="https://api.openai.com/v1"
 export MODEL_NAME="gpt-4o-mini"
 export HF_TOKEN="your-api-key"
-# Run all 3 difficulties
 python inference.py --url http://localhost:7860
-# Run specific difficulty
 python inference.py --url http://localhost:7860 --difficulty hard
-```
 ---
 ## Pre-Submission Validation
-Run before submitting to catch any disqualifying issues:
-```bash
-# Start the environment first, then:
 python validator/pre_submit_check.py --url http://localhost:7860
-# Or against your HF Space:
 python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
-```
 ---
 ## Deploy to Hugging Face Spaces
-```bash
-# Login
 huggingface-cli login
-# Create space and push
 huggingface-cli repo create code-debug-env --type space --space_sdk docker
 cd code-debug-env
 git init
 git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
 git add .
 git commit -m "Initial commit"
 git push origin main
-```
 ---
 ## Project Structure
-```
 code-debug-env/
-├── openenv.yaml          ← OpenEnv manifest
-├── inference.py          ← Baseline agent (root, required)
-├── pyproject.toml        ← Dependencies
 ├── README.md
-├── models.py             ← Pydantic Action/Observation/State
-├── client.py             ← EnvClient for training loops
-├── __init__.py
 ├── server/
-│   ├── app.py            ← FastAPI: /reset /step /state /health
-│   ├── environment.py    ← Core episode logic
 │   ├── tasks/
-│   │   ├── task_easy.py  ← 15 single-bug tasks
-│   │   ├── task_medium.py← 15 two-bug tasks
-│   │   └── task_hard.py  ← 15 algorithmic tasks
 │   ├── graders/
 │   │   ├── grader_easy.py
 │   │   ├── grader_medium.py
@@ -228,5 +190,4 @@ code-debug-env/
 │   ├── requirements.txt
 │   └── Dockerfile
 └── validator/
-    └── pre_submit_check.py
-```

 ---
 title: Code Debug Environment
 emoji: 🐍
 colorFrom: blue
 sdk_version: "1.0"
 app_file: server/app.py
 pinned: false
+-------------
+# Code Debug Environment
+An OpenEnv-compatible RL environment where an LLM agent diagnoses and fixes buggy Python code across three difficulty levels.
 ---
 ## Overview
+| Property          | Value                                         |
+| ----------------- | --------------------------------------------- |
+| Domain            | Real-world Python code debugging              |
+| Tasks             | 45 total (15 easy + 15 medium + 15 hard)      |
+| Difficulties      | easy → medium → hard                          |
+| Reward Range      | 0.0 – 1.0 (partial, proportional)             |
+| Max Steps/Episode | 3                                             |
+| API               | OpenEnv standard: `/reset`, `/step`, `/state` |
 ---
 The agent receives a buggy Python function and must fix it. Tasks come from real-world domains: data processing, string algorithms, API validation, sorting, dynamic programming, and graph algorithms.
+* Easy: One bug (wrong operator, off-by-one, incorrect return). Reward proportional to test pass rate.
+* Medium: Two bugs (logic bug + edge case). Reward proportional to test pass rate.
+* Hard: One algorithmic bug + agent must explain what was wrong. Reward = 0.7 × test score + 0.3 × explanation quality.
 ---
 ## Action Space
 {
+"fixed_code": "string — the corrected Python function (required)",
+"explanation": "string — explanation of what was wrong (required for hard tasks)"
 }
+| Field       | Type | Required   | Description                                    |
+| ----------- | ---- | ---------- | ---------------------------------------------- |
+| fixed_code  | str  | Always     | Complete corrected Python function as a string |
+| explanation | str  | Hard tasks | Describe the bug and why your fix is correct   |
 ---
 ## Observation Space
+Returned by /reset and /step:
 {
+"task_id": "easy_003",
+"difficulty": "easy",
+"buggy_code": "def find_max(nums):\n    ...",
+"instructions": "The function has exactly one bug. Fix it.",
+"test_cases_description": "Finds max value in a list without IndexError",
+"reward": 0.67,
+"passed_tests": 2,
+"total_tests": 3,
+"feedback": "Test 1: ✅ ...\nTest 2: ✅ ...\nTest 3: ❌ ...",
+"done": false
 }
+| Field                  | Type       | Description                          |
+| ---------------------- | ---------- | ------------------------------------ |
+| task_id                | str        | Unique task identifier               |
+| difficulty             | str        | easy / medium / hard                 |
+| buggy_code             | str        | Buggy Python function to fix         |
+| instructions           | str        | Task instructions                    |
+| test_cases_description | str        | What the test cases check            |
+| reward                 | float/null | Score from last step (null on reset) |
+| passed_tests           | int/null   | Tests passed                         |
+| total_tests            | int        | Total test cases                     |
+| feedback               | str/null   | Detailed feedback                    |
+| done                   | bool       | Episode complete                     |
 ---
 ## Reward Function
+Easy & Medium
 reward = passed_tests / total_tests
+Hard
 reward = 0.7 × test_score + 0.3 × explanation_score
 ---
 ## Setup & Local Run
+Prerequisites
+* Python 3.10+
+* Docker
+* Hugging Face CLI
+Install
 git clone https://github.com/YOUR_USERNAME/code-debug-env
 cd code-debug-env
 pip install -e .
 git clone https://github.com/meta-pytorch/OpenEnv.git
 export PYTHONPATH=$PYTHONPATH:OpenEnv:OpenEnv/src:.
+Run locally
 uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
+Run with Docker
 docker build -f server/Dockerfile -t code-debug-env .
 docker run -p 7860:7860 code-debug-env
+---
+## Test the API
 curl http://localhost:7860/health
+curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{"difficulty": "easy"}'
+curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d '{"fixed_code": "def find_max(nums): return max(nums)"}'
 curl http://localhost:7860/state
 ---
 ## Run Baseline Inference
 export API_BASE_URL="https://api.openai.com/v1"
 export MODEL_NAME="gpt-4o-mini"
 export HF_TOKEN="your-api-key"
 python inference.py --url http://localhost:7860
 python inference.py --url http://localhost:7860 --difficulty hard
 ---
 ## Pre-Submission Validation
 python validator/pre_submit_check.py --url http://localhost:7860
 python validator/pre_submit_check.py --url https://YOUR_SPACE.hf.space
 ---
 ## Deploy to Hugging Face Spaces
 huggingface-cli login
 huggingface-cli repo create code-debug-env --type space --space_sdk docker
 cd code-debug-env
 git init
 git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/code-debug-env
 git add .
 git commit -m "Initial commit"
 git push origin main
 ---
 ## Project Structure
 code-debug-env/
+├── openenv.yaml
+├── inference.py
+├── pyproject.toml
 ├── README.md
+├── models.py
+├── client.py
+├── **init**.py
 ├── server/
+│   ├── app.py
+│   ├── environment.py
 │   ├── tasks/
+│   │   ├── task_easy.py
+│   │   ├── task_medium.py
+│   │   └── task_hard.py
 │   ├── graders/
 │   │   ├── grader_easy.py
 │   │   ├── grader_medium.py
 │   ├── requirements.txt
 │   └── Dockerfile
 └── validator/
+└── pre_submit_check.py
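
Review note: the `/step` example in this commit collapses the submitted function onto one line, presumably to avoid hand-escaping newlines inside the `curl` body. Building the request body with a JSON serializer sidesteps that problem. The sketch below assumes the flat body shape shown in the README's `curl` example (`fixed_code`, plus `explanation` for hard tasks); the helper name `build_step_payload` is hypothetical, and no request is actually sent.

```python
import json
from typing import Optional


def build_step_payload(fixed_code: str, explanation: Optional[str] = None) -> str:
    """Serialize an action for POST /step, following the README's Action Space.

    fixed_code is always required; explanation is only needed for hard tasks.
    """
    if not fixed_code.strip():
        raise ValueError("fixed_code is required")
    action = {"fixed_code": fixed_code}
    if explanation is not None:
        action["explanation"] = explanation
    # json.dumps escapes the newline and quote characters in the source string.
    return json.dumps(action)


payload = build_step_payload("def find_max(nums):\n    return max(nums)")
print(payload)
```

The resulting string can be passed as the request body with a `Content-Type: application/json` header, as in the README's `curl` commands.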