md896 committed
Commit 18e112b · 1 Parent(s): c7d8ccb

Expand README with lifecycle and validation details

  1. README.md +183 -5
README.md CHANGED
@@ -1,10 +1,188 @@
  ---
- title: Sql Debug Env
- emoji: 💻
- colorFrom: indigo
- colorTo: gray
+ title: sql-debug-env
+ emoji: "🧪"
+ colorFrom: blue
+ colorTo: green
  sdk: docker
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # SQL Debug Environment (`sql-debug-env`)
+
+ ![OpenEnv](https://img.shields.io/badge/OpenEnv-Validated-2ea44f)
+ ![Docker](https://img.shields.io/badge/Deploy-Docker-2496ED?logo=docker&logoColor=white)
+ ![Python](https://img.shields.io/badge/Python-3.11+-3776AB?logo=python&logoColor=white)
+ ![FastAPI](https://img.shields.io/badge/FastAPI-0.115-009688?logo=fastapi&logoColor=white)
+ ![Pydantic](https://img.shields.io/badge/Pydantic-v2-E92063?logo=pydantic&logoColor=white)
+ ![SQLite](https://img.shields.io/badge/SQLite-In--Memory-003B57?logo=sqlite&logoColor=white)
+ ![Uvicorn](https://img.shields.io/badge/Uvicorn-ASGI-111111)
+ ![OpenAI](https://img.shields.io/badge/OpenAI-Baseline_API-412991?logo=openai&logoColor=white)
+
+ **Deterministic OpenEnv benchmark for real SQL debugging workflows.**
+
+ **Quick links:** [Live Space](https://md896-sql-debug-env.hf.space) · [Swagger](https://md896-sql-debug-env.hf.space/docs) · [OpenAPI](https://md896-sql-debug-env.hf.space/openapi.json) · [GitHub](https://github.com/mdayan8/sql-debug-env)
+
+ An OpenEnv environment for a real engineering workflow: SQL query debugging. Agents iterate on broken SQL using schema/error/sample inspection until they produce the expected result.
+
+ ## Abstract
+ This project implements a deterministic OpenEnv benchmark for SQL debugging. It includes three graded tasks (easy -> medium -> hard), typed action/observation/reward models, dense reward shaping, reproducible behavior, Docker deployment, and a baseline inference runner with strict structured logs.
+
+ ## Why This Matters
+ - SQL debugging is a daily task in analytics and backend teams.
+ - Deterministic graders allow fair model comparison.
+ - Dense reward shaping supports step-by-step agent learning.
+ - Fast local runtime enables quick iteration and validation.
+
+ ## Core Components
+ - API layer: `server/main.py`
+ - Environment engine: `server/env.py`
+ - Episode database: `server/database.py` (in-memory SQLite)
+ - Typed models: `server/models.py`
+ - Reward logic: `server/reward.py`
+ - Tasks and graders: `server/tasks/`
+ - Baseline runner: `inference.py`
+
+ ## Architecture
+ ```mermaid
+ flowchart LR
+ agent[Agent Or Evaluator] --> api[FastAPI API Layer]
+ api --> env[SQLDebugEnv]
+ env --> db[InMemory SQLite DB]
+ env --> tasks[Task Registry easy medium hard]
+ tasks --> grader[Deterministic Grader]
+ env --> reward[Reward Engine]
+ grader --> reward
+ reward --> api
+ ```
+
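The database and grader boxes in the diagram can be illustrated with a small, self-contained sketch. It assumes a binary grader that compares result rows exactly; the function names `run_query` and `grade` are hypothetical, and the real graders in `server/tasks/` may score differently (for example, with partial credit).

```python
import sqlite3


def run_query(seed_sql: str, query: str):
    """Run `query` against a fresh in-memory database built from `seed_sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(seed_sql)  # create and populate the task tables
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def grade(seed_sql: str, candidate: str, expected_rows) -> float:
    """Hypothetical grader: 1.0 on an exact row match, 0.0 otherwise."""
    try:
        rows = run_query(seed_sql, candidate)
    except sqlite3.Error:
        return 0.0  # a query that fails to execute earns no credit here
    return 1.0 if rows == list(expected_rows) else 0.0
```

Because every call starts from a new `:memory:` database, the same candidate query always receives the same score, which is the property a deterministic grader depends on.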
+ ## API Surface
+ - `POST /reset`
+ - `POST /step`
+ - `GET /state`
+ - `GET /tasks`
+ - `GET /health`
+ - `GET /benchmark`
+
+ ## API Docs
+ - Swagger UI: `http://localhost:7860/docs`
+ - ReDoc: `http://localhost:7860/redoc`
+ - OpenAPI: `http://localhost:7860/openapi.json`
+
+ ## Action Space
+ | Action | Required fields | Purpose |
+ |---|---|---|
+ | `submit_query` | `query` | Submit a candidate SQL query for execution and grading |
+ | `inspect_schema` | none | Return schema metadata |
+ | `inspect_error` | none | Return details of the last execution error |
+ | `inspect_sample` | `table_name` | Return sample rows from the named table |
+ | `reset_query` | none | Reset the current query to the original broken query |
+
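The table above maps naturally onto small JSON payloads. The sketch below builds and checks them on the client side. The key name `action_type` and the flat payload shape are assumptions for illustration; the authoritative schema is whatever `server/models.py` and `/openapi.json` define.

```python
# Hypothetical client-side helper. The payload shape is an assumption,
# not the schema defined in server/models.py.
REQUIRED_FIELDS = {
    "submit_query": ("query",),
    "inspect_schema": (),
    "inspect_error": (),
    "inspect_sample": ("table_name",),
    "reset_query": (),
}


def build_action(action_type: str, **fields: str) -> dict:
    """Return an action payload, rejecting unknown actions and missing fields."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {action_type}")
    missing = [name for name in REQUIRED_FIELDS[action_type] if name not in fields]
    if missing:
        raise ValueError(f"{action_type} requires: {', '.join(missing)}")
    return {"action_type": action_type, **fields}
```

For example, `build_action("inspect_sample", table_name="orders")` yields a payload with both keys, while omitting `table_name` raises `ValueError` before any request is sent. The table name `orders` is only a placeholder.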
+ ## Reward Design
+ Reward is clamped to `[0.0, 1.0]` and combines:
+ - correctness (`0.0-0.6`)
+ - efficiency (`0.0-0.2`)
+ - syntax_progress (`0.0-0.1`)
+ - schema_bonus (`0.0-0.1`)
+ - penalty (a deduction of magnitude `0.0-0.2`)
+
+ The four positive components sum to at most `1.0`; the penalty is subtracted from that total.
+
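As a rough illustration of how these ranges fit together, the sketch below bounds each component, subtracts the penalty, and clamps the result. The function name `combine_reward` and the exact order of operations are assumptions; the actual logic lives in `server/reward.py`.

```python
def combine_reward(correctness, efficiency, syntax_progress, schema_bonus, penalty):
    """Hypothetical combination: bound each part, subtract the penalty, clamp."""

    def bound(value, upper):
        # Keep a component inside its documented range [0.0, upper].
        return max(0.0, min(upper, value))

    total = (
        bound(correctness, 0.6)
        + bound(efficiency, 0.2)
        + bound(syntax_progress, 0.1)
        + bound(schema_bonus, 0.1)
        - bound(penalty, 0.2)
    )
    # Final clamp to the documented reward range.
    return max(0.0, min(1.0, total))
```

Under these assumptions, a step that earns only a penalty still reports `0.0` rather than a negative reward, because the clamp is applied last.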
+ ## Episode Lifecycle
+ 1. The client calls `POST /reset` with an optional `task_id`.
+ 2. The environment creates a fresh in-memory SQLite database seeded for that task.
+ 3. The client calls `POST /step` repeatedly, sending one action per call.
+ 4. The server returns `(observation, reward, done, info)` for each step.
+ 5. The episode ends when the grade score is high enough or the maximum number of steps is reached.
+
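The lifecycle above can be sketched as a small client loop using only the standard library. Several details are assumptions: that `task_id` travels in the JSON body of `/reset`, that `/step` accepts the action object as its body, and that the step response is a JSON object with a `done` key. The helper names `post_json` and `run_episode` are hypothetical.

```python
import json
from urllib import request


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(base_url, actions, task_id=None, post=post_json):
    """Reset the environment, then send actions until `done` or the list ends."""
    reset_body = {"task_id": task_id} if task_id else {}
    first_observation = post(f"{base_url}/reset", reset_body)
    history = []
    for action in actions:
        result = post(f"{base_url}/step", action)  # one action per call
        history.append(result)
        if result.get("done"):  # assumed response key
            break
    return first_observation, history
```

Against a local server this might be called as `run_episode("http://localhost:7860", actions, task_id="easy_syntax_fix")`. The `post` parameter exists so the loop can be exercised with a fake transport in tests.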
+ ## Task Suite
+ - Easy: `easy_syntax_fix`
+ - Medium: `medium_logic_fix`
+ - Hard: `hard_multi_bug`
+
+ ## Repository Structure
+ ```text
+ sql-debug-env/
+ ├── Dockerfile
+ ├── openenv.yaml
+ ├── inference.py
+ ├── README.md
+ ├── requirements.txt
+ ├── pyproject.toml
+ ├── uv.lock
+ ├── scripts/
+ │   └── benchmark_local.py
+ ├── server/
+ │   ├── main.py
+ │   ├── env.py
+ │   ├── models.py
+ │   ├── database.py
+ │   ├── reward.py
+ │   └── tasks/
+ │       ├── base.py
+ │       ├── task_easy.py
+ │       ├── task_medium.py
+ │       └── task_hard.py
+ └── tests/
+     ├── test_env.py
+     ├── test_graders.py
+     └── test_reward.py
+ ```
+
+ ## Reliability and Benchmarking
+ - `openenv validate --verbose`: PASS
+ - `python3 -m unittest discover -s tests -p "test_*.py"`: PASS
+ - Docker smoke test: PASS (`/health`, `/tasks`, `/reset`, `/step`)
+
+ Live benchmark endpoint:
+ ```bash
+ curl "http://localhost:7860/benchmark?runs=20"
+ ```
+
+ ## Quick Start
+ ### Local
+ ```bash
+ pip install -r requirements.txt
+ uvicorn server.main:app --host 0.0.0.0 --port 7860
+ ```
+
+ ### Docker
+ ```bash
+ docker build -t sql-debug-env .
+ docker run -p 7860:7860 sql-debug-env
+ ```
+
+ ### Baseline Inference
+ ```bash
+ export API_BASE_URL="https://api.openai.com/v1"
+ export MODEL_NAME="gpt-4o-mini"
+ export OPENAI_API_KEY="your-key"
+ export HF_TOKEN="$OPENAI_API_KEY"
+ export ENV_BASE_URL="http://localhost:7860"
+ export SEED="1"
+ python inference.py
+ ```
+
+ ## Required Environment Variables
+ | Variable | Required | Purpose |
+ |---|---|---|
+ | `API_BASE_URL` | Yes (for baseline) | LLM API base endpoint |
+ | `MODEL_NAME` | Yes (for baseline) | Model ID for inference |
+ | `OPENAI_API_KEY` | Yes (for baseline) | OpenAI client authentication |
+ | `HF_TOKEN` | Recommended | Compatibility with evaluator instructions |
+ | `ENV_BASE_URL` | Yes (for baseline) | Environment server URL |
+ | `SEED` | Optional | Reproducibility control |
+
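A baseline runner could check these variables up front and fail with one clear message. The sketch below is an illustration only: the function name `load_baseline_config` is hypothetical, and the `HF_TOKEN` fallback simply mirrors the `export HF_TOKEN="$OPENAI_API_KEY"` line from the Quick Start. It is not taken from `inference.py`.

```python
import os

# Variables the table marks as required for the baseline run.
REQUIRED = ("API_BASE_URL", "MODEL_NAME", "OPENAI_API_KEY", "ENV_BASE_URL")


def load_baseline_config(environ=os.environ) -> dict:
    """Collect baseline settings, raising if a required variable is unset."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {name: environ[name] for name in REQUIRED}
    # Assumed fallback: reuse the OpenAI key when HF_TOKEN is not set.
    config["HF_TOKEN"] = environ.get("HF_TOKEN") or environ["OPENAI_API_KEY"]
    seed = environ.get("SEED")
    config["SEED"] = int(seed) if seed else None  # optional
    return config
```

Passing a plain dictionary instead of `os.environ` makes the check easy to test without touching the real environment.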
+ ## Submission Validation Checklist
+ - The Space URL is live and `/health` returns `200`.
+ - `/reset` returns a valid observation payload.
+ - `openenv validate --verbose` passes.
+ - Docker build and run succeed locally.
+ - `inference.py` is in the repository root and emits `[START]`, `[STEP]`, and `[END]`.
+ - `openenv.yaml` has the correct deployed `api.base_url`.
+ - `.env` and `.cursor/` are ignored by git.
+
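The log-marker item in the checklist can be checked mechanically on captured output. The sketch below assumes that each marker appears at the start of its own line and that a valid run opens with `[START]`, closes with `[END]`, and has at least one `[STEP]` in between. The exact log format is not specified here, so treat the helper `check_log_markers` as hypothetical.

```python
def check_log_markers(log_text: str) -> bool:
    """True if the log opens with [START], ends with [END], and has a [STEP]."""
    known = ("[START]", "[STEP]", "[END]")
    # Keep only the marker token from lines that begin with a known marker.
    markers = [
        line.split("]", 1)[0] + "]"
        for line in log_text.splitlines()
        if line.startswith(known)
    ]
    return (
        len(markers) >= 3
        and markers[0] == "[START]"
        and markers[-1] == "[END]"
        and "[STEP]" in markers[1:-1]
    )
```

One way to use it is to capture the output of `python inference.py` into a string and pass that string to the function.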
+ ## Hugging Face Spaces
+ Verify the deployment:
+ ```bash
+ curl https://md896-sql-debug-env.hf.space/health
+ curl -X POST https://md896-sql-debug-env.hf.space/reset -H "Content-Type: application/json" -d '{}'
+ curl https://md896-sql-debug-env.hf.space/docs
+ ```