md896 committed 87464f9 (parent: e7c61ad)

Restore full README content under HF metadata

Files changed (1): README.md (+153 -1)
# SQL Debug Environment (`sql-debug-env`)

![OpenEnv](https://img.shields.io/badge/OpenEnv-Validated-2ea44f)
![Docker](https://img.shields.io/badge/Deploy-Docker-2496ED?logo=docker&logoColor=white)
![Python](https://img.shields.io/badge/Python-3.11+-3776AB?logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-0.115-009688?logo=fastapi&logoColor=white)
![Pydantic](https://img.shields.io/badge/Pydantic-v2-E92063?logo=pydantic&logoColor=white)
![SQLite](https://img.shields.io/badge/SQLite-In--Memory-003B57?logo=sqlite&logoColor=white)
![Uvicorn](https://img.shields.io/badge/Uvicorn-ASGI-111111)
![OpenAI](https://img.shields.io/badge/OpenAI-Baseline_API-412991?logo=openai&logoColor=white)

**Deterministic OpenEnv benchmark for real SQL debugging workflows.**

**Quick links:** [Live Space](https://md896-sql-debug-env.hf.space) · [Swagger](https://md896-sql-debug-env.hf.space/docs) · [OpenAPI](https://md896-sql-debug-env.hf.space/openapi.json) · [GitHub](https://github.com/mdayan8/sql-debug-env)

An OpenEnv environment for a real engineering workflow: SQL query debugging. Agents start from a broken SQL query and iterate on it. They can inspect the schema, the last execution error and sample rows until their query produces the expected result.

## Abstract
This project implements a deterministic OpenEnv benchmark for SQL debugging. It includes:
- three graded tasks of increasing difficulty (easy, medium, hard)
- typed action, observation and reward models
- dense reward shaping
- reproducible behavior
- Docker deployment
- a baseline inference runner that emits strict structured logs

## Why this matters
- SQL debugging is a daily task in analytics and backend teams.
- Deterministic graders allow fair model comparison.
- Dense reward shaping supports step-by-step agent learning.
- Fast local runtime enables quick iteration and validation.

## Core Components
- API layer: `server/main.py`
- Environment engine: `server/env.py`
- Episode database: `server/database.py` (in-memory SQLite)
- Typed models: `server/models.py`
- Reward logic: `server/reward.py`
- Tasks and graders: `server/tasks/`
- Baseline runner: `inference.py`

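The sketch below makes the in-memory SQLite design concrete. It uses Python's built-in `sqlite3` module to create a per-episode database and to answer the three inspection actions. The `orders` table, its rows and every function name (`new_episode_db`, `inspect_schema`, `inspect_sample`, `try_query`) are hypothetical. This is not the code in `server/database.py`.

```python
import sqlite3
from typing import Optional


def new_episode_db() -> sqlite3.Connection:
    """Create a fresh in-memory database for one episode (hypothetical seed data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders (customer, amount) VALUES
            ('alice', 30.0), ('bob', 12.5), ('alice', 7.5);
        """
    )
    return conn


def inspect_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table name to its column names via sqlite_master and PRAGMA table_info."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def inspect_sample(conn: sqlite3.Connection, table_name: str, limit: int = 3) -> list[tuple]:
    """Return up to `limit` rows; the table name is checked against the schema first."""
    if table_name not in inspect_schema(conn):
        raise ValueError(f"unknown table: {table_name}")
    return conn.execute(f'SELECT * FROM "{table_name}" LIMIT ?', (limit,)).fetchall()


def try_query(
    conn: sqlite3.Connection, query: str
) -> tuple[Optional[list[tuple]], Optional[str]]:
    """Run a candidate query: (rows, None) on success, (None, error text) on failure."""
    try:
        return conn.execute(query).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


if __name__ == "__main__":
    conn = new_episode_db()
    print(inspect_schema(conn))  # {'orders': ['id', 'customer', 'amount']}
    print(try_query(conn, "SELEC customer FROM orders"))  # (None, <syntax error text>)
```

Each episode gets a new `:memory:` connection, so no state leaks between episodes. That is one straightforward route to the reproducible behavior described above.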
## Architecture
```mermaid
flowchart LR
agent[Agent or Evaluator] --> api[FastAPI API Layer]
api --> env[SQLDebugEnv]
env --> db["In-Memory SQLite DB"]
env --> tasks["Task Registry: easy, medium, hard"]
tasks --> grader[Deterministic Grader]
env --> reward[Reward Engine]
grader --> reward
reward --> api
```

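In the diagram, a deterministic grader feeds the reward engine. One simple way to keep grading deterministic is to judge only the rows a candidate query returns against the expected rows. The sketch below assumes a pass/fail rule in which row order is ignored by default. The name `grade_result` and the binary score are illustrative only. They are not the rule implemented in `server/tasks/`.

```python
from collections import Counter
from typing import Optional, Sequence


def grade_result(
    actual: Optional[Sequence[tuple]],
    expected: Sequence[tuple],
    ordered: bool = False,
) -> float:
    """Return 1.0 when the result set matches `expected`, else 0.0.

    `actual` is None when the candidate query failed to execute.
    With ordered=False the rows are compared as a multiset: row order is
    ignored, but duplicate rows still have to match in number.
    """
    if actual is None:
        return 0.0
    actual_rows = [tuple(r) for r in actual]
    expected_rows = [tuple(r) for r in expected]
    if ordered:
        return 1.0 if actual_rows == expected_rows else 0.0
    return 1.0 if Counter(actual_rows) == Counter(expected_rows) else 0.0
```

The function depends only on its inputs and uses no randomness, time or external state. Two runs over the same query results therefore always produce the same grade.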
## API Surface
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /tasks`
- `GET /health`
- `GET /benchmark`

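The endpoints above are enough to drive an episode from a script. Below is a minimal client sketch that uses only the Python standard library. The empty `{}` body for `/reset` matches the `curl` example in the Hugging Face Spaces section. The following are assumptions: the `/step` body key `action_type`, the `reward` and `done` response fields, and the function names. Check the Swagger page for the real schemas before relying on them. The transport is passed in, so the loop can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable, Optional

# (method, path, JSON body or None) -> decoded JSON response
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(base_url: str) -> Transport:
    """Build a transport that sends JSON over HTTP with urllib."""
    def send(method: str, path: str, body: Optional[dict]) -> dict:
        data = None if body is None else json.dumps(body).encode()
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())
    return send


def run_episode(send: Transport, candidates: list[str]) -> list[float]:
    """Reset, inspect the schema once, then submit candidates until the episode ends."""
    send("POST", "/reset", {})
    send("POST", "/step", {"action_type": "inspect_schema"})  # hypothetical key name
    rewards: list[float] = []
    for query in candidates:
        result = send("POST", "/step", {"action_type": "submit_query", "query": query})
        rewards.append(float(result.get("reward", 0.0)))  # hypothetical response field
        if result.get("done"):  # hypothetical response field
            break
    return rewards
```

Against a local server, this would be called as `run_episode(http_transport("http://localhost:7860"), ["SELECT ..."])`.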
## API Docs
- Swagger UI: `http://localhost:7860/docs`
- ReDoc: `http://localhost:7860/redoc`
- OpenAPI: `http://localhost:7860/openapi.json`

## Action Space
| Action | Required fields | Purpose |
|---|---|---|
| `submit_query` | `query` | Submit a candidate SQL query for execution and grading |
| `inspect_schema` | none | Return schema metadata |
| `inspect_error` | none | Return details of the last execution error |
| `inspect_sample` | `table_name` | Return sample rows from the given table |
| `reset_query` | none | Restore the current query to the original broken query |

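The table fixes which fields each action needs. That check can be run client-side before a request is sent. The sketch below assumes the action name travels under an `action_type` key; this key is hypothetical and not taken from `server/models.py`. It checks only for unknown actions and missing required fields. Extra fields pass through unchanged.

```python
from typing import Any

# Required fields per action, copied from the Action Space table.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "submit_query": ("query",),
    "inspect_schema": (),
    "inspect_error": (),
    "inspect_sample": ("table_name",),
    "reset_query": (),
}


def build_action(action_type: str, **fields: Any) -> dict[str, Any]:
    """Return a step payload, rejecting unknown actions and missing required fields."""
    if action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {action_type}")
    missing = set(REQUIRED_FIELDS[action_type]) - set(fields)
    if missing:
        raise ValueError(f"{action_type} requires {sorted(missing)}")
    return {"action_type": action_type, **fields}
```

For example, `build_action("inspect_sample", table_name="orders")` yields `{"action_type": "inspect_sample", "table_name": "orders"}`. `build_action("inspect_sample")` raises `ValueError`.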
## Reward Design
Reward is clamped to `[0.0, 1.0]` and combines four bonus components and one penalty:
- correctness (`0.0-0.6`)
- efficiency (`0.0-0.2`)
- syntax_progress (`0.0-0.1`)
- schema_bonus (`0.0-0.1`)
- penalty (`0.0-0.2`, deducted from the total)

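The list gives the component ranges but not the exact formula. Below is a minimal sketch. It assumes the four bonus terms are added, the penalty is subtracted and the result is then clamped. `combine_reward` is a hypothetical name and not the API of `server/reward.py`. Because each input is first clipped to its documented range, the bonus terms can add up to at most `0.6 + 0.2 + 0.1 + 0.1 = 1.0`.

```python
def combine_reward(
    correctness: float,
    efficiency: float,
    syntax_progress: float,
    schema_bonus: float,
    penalty: float,
) -> float:
    """Assumed rule: sum the bonuses, subtract the penalty, clamp to [0.0, 1.0]."""

    def clip(x: float, hi: float) -> float:
        # Keep each component inside its documented range [0.0, hi].
        return min(max(x, 0.0), hi)

    total = (
        clip(correctness, 0.6)
        + clip(efficiency, 0.2)
        + clip(syntax_progress, 0.1)
        + clip(schema_bonus, 0.1)
        - clip(penalty, 0.2)
    )
    return min(max(total, 0.0), 1.0)
```

Under this assumption, a fully correct but otherwise unrewarded step with the maximum penalty scores `0.6 - 0.2 = 0.4`. A step earning only the maximum penalty clamps to `0.0` instead of going negative.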
## Task Suite
- Easy: `easy_syntax_fix`
- Medium: `medium_logic_fix`
- Hard: `hard_multi_bug`

## Repository Structure
```text
sql-debug-env/
├── Dockerfile
├── openenv.yaml
├── inference.py
├── README.md
├── requirements.txt
├── pyproject.toml
├── uv.lock
├── scripts/
│   └── benchmark_local.py
├── server/
│   ├── main.py
│   ├── env.py
│   ├── models.py
│   ├── database.py
│   ├── reward.py
│   └── tasks/
│       ├── base.py
│       ├── task_easy.py
│       ├── task_medium.py
│       └── task_hard.py
└── tests/
    ├── test_env.py
    ├── test_graders.py
    └── test_reward.py
```

## Reliability and Benchmarking
- `openenv validate --verbose`: PASS
- `python3 -m unittest discover -s tests -p "test_*.py"`: PASS
- Docker smoke test: PASS (`/health`, `/tasks`, `/reset`, `/step`)

Live benchmark endpoint:
```bash
curl "http://localhost:7860/benchmark?runs=20"
```

## Quick Start
### Local
```bash
pip install -r requirements.txt
uvicorn server.main:app --host 0.0.0.0 --port 7860
```

### Docker
```bash
docker build -t sql-debug-env .
docker run -p 7860:7860 sql-debug-env
```

### Baseline Inference
```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export OPENAI_API_KEY="your-key"
export HF_TOKEN="$OPENAI_API_KEY"
export ENV_BASE_URL="http://localhost:7860"
export SEED="1"
python inference.py
```

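A runner configured through environment variables typically reads them once at startup. Below is a minimal sketch of that step for the variables exported above. The following are assumptions, not behavior confirmed for `inference.py`:
- the `BaselineConfig` and `load_config` names
- preferring `HF_TOKEN` over `OPENAI_API_KEY`
- using the example values as defaults

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BaselineConfig:
    api_base_url: str
    model_name: str
    api_key: str
    env_base_url: str
    seed: int


def load_config(environ: Optional[Mapping[str, str]] = None) -> BaselineConfig:
    """Read the exported variables; defaults and key precedence are assumptions."""
    env = os.environ if environ is None else environ
    # Assumed precedence: HF_TOKEN first, then OPENAI_API_KEY.
    api_key = env.get("HF_TOKEN") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("set HF_TOKEN or OPENAI_API_KEY")
    return BaselineConfig(
        api_base_url=env.get("API_BASE_URL", "https://api.openai.com/v1"),
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        api_key=api_key,
        env_base_url=env.get("ENV_BASE_URL", "http://localhost:7860"),
        seed=int(env.get("SEED", "1")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.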
## Hugging Face Spaces
Verify deployment:
```bash
curl https://md896-sql-debug-env.hf.space/health
curl -X POST https://md896-sql-debug-env.hf.space/reset -H "Content-Type: application/json" -d '{}'
curl https://md896-sql-debug-env.hf.space/docs
```