md896 committed on
Commit
c193516
·
1 Parent(s): 00849df

initial commit

Files changed (1)
  1. README.md +10 -193
README.md CHANGED
@@ -1,193 +1,10 @@
- # SQL Debug Environment (`sql-debug-env`)
-
- ![Python](https://img.shields.io/badge/Python-3.11+-3776AB?logo=python&logoColor=white)
- ![FastAPI](https://img.shields.io/badge/FastAPI-0.115-009688?logo=fastapi&logoColor=white)
- ![Pydantic](https://img.shields.io/badge/Pydantic-v2-E92063?logo=pydantic&logoColor=white)
- ![SQLite](https://img.shields.io/badge/SQLite-In_Memory-003B57?logo=sqlite&logoColor=white)
- ![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker&logoColor=white)
- ![OpenEnv](https://img.shields.io/badge/OpenEnv-Validated-2ea44f)
-
- An OpenEnv environment for a real task people do every day: **debugging SQL**. The agent gets a broken query, a live (in-memory) SQLite database, and a description of the expected output. It can inspect schema/errors/samples and submit fixed queries until it solves the task.
-
- ## What’s in this repo
- - **FastAPI server**: `server/main.py` (endpoints: `/health`, `/tasks`, `/reset`, `/step`, `/state`)
- - **Environment logic**: `server/env.py` + `server/database.py`
- - **Tasks**: `server/tasks/` (easy → medium → hard, deterministic seed data)
- - **Baseline agent**: `inference.py` (OpenAI client + `[START]/[STEP]/[END]` logs)
-
- ## Tech Stack
- - Python 3.11+
- - FastAPI + Uvicorn
- - Pydantic v2
- - SQLite (in-memory)
- - OpenEnv Core
- - Docker
- - OpenAI Python SDK (baseline inference)
-
- ## Production Notes
- - Stateless HTTP API with per-session environment instances keyed by `X-Session-Id`
- - Deterministic task data (in-memory SQLite) for reproducible grading
- - Reward clamped to `[0.0, 1.0]` with partial-progress shaping
- - Docker-first deployment path (local and Hugging Face Spaces)
- - Local benchmark endpoint for live latency checks (`/benchmark`)
-
- ## API Docs (FastAPI Auto Docs)
- Use these for interactive testing in browser:
-
- - Swagger UI: `http://localhost:7860/docs`
- - ReDoc: `http://localhost:7860/redoc`
- - OpenAPI spec: `http://localhost:7860/openapi.json`
-
- ## Action Space
- | Action | Required fields | Cost / reward effect |
- |---|---|---|
- | `submit_query` | `query` | Main evaluation step (dense reward based on grading) |
- | `inspect_schema` | none | Free information action (small positive reward component) |
- | `inspect_error` | none | Free information action (small positive reward component) |
- | `inspect_sample` | `table_name` | Free information action (small positive reward component) |
- | `reset_query` | none | Penalty action (reduces reward for that step) |
-
- ## Observation Space
- | Field | Type |
- |---|---|
- | `task_id` | `string` |
- | `task_description` | `string` |
- | `original_query` | `string` |
- | `current_query` | `string_or_null` |
- | `expected_description` | `string` |
- | `last_action_type` | `string` |
- | `last_query_result` | `object_or_null` |
- | `steps_taken` | `integer` |
- | `steps_remaining` | `integer` |
- | `current_score` | `float` |
- | `schema_info` | `object_or_null` |
- | `error_details` | `string_or_null` |
- | `sample_rows` | `array_or_null` |
- | `hint` | `string_or_null` |
- | `is_done` | `boolean` |
- | `success` | `boolean` |
-
- ## Reward Function
- | Component | Range | Description |
- |---|---|---|
- | `correctness` | `[0.0, 0.6]` | Row-level match vs expected output |
- | `efficiency` | `[0.0, 0.2]` | Bonus for solving with fewer steps |
- | `syntax_progress` | `[0.0, 0.1]` | Small reward for producing syntactically valid SQL |
- | `schema_bonus` | `[0.0, 0.1]` | Bonus for referencing correct tables/columns |
- | `penalty` | `[0.0, 0.2]` | Deduction magnitude for resets/regressions/urgency near step limit |
-
- ## Tasks
- ### Task 1: Easy — Syntax Error Fix (`easy_syntax_fix`)
- Two straightforward issues: a misspelled keyword (`GRUP BY`) and an `ORDER BY` alias mismatch.
-
- ### Task 2: Medium — Logic Error Fix (`medium_logic_fix`)
- Logic bugs around outer joins + filtering scope + aggregation scope.
-
- ### Task 3: Hard — Multi-Bug Fix (`hard_multi_bug`)
- Five bugs across correlated subqueries, window functions, CTE scope, date logic, and duplication.
-
- ## Baseline
- The baseline script is intentionally simple: it loops `reset → step` and asks an OpenAI model to choose the next JSON action.
-
- ## Reliability & Benchmarking
-
- ### Verified status (local)
- - `openenv validate --verbose`: **PASS**
- - `python3 -m unittest discover -s tests -p "test_*.py"`: **10/10 PASS**
- - Docker smoke test: **PASS** (`/health`, `/tasks`, `/reset`, `/step`)
- - FastAPI docs available: **PASS** (`/docs`, `/redoc`, `/openapi.json`)
-
- ### Endpoint benchmark (local Docker run, n=25)
- Measured with `scripts/benchmark_local.py` on a running local container:
-
- | Endpoint | avg | p50 | p95 |
- |---|---:|---:|---:|
- | `GET /health` | 0.69 ms | 0.67 ms | 0.76 ms |
- | `GET /tasks` | 0.82 ms | 0.81 ms | 0.90 ms |
- | `POST /reset` | 1.34 ms | 1.26 ms | 1.62 ms |
- | `POST /step` (`inspect_schema`) | 1.07 ms | 1.01 ms | 1.34 ms |
-
- Re-run anytime:
-
- ```bash
- python3 scripts/benchmark_local.py
- ```
-
- Notes:
- - These are local-machine numbers (single container, warm runtime).
- - For submission-grade reporting, also capture one run against your HF Space URL after deploy.
-
- ## Setup & Usage
-
- ### Local Development
- ```bash
- pip install -r requirements.txt
- uvicorn server.main:app --host 0.0.0.0 --port 7860
- ```
-
- ### Docker
- ```bash
- docker build -t sql-debug-env .
- docker run -p 7860:7860 sql-debug-env
- ```
-
- ### Quick smoke test
- ```bash
- curl http://localhost:7860/health
- curl http://localhost:7860/tasks
- curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{"task_id":"easy_syntax_fix"}'
- curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d '{"action":{"action_type":"inspect_schema"}}'
- curl "http://localhost:7860/benchmark?runs=20"
- ```
-
- ### Real-time benchmark API (for dashboards/web pages)
- This is a live endpoint, not static/dummy data. Every request runs fresh measurements.
-
- - Endpoint: `GET /benchmark?runs=20`
- - `runs` range: `1` to `100`
- - Returns JSON with `avg_ms`, `p50_ms`, `p95_ms`, `n`, and a fresh `timestamp_epoch_ms`
-
- Example:
- ```bash
- curl "http://localhost:7860/benchmark?runs=30"
- ```
-
- ### Run Baseline
- ```bash
- export API_BASE_URL="https://api.openai.com/v1"
- export MODEL_NAME="gpt-4o-mini"
- export OPENAI_API_KEY="your-key"
- export ENV_BASE_URL="http://localhost:7860"
- export HF_TOKEN="$OPENAI_API_KEY"
- export SEED="1"
- python inference.py
- ```
-
- ### OpenEnv Validation
- ```bash
- pip install openenv-core
- openenv validate
- ```
-
- ### Suggested pre-submit check
- ```bash
- openenv validate --verbose
- python3 -m unittest discover -s tests -p "test_*.py"
- docker build -t sql-debug-env .
- docker run --rm -p 7860:7860 sql-debug-env
- # in another terminal:
- curl -s http://localhost:7860/health
- curl -s http://localhost:7860/docs >/dev/null
- curl -s "http://localhost:7860/benchmark?runs=20"
- ```
-
- ## Hugging Face Spaces (Docker)
- 1. Create a new **Space → Docker**.
- 2. Push this repo.
- 3. Update `openenv.yaml` → `api.base_url` to your Space URL: `https://<your-space>.hf.space`
- 4. Wait for build, then verify:
-
- ```bash
- curl -X POST https://<your-space>.hf.space/reset -H "Content-Type: application/json" -d '{}'
- ```
-
+ ---
+ title: Sql Debug Env
+ emoji: 💻
+ colorFrom: indigo
+ colorTo: gray
+ sdk: docker
+ pinned: false
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
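
Reviewer note on the removed README: its Reward Function table lists per-component ranges and a total clamped to `[0.0, 1.0]`, but it does not say how the components combine. The sketch below shows one plausible reading, assuming the four positive components are summed and `penalty` is subtracted before clamping. The names `RewardComponents` and `total_reward` are hypothetical and are not taken from `server/env.py`.

```python
from dataclasses import dataclass

# Per-component bounds copied from the removed README's Reward Function table.
_RANGES = {
    "correctness": (0.0, 0.6),
    "efficiency": (0.0, 0.2),
    "syntax_progress": (0.0, 0.1),
    "schema_bonus": (0.0, 0.1),
    "penalty": (0.0, 0.2),
}


@dataclass(frozen=True)
class RewardComponents:
    """Hypothetical container for one step's reward components."""
    correctness: float = 0.0
    efficiency: float = 0.0
    syntax_progress: float = 0.0
    schema_bonus: float = 0.0
    penalty: float = 0.0


def _clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def total_reward(c: RewardComponents) -> float:
    """Combine components into a single reward in [0.0, 1.0].

    ASSUMPTION: positives are summed and the penalty magnitude is
    subtracted; the source only documents the ranges and the final clamp.
    """
    # Bound each component to its documented range first.
    b = {name: _clamp(getattr(c, name), lo, hi) for name, (lo, hi) in _RANGES.items()}
    raw = (
        b["correctness"]
        + b["efficiency"]
        + b["syntax_progress"]
        + b["schema_bonus"]
        - b["penalty"]
    )
    return _clamp(raw, 0.0, 1.0)
```

Under this reading, a step that earns only a penalty still yields a reward of `0.0` rather than a negative value, which matches the documented clamp.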