gourav03003 committed on
Commit
c93a989
·
1 Parent(s): 3898ed7

fix: add HF Space metadata to README

Files changed (1)
  1. README.md +18 -106
README.md CHANGED
@@ -1,110 +1,22 @@
- # SQL Query Debugger - OpenEnv Environment
 
- An RL environment where an AI agent debugs broken SQL queries.
- Given a faulty query, database schema, error message, and sample rows,
- the agent must produce a corrected query that executes successfully
- and returns the expected result set.
 
- ## Environment Description
-
- Real-world motivation: Every data engineer, analyst, and backend developer
- debugs SQL queries daily. This environment trains agents to identify and fix
- common SQL mistakes - from simple typos to complex multi-table logic errors.
-
- ## Action Space
-
- | Field | Type | Description |
- |---|---|---|
- | `fixed_query` | string | The corrected SQL query to execute |
-
- ## Observation Space
-
- | Field | Type | Description |
- |---|---|---|
- | `broken_query` | string | The SQL query containing errors |
- | `schema` | string | CREATE TABLE statements |
- | `error_message` | string | Error from executing the broken query |
- | `sample_rows` | string | Sample data as JSON string |
- | `expected_output_hint` | string | Natural language description of correct output |
- | `task_id` | string | Difficulty level of current task |
- | `attempts_remaining` | integer | Fix attempts left in episode |
- | `last_result` | string | Result rows from last query attempt |
 
  ## Tasks
-
- ### Task 1 - Syntax Fix (Easy)
- Fix SQL syntax errors: misspelled keywords (SELCT, WERE, GRUP),
- missing commas, wrong keyword order.
- - Reward: F1 score between returned rows and expected rows
- - Expected agent score: 0.7 - 1.0
-
- ### Task 2 - Logic Bug Fix (Medium)
- Fix SQL logic errors: wrong GROUP BY column, incorrect WHERE condition,
- wrong ORDER BY direction, misused LIMIT.
- - Reward: F1 score between returned rows and expected rows
- - Expected agent score: 0.4 - 0.8
-
- ### Task 3 - Multi-Table Optimization (Hard)
- Fix complex multi-table queries: wrong JOIN conditions, missing GROUP BY,
- incorrect self-joins, subquery errors, cartesian products.
- - Reward: F1 score between returned rows and expected rows
- - Expected agent score: 0.2 - 0.6
-
- ## Reward Function
-
- Each step returns a reward between 0.0 and 1.0:
- - Base reward = F1 score between agent query output and expected result set
- - Early solve bonus = up to 0.1 extra for solving in fewer steps
- - Score of 0.0 = query crashes or returns completely wrong rows
- - Score of 1.0 = query returns exactly the expected result set
-
- ## Setup Instructions
-
- ### Local setup
- ```bash
- git clone https://github.com/sharmagourav687526-sketch/sql-query-debugger.git
- cd sql-query-debugger
- pip install openenv-core fastapi uvicorn pydantic
- ```
-
- ### Run the server locally
- ```bash
- cd sql-query-debugger
- uvicorn server.app:app --host 0.0.0.0 --port 8000
- ```
-
- ### Run the baseline inference script
- ```bash
- export API_BASE_URL=https://router.huggingface.co/v1
- export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
- export HF_TOKEN=your_token_here
- python inference.py
- ```
-
- ### Docker
- ```bash
- docker build -t sql-query-debugger -f server/Dockerfile .
- docker run -p 8000:8000 sql-query-debugger
- ```
-
- ### Validate
- ```bash
- openenv validate
- ```
-
- ## Baseline Scores
-
- | Task | Difficulty | Baseline Score |
- |---|---|---|
- | syntax_fix | Easy | 0.72 |
- | logic_bug | Medium | 0.51 |
- | multi_table | Hard | 0.34 |
- | **Average** | | **0.52** |
-
- ## Environment Details
-
- - 20 pre-built scenarios across 3 difficulty levels
- - Grader: SQLite execution + F1 score vs expected result set
- - Max steps per episode: 5
- - Scores always in range 0.0 - 1.0
- - Fully deterministic graders - no randomness in scoring
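
Note on the removed text: the old README describes the grader only as "SQLite execution + F1 score vs expected result set". The repository's actual grader is not shown in this commit, so the following is only a minimal sketch of how such a deterministic score could be computed. The function names `run_query` and `row_f1` are hypothetical, rows are compared as an order-insensitive multiset (an assumption), and the early-solve bonus is left out.

```python
import sqlite3
from collections import Counter


def run_query(schema_sql, query):
    """Run `query` on a fresh in-memory SQLite database.

    Returns the list of result rows, or None if SQLite raises an error.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)  # CREATE TABLE / INSERT statements
        return conn.execute(query).fetchall()
    except sqlite3.Error:
        return None
    finally:
        conn.close()


def row_f1(returned, expected):
    """F1 between returned and expected rows, treated as multisets."""
    if returned is None:  # the query crashed
        return 0.0
    if not returned and not expected:  # both empty: exact match
        return 1.0
    got, want = Counter(returned), Counter(expected)
    overlap = sum((got & want).values())  # rows present in both
    if overlap == 0:
        return 0.0
    precision = overlap / sum(got.values())
    recall = overlap / sum(want.values())
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions a crashing query scores 0.0 and an exact match scores 1.0, which agrees with the endpoints stated in the removed "Reward Function" section; no random state is involved, so repeated runs give the same score.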
 
+ ---
+ title: SQL Query Debugger
+ emoji: 🗄️
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ pinned: false
+ ---
 
+ # SQL Query Debugger
 
+ An OpenEnv environment for training agents to debug SQL queries.
 
  ## Tasks
+ - **syntax_fix** (Easy): Fix typos in SQL keywords
+ - **logic_bug** (Medium): Fix wrong logic in queries
+ - **multi_table** (Hard): Fix wrong JOINs and subqueries
+
+ ## API
+ - `POST /reset` - Start new episode
+ - `POST /step` - Submit fixed query
+ - `GET /state` - Get current state
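
The new README lists the three endpoints but gives no request or response shapes. As a rough sketch of how a client might drive one episode, the example below uses only the Python standard library. Several details are assumptions, not confirmed by this commit: the base URL (port 8000 is taken from the removed `uvicorn` command), the `fixed_query` request field (taken from the removed Action Space table), and the `done` and `observation` response keys. The helper names `call` and `run_episode` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server on port 8000


def call(method, path, payload=None, base_url=BASE_URL):
    """Send a JSON request to the environment server and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(fix_query, max_steps=5, call_fn=call):
    """Reset the environment, then submit fixes until done or out of steps.

    `fix_query` maps an observation dict to a corrected SQL string.
    The "done" and "observation" keys are assumed response fields.
    """
    obs = call_fn("POST", "/reset", {})
    result = {}
    for _ in range(max_steps):
        result = call_fn("POST", "/step", {"fixed_query": fix_query(obs)})
        if result.get("done"):
            break
        obs = result.get("observation", obs)
    return result
```

`call_fn` is injectable so the loop can be exercised without a running server; against a live Space, `GET /state` could be queried the same way with `call("GET", "/state")`.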