---

title: SQL Arena
emoji: 🏟️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---


# SQL Arena - OpenEnv Environment

An interactive SQL challenge environment in which AI agents learn to write SQL
by iteratively querying a database and receiving execution feedback with partial-credit scoring.

## Real-World Utility

Text-to-SQL is a practical, widely used capability for AI agents:
- Data analysts, business users, and developers write SQL daily
- The task exercises reasoning, schema understanding, and query composition
- The skill applies directly to production AI assistants and copilots

## Tasks

| Task | Difficulty | Description | Max Steps |
|------|-----------|-------------|-----------|
| basic_select | Easy | SELECT, WHERE, ORDER BY | 5 |
| join_aggregate | Medium | JOINs, GROUP BY, HAVING | 7 |
| complex_analysis | Hard | CTEs, window functions | 10 |



Each difficulty has 3 unique problems with deterministic grading.



## Action Space



The agent sends a SQL query each step:



{"sql_query": "SELECT name, salary FROM employees WHERE salary > 80000"}

## Observation Space

The agent receives back:

- schema_description: Database schema text
- question: Natural language question to answer
- query_result: Result table from last query
- error_message: Error if query failed
- feedback: Scoring feedback with hints
- expected_columns: Expected column names
- attempts_remaining: Steps left
- difficulty: Task difficulty level
- task_id: Problem identifier
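On the client side, the fields above could be collected into a typed container. The sketch below uses a standard-library dataclass rather than the project's Pydantic models; the `Observation` class name, the field types, the defaults, and the sample payload are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional


@dataclass
class Observation:
    """Hypothetical client-side container for the observation fields."""
    schema_description: str
    question: str
    difficulty: str
    task_id: str
    attempts_remaining: int
    expected_columns: List[str] = field(default_factory=list)
    query_result: Any = None              # exact type is an assumption
    error_message: Optional[str] = None
    feedback: Optional[str] = None

    @classmethod
    def from_dict(cls, payload):
        # Ignore keys this sketch does not know about.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


# Hypothetical payload; the values are invented for the example.
obs = Observation.from_dict({
    "schema_description": "employees(name TEXT, salary INTEGER)",
    "question": "Which employees earn more than 80000?",
    "difficulty": "easy",
    "task_id": "example_task",
    "attempts_remaining": 5,
    "expected_columns": ["name", "salary"],
    "unrecognized_key": True,
})
print(obs.question, obs.attempts_remaining)
```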

## Reward Function (0.0 to 1.0)

| Component | Weight | Description |
|-----------|--------|-------------|
| Execution | 0.10 | Query runs without error |
| Columns | 0.20 | Correct column names |
| Row Count | 0.20 | Correct number of rows |
| Values | 0.50 | Correct data values |
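As a worked example of how these weights could combine, the sketch below computes a weighted sum. Only the weights come from the table; the `combine_reward` helper and the idea that each component is scored between 0.0 and 1.0 (allowing partial credit) are assumptions, and the project's `graders.py` may compute the components differently.

```python
# Weights taken from the reward table; they sum to 1.0.
WEIGHTS = {"execution": 0.10, "columns": 0.20, "row_count": 0.20, "values": 0.50}


def combine_reward(components):
    """Weighted sum of per-component scores, each clamped to [0.0, 1.0]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = min(max(components.get(name, 0.0), 0.0), 1.0)
        total += weight * score
    return round(total, 4)


# The query runs, columns and row count match, but only half the values are right.
partial = combine_reward({"execution": 1.0, "columns": 1.0, "row_count": 1.0, "values": 0.5})
# The query fails outright, so no component earns credit.
failed = combine_reward({})
print(partial, failed)
```

Under these assumptions, a query with half-correct values earns 0.10 + 0.20 + 0.20 + 0.25 = 0.75.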

## Setup

pip install -r requirements.txt

## Run Server

uvicorn src.sql_arena.server:app --host 0.0.0.0 --port 7860



## Run Inference



On Windows (cmd):

set HF_TOKEN=your_token

On Linux or macOS, use export HF_TOKEN=your_token instead.

python inference.py



## Docker



docker build -t sql-arena .

docker run -p 7860:7860 sql-arena



## Run Tests



pytest tests/ -v



## Project Structure



sql_arena/
- openenv.yaml (Environment metadata)
- Dockerfile (Container deployment)
- inference.py (Baseline inference script)
- src/sql_arena/
  - models.py (Typed Pydantic models)
  - environment.py (Core environment logic)
  - tasks.py (9 SQL challenges)
  - graders.py (Partial credit scoring)
  - database.py (SQLite management)
  - server.py (FastAPI server)
- tests/
  - test_env.py (Test suite)

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /reset | Start new episode |
| POST | /step | Submit SQL query |
| GET | /state | Get current state |
| GET | /tasks | List available tasks |
| WS | /ws | WebSocket sessions |
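The HTTP endpoints could be exercised with a small standard-library client along these lines. This is a sketch under assumptions: the base URL reuses the port from the uvicorn command, `/step` is assumed to accept the action JSON from the Action Space section as its request body, and `/reset` is sent an empty body because its request schema is not documented here. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed: server started locally on port 7860


def build_request(path, payload=None, method="POST"):
    """Build a JSON request for one of the endpoints (no network access yet)."""
    data = json.dumps(payload or {}).encode("utf-8") if method == "POST" else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


def call(path, payload=None, method="POST"):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload, method)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_once():
    """One reset followed by one step; requires a running server."""
    tasks = call("/tasks", method="GET")
    first_obs = call("/reset")  # empty body: request schema is an assumption
    result = call("/step", {"sql_query": "SELECT name, salary FROM employees WHERE salary > 80000"})
    return tasks, first_obs, result
```

With the server running, calling `run_once()` would list the tasks, start an episode, and submit one query; the exact response shapes depend on the server's models.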

## License

MIT