File size: 7,194 Bytes
b9659fa
75907ae
b9659fa
 
 
 
0376765
b9659fa
 
5706fdf
b9659fa
 
269f632
 
 
 
 
75907ae
 
 
 
 
 
 
 
 
 
 
 
 
 
269f632
 
75907ae
 
 
 
 
269f632
 
 
 
 
 
 
 
75907ae
269f632
 
 
75907ae
 
269f632
 
 
75907ae
 
 
 
269f632
 
 
 
 
 
 
 
75907ae
 
 
 
 
 
 
 
 
 
 
 
 
269f632
 
 
 
 
 
 
 
 
 
 
 
 
 
75907ae
 
 
269f632
 
 
75907ae
 
269f632
75907ae
 
 
 
 
269f632
75907ae
 
 
 
 
 
 
 
269f632
 
 
75907ae
269f632
 
75907ae
269f632
75907ae
 
 
 
 
 
 
 
 
 
269f632
75907ae
 
 
 
 
 
 
 
 
 
 
 
 
269f632
75907ae
269f632
75907ae
 
 
269f632
 
 
 
75907ae
 
 
 
 
 
 
269f632
 
 
 
 
 
 
75907ae
 
 
 
 
269f632
 
 
75907ae
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
---
title: SQL Repair Env
emoji: πŸ“Š
colorFrom: red
colorTo: purple
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
short_description: OpenEnv SQL repair tasks with a live frontend
---

# SQL Repair OpenEnv

An OpenEnv environment for the **Meta PyTorch x Scaler hackathon** where
agents repair broken SQL queries against a small SQLite schema.

## Live Links

- GitHub: [Krishpotanwar/sql-repair-env](https://github.com/Krishpotanwar/sql-repair-env)
- Hugging Face Space: [krishpotanwar/sql-repair-env](https://huggingface.co/spaces/krishpotanwar/sql-repair-env)
- Live app: [krishpotanwar-sql-repair-env.hf.space](https://krishpotanwar-sql-repair-env.hf.space/)

## What This Repo Includes

- A FastAPI OpenEnv backend with `/health`, `/tasks`, `/reset`, `/step`, `/grader`, and `/baseline`
- `/api/*` aliases so the frontend can call the same environment through the Hugging Face Space root
- A frontend command center inspired by the `disasterman` UI shell, adapted for SQL query repair workflows
- Three deterministic SQL repair tasks with strict scoring in `(0, 1)`
- A validator-safe inference runner that always emits bounded scores even on missing-key or crash paths

## Tasks

| ID       | Difficulty | What's broken                            |
|----------|------------|------------------------------------------|
| `task_1` | easy       | SELECT list missing commas               |
| `task_2` | medium     | JOIN references columns that don't exist |
| `task_3` | hard       | Aggregate query missing GROUP BY         |

Each task gives the agent the schema, the broken query, the runtime error
(if any), and a one-line hint. The agent submits a corrected query via the
`/step` endpoint and is scored on whether the result rows match the
canonical expected rows.

## Architecture

```text
.
β”œβ”€β”€ pyproject.toml         # uv project, server entry point
β”œβ”€β”€ uv.lock                # uv lockfile
β”œβ”€β”€ Dockerfile             # builds the FastAPI server image
β”œβ”€β”€ README.md              # GitHub + HF Space landing document
β”œβ”€β”€ inference.py           # AGENT β€” talks to the env via HTTP, calls an LLM
β”œβ”€β”€ openenv.yaml           # OpenEnv metadata
β”œβ”€β”€ server/
β”‚   └── app.py             # FastAPI env server + frontend serving
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/               # Winner UI shell + SQL-specific tabs
β”‚   └── dist/              # prebuilt frontend bundle shipped to HF
β”œβ”€β”€ sql_env/
β”‚   β”œβ”€β”€ env_core.py        # SQLite-backed env state
β”‚   β”œβ”€β”€ tasks.py           # Task definitions
β”‚   └── grader.py          # Strict (0, 1) score clamping
└── tests/
    └── test_smoke.py      # Pytest smoke suite
```

## Frontend

The root route `/` serves a deployed frontend command center with:

- `QUERY LAB` for reset, submit, and grader workflows
- `TASK ATLAS` for browsing the three SQL missions
- `BASELINE ARENA` for broken-query score comparisons
- `AGENT PROTOCOL` for validator and runtime notes
- `API OPS` for backend connection and endpoint visibility

In production, the frontend talks to the backend through `/api/*` aliases.
That keeps the Space root usable as both a human-facing app and an API host.

## HTTP API

| Method | Path        | Body                                      | Returns                              |
|--------|-------------|-------------------------------------------|--------------------------------------|
| GET    | `/health`   | β€”                                         | `{"status":"ok"}`                    |
| GET    | `/tasks`    | β€”                                         | task list + metadata                 |
| POST   | `/reset`    | `{"task_id":"task_1"}` (optional)         | observation                          |
| POST   | `/step`     | `{"action":{"action_type":"submit_query","query":"..."}}` | observation/reward/done |
| POST   | `/grader`   | `{"task_id":"task_1"}`                    | `{"score": float in (0,1)}`          |
| POST   | `/baseline` | `{"tasks":[...]}` (optional)              | scores for all tasks                 |

`/reset` accepts an empty body and defaults to `task_1` β€” required by the
OpenEnv validator.

The frontend calls the equivalent `/api/health`, `/api/tasks`, `/api/reset`,
`/api/step`, `/api/grader`, and `/api/baseline` aliases.

## Running locally

```bash
# 1. Install backend dependencies
uv sync

# 2. Build the frontend bundle used by the HF Space
cd frontend
npm ci
npm run build
cd ..

# 3. Start the env server
python -m server.app          # serves API + frontend at http://localhost:8000

# 4. Optional: frontend-only dev mode
cd frontend
npm run dev

# 5. Run the agent (in another terminal)
export HF_TOKEN=<your-groq-or-openai-key>
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
ENV_URL=http://localhost:8000 uv run python inference.py
```

## Example API Flow

```bash
URL=https://krishpotanwar-sql-repair-env.hf.space

curl -s "$URL/health"
curl -s -X POST "$URL/reset" -H "Content-Type: application/json" -d '{}'
curl -s -X POST "$URL/step" \
  -H "Content-Type: application/json" \
  -d '{"action":{"action_type":"submit_query","query":"SELECT id, name, price FROM products ORDER BY id"}}'
curl -s -X POST "$URL/grader" -H "Content-Type: application/json" -d '{"task_id":"task_1"}'
curl -s -X POST "$URL/baseline" -H "Content-Type: application/json" -d '{}'
```

## Inference Output Notes

`inference.py` is designed to keep stdout validator-safe:

- Every task emits exactly one `[START]` line
- Every task emits exactly one `[END]` line with `score=` strictly in `(0, 1)`
- On missing-key or fatal fallback paths, the score falls back to `0.5000`
- Successful runs may also emit `[STEP]` progress lines

Example no-key path:

```text
[START] task_1
[END] task_1 | score=0.5000 | status=fatal_no_llm
[START] task_2
[END] task_2 | score=0.5000 | status=fatal_no_llm
[START] task_3
[END] task_3 | score=0.5000 | status=fatal_no_llm
```

## Environment variables

| Name               | Default                                  | Notes                                       |
|--------------------|------------------------------------------|---------------------------------------------|
| `API_BASE_URL`     | `https://api.groq.com/openai/v1`         | Required by OpenEnv submission checklist    |
| `MODEL_NAME`       | `llama-3.3-70b-versatile`                | Required by OpenEnv submission checklist    |
| `HF_TOKEN`         | (none β€” must be set in HF Space Secrets) | Required by OpenEnv submission checklist    |
| `LOCAL_IMAGE_NAME` | (unset)                                  | If set, inference.py boots a Docker image   |
| `ENV_URL`          | `http://localhost:8000`                  | Where the env server is reachable           |

## Validation

```bash
# Phase 1 β€” official OpenEnv validator
uvx --from openenv-core openenv validate .

# Backend smoke tests
uv run pytest tests/ -q

# Frontend lint
cd frontend && npm run lint
```

No API keys are hardcoded in this repo. The agent reads `HF_TOKEN` (with
optional `GROQ_API_KEY` and `OPENAI_API_KEY` fallbacks) at runtime only.