sql-repair-env / README.md
krishpotanwar's picture
docs: shorten hf summary
5706fdf
---
title: SQL Repair Env
emoji: πŸ“Š
colorFrom: red
colorTo: purple
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
short_description: OpenEnv SQL repair tasks with a live frontend
---
# SQL Repair OpenEnv
An OpenEnv environment for the **Meta PyTorch x Scaler hackathon** where
agents repair broken SQL queries against a small SQLite schema.
## Live Links
- GitHub: [Krishpotanwar/sql-repair-env](https://github.com/Krishpotanwar/sql-repair-env)
- Hugging Face Space: [krishpotanwar/sql-repair-env](https://huggingface.co/spaces/krishpotanwar/sql-repair-env)
- Live app: [krishpotanwar-sql-repair-env.hf.space](https://krishpotanwar-sql-repair-env.hf.space/)
## What This Repo Includes
- A FastAPI OpenEnv backend with `/health`, `/tasks`, `/reset`, `/step`, `/grader`, and `/baseline`
- `/api/*` aliases so the frontend can call the same environment through the Hugging Face Space root
- A frontend command center inspired by the `disasterman` UI shell, adapted for SQL query repair workflows
- Three deterministic SQL repair tasks with strict scoring in `(0, 1)`
- A validator-safe inference runner that always emits bounded scores even on missing-key or crash paths
## Tasks
| ID | Difficulty | What's broken |
|----------|------------|------------------------------------------|
| `task_1` | easy | SELECT list missing commas |
| `task_2` | medium | JOIN references columns that don't exist |
| `task_3` | hard | Aggregate query missing GROUP BY |
Each task gives the agent the schema, the broken query, the runtime error
(if any), and a one-line hint. The agent submits a corrected query via the
`/step` endpoint and is scored on whether the result rows match the
canonical expected rows.
## Architecture
```text
.
β”œβ”€β”€ pyproject.toml # uv project, server entry point
β”œβ”€β”€ uv.lock # uv lockfile
β”œβ”€β”€ Dockerfile # builds the FastAPI server image
β”œβ”€β”€ README.md # GitHub + HF Space landing document
β”œβ”€β”€ inference.py # AGENT β€” talks to the env via HTTP, calls an LLM
β”œβ”€β”€ openenv.yaml # OpenEnv metadata
β”œβ”€β”€ server/
β”‚ └── app.py # FastAPI env server + frontend serving
β”œβ”€β”€ frontend/
β”‚ β”œβ”€β”€ src/ # Winner UI shell + SQL-specific tabs
β”‚ └── dist/ # prebuilt frontend bundle shipped to HF
β”œβ”€β”€ sql_env/
β”‚ β”œβ”€β”€ env_core.py # SQLite-backed env state
β”‚ β”œβ”€β”€ tasks.py # Task definitions
β”‚ └── grader.py # Strict (0, 1) score clamping
└── tests/
└── test_smoke.py # Pytest smoke suite
```
## Frontend
The root route `/` serves a deployed frontend command center with:
- `QUERY LAB` for reset, submit, and grader workflows
- `TASK ATLAS` for browsing the three SQL missions
- `BASELINE ARENA` for broken-query score comparisons
- `AGENT PROTOCOL` for validator and runtime notes
- `API OPS` for backend connection and endpoint visibility
In production, the frontend talks to the backend through `/api/*` aliases.
That keeps the Space root usable as both a human-facing app and an API host.
## HTTP API
| Method | Path | Body | Returns |
|--------|-------------|-------------------------------------------|--------------------------------------|
| GET | `/health` | β€” | `{"status":"ok"}` |
| GET | `/tasks` | β€” | task list + metadata |
| POST | `/reset` | `{"task_id":"task_1"}` (optional) | observation |
| POST | `/step` | `{"action":{"action_type":"submit_query","query":"..."}}` | observation/reward/done |
| POST | `/grader` | `{"task_id":"task_1"}` | `{"score": float in (0,1)}` |
| POST | `/baseline` | `{"tasks":[...]}` (optional) | scores for all tasks |
`/reset` accepts an empty body and defaults to `task_1` β€” required by the
OpenEnv validator.
The frontend calls the equivalent `/api/health`, `/api/tasks`, `/api/reset`,
`/api/step`, `/api/grader`, and `/api/baseline` aliases.
## Running locally
```bash
# 1. Install backend dependencies
uv sync
# 2. Build the frontend bundle used by the HF Space
cd frontend
npm ci
npm run build
cd ..
# 3. Start the env server
python -m server.app # serves API + frontend at http://localhost:8000
# 4. Optional: frontend-only dev mode
cd frontend
npm run dev
# 5. Run the agent (in another terminal)
export HF_TOKEN=<your-groq-or-openai-key>
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.3-70b-versatile
ENV_URL=http://localhost:8000 uv run python inference.py
```
## Example API Flow
```bash
URL=https://krishpotanwar-sql-repair-env.hf.space
curl -s "$URL/health"
curl -s -X POST "$URL/reset" -H "Content-Type: application/json" -d '{}'
curl -s -X POST "$URL/step" \
-H "Content-Type: application/json" \
-d '{"action":{"action_type":"submit_query","query":"SELECT id, name, price FROM products ORDER BY id"}}'
curl -s -X POST "$URL/grader" -H "Content-Type: application/json" -d '{"task_id":"task_1"}'
curl -s -X POST "$URL/baseline" -H "Content-Type: application/json" -d '{}'
```
## Inference Output Notes
`inference.py` is designed to keep stdout validator-safe:
- Every task emits exactly one `[START]` line
- Every task emits exactly one `[END]` line with `score=` strictly in `(0, 1)`
- On missing-key or fatal fallback paths, the score falls back to `0.5000`
- Successful runs may also emit `[STEP]` progress lines
Example no-key path:
```text
[START] task_1
[END] task_1 | score=0.5000 | status=fatal_no_llm
[START] task_2
[END] task_2 | score=0.5000 | status=fatal_no_llm
[START] task_3
[END] task_3 | score=0.5000 | status=fatal_no_llm
```
## Environment variables
| Name | Default | Notes |
|--------------------|------------------------------------------|---------------------------------------------|
| `API_BASE_URL` | `https://api.groq.com/openai/v1` | Required by OpenEnv submission checklist |
| `MODEL_NAME` | `llama-3.3-70b-versatile` | Required by OpenEnv submission checklist |
| `HF_TOKEN` | (none β€” must be set in HF Space Secrets) | Required by OpenEnv submission checklist |
| `LOCAL_IMAGE_NAME` | (unset) | If set, inference.py boots a Docker image |
| `ENV_URL` | `http://localhost:8000` | Where the env server is reachable |
## Validation
```bash
# Phase 1 β€” official OpenEnv validator
uvx --from openenv-core openenv validate .
# Backend smoke tests
uv run pytest tests/ -q
# Frontend lint
cd frontend && npm run lint
```
No API keys are hardcoded in this repo. The agent reads `HF_TOKEN` (with
optional `GROQ_API_KEY` and `OPENAI_API_KEY` fallbacks) at runtime only.