# SQL Debug Env: The Ultimate Master Manual
> **Comprehensive Wiki & Technical Bible for the Meta PyTorch × OpenEnv Hackathon**
---
## Table of Contents
1. [The "Simple" Concept](#1-the-simple-concept)
2. [Architecture: How the Machine Works](#2-architecture-how-the-machine-works)
3. [The Industry Benchmark: Spider vs. BIRD vs. YOU](#3-the-industry-benchmark-spider-vs-bird-vs-you)
4. [Deep-Dive: The Codebase Map](#4-deep-dive-the-codebase-map)
5. [The Science: GRPO & Reinforcement Learning](#5-the-science-grpo--reinforcement-learning)
6. [The "Day in the Life" of a SQL Query](#6-the-day-in-the-life-of-a-sql-query)
7. [Current Project Status & Roadmap](#7-current-project-status--roadmap)
8. [Live Spider Evaluation (The "Ultimate Proof")](#8-live-spider-evaluation-the-ultimate-proof)
9. [Winning the Q&A (The Cheat Sheet)](#9-winning-the-qa-the-cheat-sheet)
---
## 1. The "Simple" Concept
Imagine you are a teacher. You have a student (the **AI**) who is good at English but bad at Math (the **SQL**).
Instead of just giving the student a textbook, you put them in a room with a calculator (the **Database**).
The student tries a problem, uses the calculator, sees the answer is wrong, and tries again.
**You have built the Room, the Calculator, and the Reward System (the "Stars") that makes the student smarter.**
---
## 2. Architecture: How the Machine Works
The project is split into two main "Brains":
### A. The Environment (The Body / server/)
This is the "physical world" where the SQL lives.
- **FastAPI:** The "telephone" that lets the AI talk to the database.
- **SQLite:** The "sandbox" where queries are actually run.
- **Graders:** The "judge" that compares the result of the AI's query to the "truth."
### B. The Agent (The Brain / grpo_train.py)
This is the intelligence that is trying to learn.
- **Model (Qwen2.5-Coder):** The actual neural network.
- **GRPO Logic:** The mathematical formula that tells the model: *"Fix #3 was better than Fix #1, change your weights to be more like #3."*
---
## 3. The Industry Benchmark: Spider vs. BIRD vs. YOU
**Judge Question:** *"Why should we use your environment instead of existing datasets like Spider?"*
| Feature | Spider / BIRD (Standard) | **SQL Debug Env (YOU)** |
| :--- | :--- | :--- |
| **Task Type** | One-Shot Generation | **Iterative Debugging** |
| **Feedback** | None (Static) | **Live Database Feedback** |
| **Difficulty** | High-level Text-to-SQL | **Low-level Logic/Syntax Fixes** |
| **Evaluation** | Fuzzy (String matching) | **Deterministic (Row matching)** |
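The "Deterministic (Row matching)" cell in the table above is the key differentiator, and it can be made concrete: instead of fuzzily comparing SQL text, the grader executes both queries and compares the result rows. A minimal sketch (the table and queries are illustrative, not taken from the project's task set):

```python
import sqlite3

def rows_match(candidate_sql: str, gold_sql: str, setup_sql: str) -> bool:
    """Execute both queries in a fresh in-memory DB and compare result
    multisets, ignoring row order (queries without ORDER BY are unordered)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    got = sorted(conn.execute(candidate_sql).fetchall())
    want = sorted(conn.execute(gold_sql).fetchall())
    return got == want

setup = """
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
"""
# Textually different, semantically identical: string matching would
# call this wrong; row matching calls it right.
print(rows_match("SELECT name, id FROM users",
                 "SELECT u.name, u.id FROM users AS u", setup))  # True
```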
**The Reference:** Your project is inspired by the **DeepSeek R1** and **OpenAI o1** reasoning models. You are applying their philosophy of reinforcement learning from verifiable feedback (rewards checked by execution, rather than human-preference RLHF) to the niche world of SQL engineering.
---
## 4. Deep-Dive: The Codebase Map
| File | What is it? | Why is it here? |
| :--- | :--- | :--- |
| **`server/main.py`** | The Heart | Acts as the API server. It handles `/reset` (new game) and `/step` (make a move). |
| **`server/env.py`** | The World | Manages the session state. It knows if the user is in Task 1 or Task 3. |
| **`server/database.py`** | The Sandbox | Creates temporary SQLite databases in memory so the AI can't break anything. |
| **`server/reward.py`** | The Scorekeeper | Calculates the "Reward" (0.0 to 1.0). It checks syntax, efficiency, and correctness. |
| **`grpo_train.py`** | The Trainer | The script that actually "upgrades" the AI's brain using RL. |
| **`inference.py`** | The Test | A simple script to see how smart the AI is *right now* before training. |
| **`openenv.yaml`** | The ID Card | Tells the hackathon platform how to connect to your project. |
---
## 5. The Science: GRPO & Reinforcement Learning
If a judge asks: *"How does it learn?"*
### The Old Way: SFT (Supervised Fine-Tuning)
- You show the AI 1,000 "Correct" answers.
- **Problem:** The AI just memorizes. It doesn't learn how to "debug" when it sees a new error.
### Your Way: GRPO (Group Relative Policy Optimization)
- **Step 1:** The AI looks at a broken query.
- **Step 2:** It generates **4 different ways** to fix it (a "Group").
- **Step 3:** We run all 4 in the database and get 4 scores.
- **Step 4:** We compare them. We tell the AI: *"Compared to your other 3 tries, your 2nd try was the best. Do more of that."*
- **Innovation:** This is **"Self-Generated Reasoning."** The AI is its own teacher.
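The four steps above boil down to one number per sample: the group-relative advantage. Each try's reward is normalized against its own group's mean and spread, so no separate value network is needed. A minimal numerical sketch (the reward values match this document's 0.0 / 0.2 / 1.0 scheme):

```python
from statistics import mean, stdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantage: how much better each try was than the
    group's average, in units of the group's spread."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four candidate fixes for the same broken query, scored by the environment:
rewards = [0.0, 1.0, 0.2, 0.2]   # try #2 fully matched the gold rows
advs = group_advantages(rewards)
# Positive advantage -> "do more of this"; negative -> "do less of this".
print(max(range(4), key=lambda i: advs[i]))  # 1 (the second try)
```

The policy gradient then pushes the model toward the tokens of positive-advantage tries, which is exactly the "your 2nd try was the best, do more of that" signal described above.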
---
## 6. The "Day in the Life" of a SQL Query
Follow a query from start to finish:
1. **The Prompt:** "Fix this query: `SELECT * FROM userss`" (note the typo).
2. **The Reviewer:** Your `reviewer_check` in `main.py` looks at it. If it sees `DROP TABLE`, it rejects it immediately.
3. **The Sandbox:** The query is run in a private SQLite memory space.
4. **The Comparison:** The system runs the "Correct" query in the background. It compares the rows.
5. **The Reward:** If the rows match, the AI gets `+1.0`. If they don't, but the syntax is valid, it gets `+0.2`.
6. **The Memory:** The AI updates its "Weights" (its digital brain) to remember this success.
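Steps 2-5 above compress into a single reward function. A sketch of that pipeline, using this document's 0.0 / 0.2 / 1.0 tiers (the blocklist keywords and toy schema are illustrative assumptions, not the actual `server/reward.py` logic):

```python
import sqlite3

BLOCKED = ("DROP", "DELETE", "ALTER")  # illustrative reviewer blocklist

def reward(candidate_sql: str, gold_sql: str, setup_sql: str) -> float:
    """Reviewer gate, then sandbox execution, then row comparison."""
    # Step 2: the Reviewer rejects destructive statements outright.
    if any(kw in candidate_sql.upper() for kw in BLOCKED):
        return 0.0
    # Step 3: run in a private in-memory sandbox.
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    try:
        got = sorted(conn.execute(candidate_sql).fetchall())
    except sqlite3.Error:
        return 0.0  # broken syntax or a nonexistent table
    # Steps 4-5: compare against the hidden "correct" query's rows.
    want = sorted(conn.execute(gold_sql).fetchall())
    return 1.0 if got == want else 0.2  # valid SQL, wrong rows -> partial credit

setup = "CREATE TABLE users (id INTEGER); INSERT INTO users VALUES (1);"
print(reward("SELECT * FROM userss", "SELECT * FROM users", setup))  # 0.0
print(reward("SELECT * FROM users",  "SELECT * FROM users", setup))  # 1.0
```

The partial-credit tier matters for GRPO: a syntactically valid but wrong fix still scores above a crash, giving the group comparison a gradient to climb.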
---
## 7. Current Project Status & Roadmap
**Project Completion: 95%**
### ✅ Completed:
- Core FastAPI Server & SQLite Sandbox.
- 3 Realistic SQL Debugging Tasks (Easy, Medium, Hard).
- Multi-Agent Reviewer Layer.
- GRPO Training Script verified on Apple Silicon (M2).
- Smoke Test verified (Handshake is 100% working).
### ⏳ Remaining (For Hackathon Site):
- Scale to **Qwen 7B/14B** on A100 GPUs.
- Connect **Weights & Biases (WandB)** for the live presentation curve.
---
## 8. Live Spider Evaluation (The "Ultimate Proof")
**How to show the judges your agent can handle real-world academic benchmarks:**
1. **Launch the Spider Task:**
Run `/reset` with the `spider_cross_eval` task ID (handled by `server/tasks/task_spider.py`).
2. **The "Blind Test":**
Ask a judge to pick a random SQL query from the **Spider dev set**.
3. **Introduce a Bug:**
Delete a semicolon, misspell a JOIN, or remove a WHERE clause.
4. **The Demonstration:**
Run `inference.py` on that broken Spider query.
**The Result:** The agent will use its trained GRPO weights to analyze the error, inspect the Spider schema, and return the fix.
**Why this wins:** You are showing that your environment isn't a "closed loop." It can ingest and solve the industry's hardest academic benchmark in real-time.
---
## 9. Winning the Q&A (The Cheat Sheet)
**Q: "Why SQLite?"**
> *"Because it's the world's most used DB. If the agent can reason in SQLite, it can reason in PostgreSQL. I built a 'Simulator' that is DB-agnostic."*
**Q: "What makes this 'Multi-Agent'?"**
> *"I have two roles: The **Fixer** (the LLM) and the **Reviewer** (the guardrail logic). They interact to ensure every query is safe and syntactically sound before execution."*
---
**This manual is your secret weapon. Read it, understand it, and you will own the stage.**