# 🏆 SQL Debug Env: The Ultimate Master Manual > **Comprehensive Wiki & Technical Bible for the Meta PyTorch × OpenEnv Hackathon** --- ## 📖 Table of Contents 1. [The "Simple" Concept](#1-the-simple-concept) 2. [Architecture: How the Machine Works](#2-architecture-how-the-machine-works) 3. [The Industry Benchmark: Spider vs. BIRD vs. YOU](#3-the-industry-benchmark-spider-vs-bird-vs-you) 4. [Deep-Dive: The Codebase Map](#4-deep-dive-the-codebase-map) 5. [The Science: GRPO & Reinforcement Learning](#5-the-science-grpo--reinforcement-learning) 6. [The "Day in the Life" of a SQL Query](#6-the-day-in-the-life-of-a-sql-query) 7. [Current Project Status & Roadmap](#7-current-project-status--roadmap) 8. [Live Spider Evaluation (The "Ultimate Proof")](#8-live-spider-evaluation-the-ultimate-proof) 9. [Winning the Q&A (The Cheat Sheet)](#9-winning-the-qa-the-cheat-sheet) --- ## 1. The "Simple" Concept Imagine you are a teacher. You have a student (the **AI**) who is good at English but bad at Math (the **SQL**). Instead of just giving the student a textbook, you put them in a room with a calculator (the **Database**). The student tries a problem, uses the calculator, sees the answer is wrong, and tries again. **You have built the Room, the Calculator, and the Reward System (the "Stars") that makes the student smarter.** --- ## 2. Architecture: How the Machine Works The project is split into two main "Brains": ### A. The Environment (The Body / server/) This is the "physical world" where the SQL lives. - **FastAPI:** The "telephone" that lets the AI talk to the database. - **SQLite:** The "sandbox" where queries are actually run. - **Graders:** The "judge" that compares the result of the AI's query to the "truth." ### B. The Agent (The Brain / grpo_train.py) This is the intelligence that is trying to learn. - **Model (Qwen2.5-Coder):** The actual neural network. - **GRPO Logic:** The mathematical formula that tells the model: *"Fix #3 was better than Fix #1, change your weights to be more like #3."* --- ## 3. The Industry Benchmark: Spider vs. BIRD vs. YOU **Judge Question:** *"Why should we use your environment instead of existing datasets like Spider?"* | Feature | Spider / BIRD (Standard) | **SQL Debug Env (YOU)** | | :--- | :--- | :--- | | **Task Type** | One-Shot Generation | **Iterative Debugging** | | **Feedback** | None (Static) | **Live Database Feedback** | | **Difficulty** | High-level Text-to-SQL | **Low-level Logic/Syntax Fixes** | | **Evaluation** | Fuzzy (String matching) | **Deterministic (Row matching)** | **The Reference:** Your project is inspired by the **DeepSeek R1** and **OpenAI o1** reasoning models. You are applying their "Reinforcement Learning from Feedback" (RLHF) philosophy to the niche world of SQL engineering. --- ## 4. Deep-Dive: The Codebase Map | File | What is it? | Why is it here? | | :--- | :--- | :--- | | **`server/main.py`** | The Heart | Acts as the API server. It handles `/reset` (new game) and `/step` (make a move). | | **`server/env.py`** | The World | Manages the session state. It knows if the user is in Task 1 or Task 3. | | **`server/database.py`** | The Sandbox | Creates temporary SQLite databases in memory so the AI can't break anything. | | **`server/reward.py`** | The Scorekeeper | Calculates the "Reward" (0.0 to 1.0). It checks syntax, efficiency, and correctness. | | **`grpo_train.py`** | The Trainer | The script that actually "upgrades" the AI's brain using RL. | | **`inference.py`** | The Test | A simple script to see how smart the AI is *right now* before training. | | **`openenv.yaml`** | The ID Card | Tells the hackathon platform how to connect to your project. | --- ## 5. The Science: GRPO & Reinforcement Learning If a judge asks: *"How does it learn?"* ### The Old Way: SFT (Supervised Fine-Tuning) - You show the AI 1,000 "Correct" answers. - **Problem:** The AI just memorizes. It doesn't learn how to "debug" when it sees a new error. ### Your Way: GRPO (Group Relative Policy Optimization) - **Step 1:** The AI looks at a broken query. - **Step 2:** It generates **4 different ways** to fix it (a "Group"). - **Step 3:** We run all 4 in the database and get 4 scores. - **Step 4:** We compare them. We tell the AI: *"Compared to your other 3 tries, your 2nd try was the best. Do more of that."* - **Innovation:** This is **"Self-Generated Reasoning."** The AI is its own teacher. --- ## 6. The "Day in the Life" of a SQL Query Follow a query from start to finish: 1. **The Prompt:** "Fix this query: SELECT * FROM userss (typo)." 2. **The Reviewer:** Your `reviewer_check` in `main.py` looks at it. If it sees `DROP TABLE`, it rejects it immediately. 3. **The Sandbox:** The query is run in a private SQLite memory space. 4. **The Comparison:** The system runs the "Correct" query in the background. It compares the rows. 5. **The Reward:** If the rows match, the AI gets `+1.0`. If they don't, but the syntax is valid, it gets `+0.2`. 6. **The Memory:** The AI updates its "Weights" (its digital brain) to remember this success. --- ## 7. Current Project Status & Roadmap **Project Completion: 95%** ### ✅ Completed: - Core FastAPI Server & SQLite Sandbox. - 3 Realistic SQL Debugging Tasks (Easy, Medium, Hard). - Multi-Agent Reviewer Layer. - GRPO Training Script verified on Apple Silicon (M2). - Smoke Test verified (Handshake is 100% working). ### ⏳ Remaining (For Hackathon Site): - Scale to **Qwen 7B/14B** on A100 GPUs. - Connect **Weights & Biases (WandB)** for the live presentation curve. --- ## 8. Live Spider Evaluation (The "Ultimate Proof") **How to show the judges your agent can handle real-world academic benchmarks:** 1. **Launch the Spider Task:** Run `/reset` with the `spider_cross_eval` task ID (handled by `server/tasks/task_spider.py`). 2. **The "Blind Test":** Ask a judge to pick a random SQL query from the **Spider dev set**. 3. **Introduce a Bug:** Delete a semicolon, misspell a JOIN, or remove a WHERE clause. 4. **The Demonstration:** Run `inference.py` on that broken Spider query. **The Result:** The agent will use its trained GRPO weights to analyze the error, inspect the Spider schema, and return the fix. **Why this wins:** You are showing that your environment isn't a "closed loop." It can ingest and solve the industry's hardest academic benchmark in real-time. --- ## 9. Winning the Q&A (The Cheat Sheet) **Q: "Why SQLite?"** > *"Because it's the world's most used DB. If the agent can reason in SQLite, it can reason in PostgreSQL. I built a 'Simulator' that is DB-agnostic."* **Q: "What makes this 'Multi-Agent'?"** > *"I have two roles: The **Fixer** (the LLM) and the **Reviewer** (the guardrail logic). They interact to ensure every query is safe and syntactically sound before execution."* --- **This manual is your secret weapon. Read it, understand it, and you will own the stage.** 🚀