| --- |
| title: Phi-3 Mini SQL Generator |
| emoji: 🧠 |
| colorFrom: yellow |
| colorTo: purple |
| sdk: gradio |
| sdk_version: 6.5.1 |
| app_file: app.py |
| pinned: false |
| license: apache-2.0 |
| short_description: "SQL generator powered by Phi-3 Mini fine-tuning" |
| --- |
| |
| # Phi-3 Mini SQL Generator |
|
|
| A text-to-SQL app built around a QLoRA fine-tuned Phi-3 Mini model for converting natural-language questions to SQL queries. |
|
|
| ## What the App Does |
|
|
| The app accepts a table schema plus a natural-language SQL question. It uses deterministic code for schema creation/editing and high-confidence guardrail templates, then calls the fine-tuned model only for supported text-to-SQL generation. |
|
|
| The base model is not live-loaded in the Space. It is kept as offline evaluation evidence so the CPU demo does not pretend to run two 3.8B models interactively. |
|
|
| ## Model-Generated vs Deterministic |
|
|
| | Behavior | Source | |
| | --- | --- | |
| | Non-template `SQL_QUERY` after schema selection | Fine-tuned Phi-3 Mini | |
| | High-confidence ranking/count/aggregation queries | Deterministic SQL template | |
| | `CREATE_TABLE` from explicit user columns | Deterministic schema parser | |
| | `EDIT_TABLE` add/remove/rename columns | Deterministic schema parser | |
| | Greetings, off-topic requests, unknown intents | Static fallback | |
|
|
| Boundary rule: deterministic SQL, schema parsers, translations, and fallbacks are not model capability. They are product guardrails around the model. |
|
|
| ## Training and Evaluation Story |
|
|
| - **Base model:** [`microsoft/Phi-3-mini-4k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
| - **Dataset:** `b-mc2/sql-create-context`, sampled to 1,000 training examples. |
| - **Training method:** QLoRA adapter fine-tuning, then merged for simpler CPU inference. |
| - **Published merged model:** [`Shizu0n/phi3-mini-sql-generator-merged`](https://huggingface.co/Shizu0n/phi3-mini-sql-generator-merged) |
| - **Primary metric:** exact match on held-out text-to-SQL examples. |
|
|
| | Model | Exact match | |
| | --- | ---: | |
| | Base Phi-3 Mini | 2.0% | |
| | Fine-tuned QLoRA merged | 73.5% | |
|
|
| Reported gain: **+71.5 percentage points** over the base model. |
|
|
| Known limits: |
|
|
| - The training data is mostly `SELECT`; `INSERT`, `UPDATE`, and `DELETE` are not a reliable model capability. |
| - Unsupported DML requests are not generated by the model and must not be disguised as schema edits. For example, `delete species` can remove a schema column, but `delete all animals` returns a static unsupported-scope fallback. |
| - PT-BR input is not trained model capability. The app handles selected PT-BR cases through deterministic normalization/templates. |
| - Conversational chat and JSON schema proposal are out of scope and use fallback behavior instead of model prompts. |
| - Exact match does not prove semantic perfection. It is useful evidence, not a substitute for qualitative review. |
|
|
| ## AI Engineering Story |
|
|
| The product layer exists because a fine-tuned model is not the same thing as a reliable app. |
|
|
| - Intent routing limits the model path to `SQL_QUERY`. |
| - Deterministic schema tools handle explicit create/edit requests without loading the model. |
| - High-confidence SQL templates handle simple ranking, aggregation, count, and comparison queries before the CPU model path. |
| - SQL output is validated with `sqlparse`; model output is also checked against the active `CREATE TABLE` schema before it can be shown as accepted model SQL. |
| - Lazy loading keeps startup cheap; the model is downloaded and loaded only when needed. |
| - Load and generation timeouts protect the UI from indefinite waits. |
| - Static fallbacks make unsupported behavior visible instead of laundering it as AI. |
| - Every user-visible response/status/error with a known source must preserve provenance: fine-tuned model, deterministic SQL template, deterministic schema parser, or static fallback. |
|
|
| ## How to Use |
|
|
| 1. Click **Load fine-tuned model**. |
| - Loading is lazy: the model is only downloaded and loaded when you request it. |
| - On CPU, the first load can take a few minutes. |
| 2. Select, create, or edit the **SQL table schema**. |
| - You can use the presets: `employees`, `orders`, `students`, `products`, `sales`. |
| - You can also ask for explicit schema operations, such as `create table products with id name price` or `add stock`. |
| 3. Enter the question in the chat input. |
| 4. Click **Send**. |
| 5. Review the result in `gr.Code(language="sql")` and the source/status message. |
| - The app shows a validation badge powered by `sqlparse` plus active-schema checks for model output. |
| - Known-source errors should still identify their source path; unknown-source errors should not fake certainty. |
|
|
| ## Usage Examples |
|
|
| Use this path to show both sides of the project without muddying provenance. |
|
|
| 1. **Model path:** load the fine-tuned model, select `employees`, and ask `show employees in Engineering ordered by salary`. |
| 2. **Template path:** ask `qual o produto mais caro?` with the `products` schema — returns deterministic SQL. |
| 3. **Schema parser:** ask `create table animals with id name species weight`, then `add habitat` — deterministic schema manipulation. |
| 4. **Fallback:** ask `what can you do?` — static response since chat is outside model training. |
| 5. **Evaluation:** point to the base-vs-fine-tuned exact-match numbers. |
|
|
| ## Models |
|
|
| - Fine-tuned adapter: [Shizu0n/phi3-mini-sql-generator](https://huggingface.co/Shizu0n/phi3-mini-sql-generator) |
| - Fine-tuned merged model used in the app: [Shizu0n/phi3-mini-sql-generator-merged](https://huggingface.co/Shizu0n/phi3-mini-sql-generator-merged) |
| - Offline baseline model used for evaluation: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
|
|
| ## Current Features |
|
|
| - Gradio UI with a step-by-step flow: load the fine-tuned model, define schema context, and inspect SQL artifacts. |
| - Intent routing with 5 supported routes: `CREATE_TABLE`, `EDIT_TABLE`, `SQL_QUERY`, `SMALLTALK`, `UNKNOWN`. |
| - Model calls only for non-template `SQL_QUERY`; deterministic templates may answer supported SQL shapes before model load. |
| - Smalltalk and unknown messages use a static fallback. |
| - Lazy loading to reduce startup cost. |
| - Preserved Phi-3 patches for local/Spaces compatibility. |
| - Schema presets without blocking manual input. |
| - SQL output separated from errors/status so booleans, integers, and error messages do not appear inside the SQL block. |
| - Centered loading overlay to make the loading state obvious. |
|
|
| ## Model Probe |
|
|
| The normal pytest suite does not load the 3.8B model. To manually verify the real model behavior: |
|
|
| ```bash |
| python scripts/model_probe.py |
| ``` |
|
|
| The probe prints JSON with pass/fail checks for static fallback, deterministic CREATE TABLE, deterministic schema edit, SQL query generation, and smalltalk while a schema is active. |
|
|
| ## Tests |
|
|
| ```bash |
| set PYTHONPATH=. && pytest tests/test_chatbot_core.py tests/test_chatbot_behavior.py -q |
| ``` |
|
|
| Current unit suite: **141 tests**. These tests avoid loading the 3.8B model and focus on routing, deterministic tools, prompt construction, model-output rejection, active-schema validation, SQL validation, UI schema-context synchronization, and error handling. |
|
|
| ## Run Locally |
|
|
| ```bash |
| python -m pip install -r requirements.txt |
| python app.py |
| ``` |
|
|
| Then open the address shown by Gradio, usually `http://127.0.0.1:7860`. |
|
|