Spaces:

Shizu0n
/

phi3-mini-sql-generator-demo

Sleeping

App Files Files Community

phi3-mini-sql-generator-demo / README.md

Shizu0n

feat: double-quoted comparison literals for schema validation

35d7c39 18 days ago

preview code

raw

history blame contribute delete

7.28 kB

	---
	title: Phi-3 Mini SQL Generator
	emoji: 🧠
	colorFrom: yellow
	colorTo: purple
	sdk: gradio
	sdk_version: 6.5.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: "SQL generator powered by Phi-3 Mini fine-tuning"
	---

	# Phi-3 Mini SQL Generator

	A text-to-SQL app built around a QLoRA fine-tuned Phi-3 Mini model for converting natural-language questions to SQL queries.

	## What the App Does

	The app accepts a table schema plus a natural-language SQL question. It uses deterministic code for schema creation/editing and high-confidence guardrail templates, then calls the fine-tuned model only for supported text-to-SQL generation.

	The base model is not live-loaded in the Space. It is kept as offline evaluation evidence so the CPU demo does not pretend to run two 3.8B models interactively.

	## Model-Generated vs Deterministic

	\| Behavior \| Source \|
	\| --- \| --- \|
	\| Non-template `SQL_QUERY` after schema selection \| Fine-tuned Phi-3 Mini \|
	\| High-confidence ranking/count/aggregation queries \| Deterministic SQL template \|
	\| `CREATE_TABLE` from explicit user columns \| Deterministic schema parser \|
	\| `EDIT_TABLE` add/remove/rename columns \| Deterministic schema parser \|
	\| Greetings, off-topic requests, unknown intents \| Static fallback \|

	Boundary rule: deterministic SQL, schema parsers, translations, and fallbacks are not model capability. They are product guardrails around the model.

	## Training and Evaluation Story

	- Base model: [`microsoft/Phi-3-mini-4k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
	- Dataset: `b-mc2/sql-create-context`, sampled to 1,000 training examples.
	- Training method: QLoRA adapter fine-tuning, then merged for simpler CPU inference.
	- Published merged model: [`Shizu0n/phi3-mini-sql-generator-merged`](https://huggingface.co/Shizu0n/phi3-mini-sql-generator-merged)
	- Primary metric: exact match on held-out text-to-SQL examples.

	\| Model \| Exact match \|
	\| --- \| ---: \|
	\| Base Phi-3 Mini \| 2.0% \|
	\| Fine-tuned QLoRA merged \| 73.5% \|

	Reported gain: +71.5 percentage points over the base model.

	Known limits:

	- The training data is mostly `SELECT`; `INSERT`, `UPDATE`, and `DELETE` are not a reliable model capability.
	- Unsupported DML requests are not generated by the model and must not be disguised as schema edits. For example, `delete species` can remove a schema column, but `delete all animals` returns a static unsupported-scope fallback.
	- PT-BR input is not trained model capability. The app handles selected PT-BR cases through deterministic normalization/templates.
	- Conversational chat and JSON schema proposal are out of scope and use fallback behavior instead of model prompts.
	- Exact match does not prove semantic perfection. It is useful evidence, not a substitute for qualitative review.

	## AI Engineering Story

	The product layer exists because a fine-tuned model is not the same thing as a reliable app.

	- Intent routing limits the model path to `SQL_QUERY`.
	- Deterministic schema tools handle explicit create/edit requests without loading the model.
	- High-confidence SQL templates handle simple ranking, aggregation, count, and comparison queries before the CPU model path.
	- SQL output is validated with `sqlparse`; model output is also checked against the active `CREATE TABLE` schema before it can be shown as accepted model SQL.
	- Lazy loading keeps startup cheap; the model is downloaded and loaded only when needed.
	- Load and generation timeouts protect the UI from indefinite waits.
	- Static fallbacks make unsupported behavior visible instead of laundering it as AI.
	- Every user-visible response/status/error with a known source must preserve provenance: fine-tuned model, deterministic SQL template, deterministic schema parser, or static fallback.

	## How to Use

	1. Click Load fine-tuned model.
	- Loading is lazy: the model is only downloaded and loaded when you request it.
	- On CPU, the first load can take a few minutes.
	2. Select, create, or edit the SQL table schema.
	- You can use the presets: `employees`, `orders`, `students`, `products`, `sales`.
	- You can also ask for explicit schema operations, such as `create table products with id name price` or `add stock`.
	3. Enter the question in the chat input.
	4. Click Send.
	5. Review the result in `gr.Code(language="sql")` and the source/status message.
	- The app shows a validation badge powered by `sqlparse` plus active-schema checks for model output.
	- Known-source errors should still identify their source path; unknown-source errors should not fake certainty.

	## Usage Examples

	Use this path to show both sides of the project without muddying provenance.

	1. Model path: load the fine-tuned model, select `employees`, and ask `show employees in Engineering ordered by salary`.
	2. Template path: ask `qual o produto mais caro?` with the `products` schema — returns deterministic SQL.
	3. Schema parser: ask `create table animals with id name species weight`, then `add habitat` — deterministic schema manipulation.
	4. Fallback: ask `what can you do?` — static response since chat is outside model training.
	5. Evaluation: point to the base-vs-fine-tuned exact-match numbers.

	## Models

	- Fine-tuned adapter: [Shizu0n/phi3-mini-sql-generator](https://huggingface.co/Shizu0n/phi3-mini-sql-generator)
	- Fine-tuned merged model used in the app: [Shizu0n/phi3-mini-sql-generator-merged](https://huggingface.co/Shizu0n/phi3-mini-sql-generator-merged)
	- Offline baseline model used for evaluation: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

	## Current Features

	- Gradio UI with a step-by-step flow: load the fine-tuned model, define schema context, and inspect SQL artifacts.
	- Intent routing with 5 supported routes: `CREATE_TABLE`, `EDIT_TABLE`, `SQL_QUERY`, `SMALLTALK`, `UNKNOWN`.
	- Model calls only for non-template `SQL_QUERY`; deterministic templates may answer supported SQL shapes before model load.
	- Smalltalk and unknown messages use a static fallback.
	- Lazy loading to reduce startup cost.
	- Preserved Phi-3 patches for local/Spaces compatibility.
	- Schema presets without blocking manual input.
	- SQL output separated from errors/status so booleans, integers, and error messages do not appear inside the SQL block.
	- Centered loading overlay to make the loading state obvious.

	## Model Probe

	The normal pytest suite does not load the 3.8B model. To manually verify the real model behavior:

	```bash
	python scripts/model_probe.py
	```

	The probe prints JSON with pass/fail checks for static fallback, deterministic CREATE TABLE, deterministic schema edit, SQL query generation, and smalltalk while a schema is active.

	## Tests

	```bash
	set PYTHONPATH=. && pytest tests/test_chatbot_core.py tests/test_chatbot_behavior.py -q
	```

	Current unit suite: 141 tests. These tests avoid loading the 3.8B model and focus on routing, deterministic tools, prompt construction, model-output rejection, active-schema validation, SQL validation, UI schema-context synchronization, and error handling.

	## Run Locally

	```bash
	python -m pip install -r requirements.txt
	python app.py
	```

	Then open the address shown by Gradio, usually `http://127.0.0.1:7860`.