Spaces:

Shizu0n
/

phi3-mini-sql-generator-demo

Sleeping

App Files Files Community

phi3-mini-sql-generator-demo / README.md

Shizu0n

feat: double-quoted comparison literals for schema validation

35d7c39 18 days ago

preview code

raw

history blame contribute delete

7.28 kB

A newer version of the Gradio SDK is available: 6.15.2

Upgrade

metadata

title: Phi-3 Mini SQL Generator
emoji: 🧠
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: SQL generator powered by Phi-3 Mini fine-tuning

Phi-3 Mini SQL Generator

A text-to-SQL app built around a QLoRA fine-tuned Phi-3 Mini model for converting natural-language questions to SQL queries.

What the App Does

The app accepts a table schema plus a natural-language SQL question. It uses deterministic code for schema creation/editing and high-confidence guardrail templates, then calls the fine-tuned model only for supported text-to-SQL generation.

The base model is not live-loaded in the Space. It is kept as offline evaluation evidence so the CPU demo does not pretend to run two 3.8B models interactively.

Model-Generated vs Deterministic

Behavior	Source
Non-template `SQL_QUERY` after schema selection	Fine-tuned Phi-3 Mini
High-confidence ranking/count/aggregation queries	Deterministic SQL template
`CREATE_TABLE` from explicit user columns	Deterministic schema parser
`EDIT_TABLE` add/remove/rename columns	Deterministic schema parser
Greetings, off-topic requests, unknown intents	Static fallback

Boundary rule: deterministic SQL, schema parsers, translations, and fallbacks are not model capability. They are product guardrails around the model.

Training and Evaluation Story

Base model: microsoft/Phi-3-mini-4k-instruct
Dataset: b-mc2/sql-create-context, sampled to 1,000 training examples.
Training method: QLoRA adapter fine-tuning, then merged for simpler CPU inference.
Published merged model: Shizu0n/phi3-mini-sql-generator-merged
Primary metric: exact match on held-out text-to-SQL examples.

Model	Exact match
Base Phi-3 Mini	2.0%
Fine-tuned QLoRA merged	73.5%

Reported gain: +71.5 percentage points over the base model.

Known limits:

The training data is mostly SELECT; INSERT, UPDATE, and DELETE are not a reliable model capability.
Unsupported DML requests are not generated by the model and must not be disguised as schema edits. For example, delete species can remove a schema column, but delete all animals returns a static unsupported-scope fallback.
PT-BR input is not trained model capability. The app handles selected PT-BR cases through deterministic normalization/templates.
Conversational chat and JSON schema proposal are out of scope and use fallback behavior instead of model prompts.
Exact match does not prove semantic perfection. It is useful evidence, not a substitute for qualitative review.

AI Engineering Story

The product layer exists because a fine-tuned model is not the same thing as a reliable app.

Intent routing limits the model path to SQL_QUERY.
Deterministic schema tools handle explicit create/edit requests without loading the model.
High-confidence SQL templates handle simple ranking, aggregation, count, and comparison queries before the CPU model path.
SQL output is validated with sqlparse; model output is also checked against the active CREATE TABLE schema before it can be shown as accepted model SQL.
Lazy loading keeps startup cheap; the model is downloaded and loaded only when needed.
Load and generation timeouts protect the UI from indefinite waits.
Static fallbacks make unsupported behavior visible instead of laundering it as AI.
Every user-visible response/status/error with a known source must preserve provenance: fine-tuned model, deterministic SQL template, deterministic schema parser, or static fallback.

How to Use

Click Load fine-tuned model.
- Loading is lazy: the model is only downloaded and loaded when you request it.
- On CPU, the first load can take a few minutes.
Select, create, or edit the SQL table schema.
- You can use the presets: employees, orders, students, products, sales.
- You can also ask for explicit schema operations, such as create table products with id name price or add stock.
Enter the question in the chat input.
Click Send.
Review the result in gr.Code(language="sql") and the source/status message.
- The app shows a validation badge powered by sqlparse plus active-schema checks for model output.
- Known-source errors should still identify their source path; unknown-source errors should not fake certainty.

Usage Examples

Use this path to show both sides of the project without muddying provenance.

Model path: load the fine-tuned model, select employees, and ask show employees in Engineering ordered by salary.
Template path: ask qual o produto mais caro? with the products schema — returns deterministic SQL.
Schema parser: ask create table animals with id name species weight, then add habitat — deterministic schema manipulation.
Fallback: ask what can you do? — static response since chat is outside model training.
Evaluation: point to the base-vs-fine-tuned exact-match numbers.

Models

Fine-tuned adapter: Shizu0n/phi3-mini-sql-generator
Fine-tuned merged model used in the app: Shizu0n/phi3-mini-sql-generator-merged
Offline baseline model used for evaluation: microsoft/Phi-3-mini-4k-instruct

Current Features

Gradio UI with a step-by-step flow: load the fine-tuned model, define schema context, and inspect SQL artifacts.
Intent routing with 5 supported routes: CREATE_TABLE, EDIT_TABLE, SQL_QUERY, SMALLTALK, UNKNOWN.
Model calls only for non-template SQL_QUERY; deterministic templates may answer supported SQL shapes before model load.
Smalltalk and unknown messages use a static fallback.
Lazy loading to reduce startup cost.
Preserved Phi-3 patches for local/Spaces compatibility.
Schema presets without blocking manual input.
SQL output separated from errors/status so booleans, integers, and error messages do not appear inside the SQL block.
Centered loading overlay to make the loading state obvious.

Model Probe

The normal pytest suite does not load the 3.8B model. To manually verify the real model behavior:

python scripts/model_probe.py

The probe prints JSON with pass/fail checks for static fallback, deterministic CREATE TABLE, deterministic schema edit, SQL query generation, and smalltalk while a schema is active.

Tests

set PYTHONPATH=. && pytest tests/test_chatbot_core.py tests/test_chatbot_behavior.py -q

Current unit suite: 141 tests. These tests avoid loading the 3.8B model and focus on routing, deterministic tools, prompt construction, model-output rejection, active-schema validation, SQL validation, UI schema-context synchronization, and error handling.

Run Locally

python -m pip install -r requirements.txt
python app.py

Then open the address shown by Gradio, usually http://127.0.0.1:7860.