---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---
# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned
## Overview
This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** for a realistic e-commerce + subscriptions analytics schema.
Artifacts are organized under a single Hub repo using subfolders:
* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)
## Intended use
**Use cases**
* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).
**Not for**
* Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
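As a minimal illustration of the pre-execution validation this model expects downstream (an illustrative sketch of one possible guard, not part of this repo and not a substitute for real access control or sandboxing), a naive read-only allow-list might look like:

```python
# Naive pre-execution guard: allow only read-only statements.
# Illustrative only; real deployments should use parameterization,
# database-level permissions, and sandboxed execution.
FORBIDDEN_KEYWORDS = {"insert", "update", "delete", "drop", "alter",
                      "truncate", "grant", "revoke", "create"}

def is_read_only(sql: str) -> bool:
    """Reject any statement that is not a plain SELECT/WITH query."""
    tokens = sql.lower().split()
    if not tokens or tokens[0] not in ("select", "with"):
        return False
    # Keyword scan catches multi-statement payloads like "SELECT 1; DROP ..."
    return not any(t.strip("();,") in FORBIDDEN_KEYWORDS for t in tokens)

print(is_read_only("SELECT * FROM orders"))   # True
print(is_read_only("DROP TABLE customers"))   # False
```

A keyword scan like this is easy to bypass; it is shown only to make the point that generated SQL should never reach a database unchecked.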
## Training summary
| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged\_adamw\_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of 3B total) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |
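The setup in the table can be approximated with a QLoRA configuration along these lines. This is a sketch: the 4-bit quantization matches the table, but the LoRA rank, alpha, and target modules are illustrative assumptions sized to land near the ~29.9M trainable parameters reported above; the exact training script is in the GitHub repo.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA (as in the table above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter config; r, alpha, dropout, and target modules are
# illustrative guesses, not the values from the actual run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```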
## Evaluation summary (100 test examples)
Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).
| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen\_finetuned\_fp16\_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen\_finetuned\_int8\_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen\_baseline\_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen\_finetuned\_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen\_finetuned\_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
**Key Findings:**
* **Strict prompting is critical**: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" raised the parseable rate from 93% to 100%
* **Fine-tuning improves accuracy**: exact match rose from 9% (baseline) to 15% (fine-tuned), a 67% relative improvement
* **Quantization trade-offs**: the INT8 checkpoint achieves the best exact match of any model (20%) with roughly half the memory footprint, but at about 5x higher latency
* **Competitive with APIs**: the fine-tuned model reaches nearly 4x the exact match of GPT-4o-mini (15% vs. 4%) at a fraction of its latency
## Results Visualization
![Model Comparison](https://github.com/aravula7/qwen-sql-finetuning/raw/main/images/results_comparison.png)
*Parseable SQL rate and exact match accuracy comparison across all 7 models.*
## How to load
### Load the merged FP16 model (recommended)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
repo_id,
subfolder="fp16",
torch_dtype=torch.float16,
device_map="auto"
)
```
### Load the INT8 model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
# Note: `bitsandbytes` may be required if the checkpoint was saved
# with an 8-bit quantization config.
model = AutoModelForCausalLM.from_pretrained(
repo_id,
subfolder="int8",
device_map="auto"
)
```
### Load base model + LoRA adapter
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
base_id,
torch_dtype=torch.float16,
device_map="auto"
)
model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```
## Example inference
Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
repo_id,
subfolder="fp16",
torch_dtype=torch.float16,
device_map="auto"
)
system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."
prompt = f"""{system}
Schema:
{schema}
Request:
{request}
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(
**inputs,
max_new_tokens=256,
do_sample=False,
pad_token_id=tokenizer.eos_token_id
)
# Decode only the newly generated tokens so the prompt is not included
gen_tokens = out[0][inputs["input_ids"].shape[1]:]
sql = tokenizer.decode(gen_tokens, skip_special_tokens=True).strip()
print(sql)
```
## License
This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.
**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
## Reproducibility
Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:
* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling
**Links:**
* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
## Citation
```bibtex
@misc{qwen-sql-finetuning-2026,
author = {Anirudh Reddy Ravula},
title = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
note = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```