---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---
# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned
## Overview
This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** for a realistic e-commerce + subscriptions analytics schema.
Artifacts are organized under a single Hub repo using subfolders:
* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)
## Intended use
**Use cases**
* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).
**Not for**
* Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
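As a minimal illustration of the kind of validation gate mentioned above (an illustrative sketch, not part of this repository), a pure-Python allow-list check on generated SQL might look like the following. Production systems should additionally parse the SQL (e.g. with `sqlglot`) and verify table/column names against the schema.

```python
import re

# Hypothetical guard: accept only a single read-only SELECT statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|revoke|create)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT with no write/DDL keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT * FROM customers"))       # True
print(is_safe_select("DROP TABLE customers"))          # False
print(is_safe_select("SELECT 1; DELETE FROM orders"))  # False
```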
## Training summary
| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged\_adamw\_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of 3B total) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |
## Evaluation summary (100 test examples)
Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).
| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen\_finetuned\_fp16\_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen\_finetuned\_int8\_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen\_baseline\_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen\_finetuned\_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen\_finetuned\_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
**Key findings:**
* **Strict prompting is critical.** Adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" to the prompt raised the parseable rate from 93% to 100%.
* **Fine-tuning improves accuracy.** Exact match rose from 9% (baseline) to 15% (fine-tuned), a 67% relative improvement.
* **Quantization trade-offs.** INT8 preserves accuracy (20% exact match, the best of any model tested) with roughly 50% lower memory use, but incurs about a 5x latency increase.
* **Competitive with APIs.** The fine-tuned model achieves nearly 4x the exact-match rate of GPT-4o-mini (15% vs. 4%) at lower latency.
## Results Visualization

*Parseable SQL rate and exact match accuracy comparison across all 7 models.*
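For reference, the exact-match metric and the latency percentiles reported in the table above can be computed with the standard library alone (a sketch; the parseability metric additionally requires `sqlglot` and is not shown here):

```python
import statistics

def exact_match_rate(predictions, references):
    """Strict string match after trimming surrounding whitespace."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

def latency_percentiles(latencies):
    """Return (P50, P95) in seconds using inclusive quantiles."""
    qs = statistics.quantiles(latencies, n=100, method="inclusive")
    return qs[49], qs[94]  # 50th and 95th percentile cut points

preds = ["SELECT 1", "SELECT 2", "select 3"]
refs = ["SELECT 1", "SELECT 2", "SELECT 3"]
print(exact_match_rate(preds, refs))  # 2 of 3 match
```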
## How to load
### Load the merged FP16 model (recommended)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto",
)
```
### Load the INT8 model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="int8",
    device_map="auto",
)
```
### Load base model + LoRA adapter
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```
## Example inference
Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto",
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens so the prompt is not echoed back.
sql = tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
print(sql)
```
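Even with strict prompting, models occasionally wrap output in markdown fences. A small normalizer (an illustrative helper, not part of this repository) can recover the bare SQL before validation:

```python
import re

def extract_sql(text: str) -> str:
    """Strip markdown code fences and keep only the first SQL statement."""
    # Unwrap ```sql ... ``` or plain ``` ... ``` fences if present.
    fence = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if fence:
        text = fence.group(1)
    # Keep only the first statement and normalize the trailing semicolon.
    return text.strip().split(";")[0].strip() + ";"

raw = "```sql\nSELECT count(*) FROM orders;\n```"
print(extract_sql(raw))  # SELECT count(*) FROM orders;
```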
## License
This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.
**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
## Reproducibility
Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:
* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling
**Links:**
* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
## Citation
```bibtex
@misc{qwen-sql-finetuning-2026,
  author = {Anirudh Reddy Ravula},
  title = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```