---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

## Overview

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** for a realistic e-commerce + subscriptions analytics schema.

Artifacts are organized under a single Hub repo using subfolders:

* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)

## Intended use

**Use cases**

* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

* Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
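
Before executing generated SQL, a lightweight guard can reject queries that reference tables outside a known set. The sketch below is illustrative, not code from this repo: the table list and the regex check are assumptions, and a production guard should parse the SQL properly instead of pattern-matching.

```python
import re

# Illustrative allow-list; replace with your own schema's tables.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def passes_allowlist(sql: str) -> bool:
    """Return True only if every table referenced after FROM/JOIN is allowed.

    Naive regex check for illustration; a real guard should use a SQL
    parser (e.g. sqlglot) rather than pattern matching.
    """
    referenced = {
        name.lower()
        for name in re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, flags=re.IGNORECASE)
    }
    return referenced <= ALLOWED_TABLES
```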

## Training summary

| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged\_adamw\_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of 3B total) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |

## Evaluation summary (100 test examples)

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).  
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen\_finetuned\_fp16\_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen\_finetuned\_int8\_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen\_baseline\_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen\_finetuned\_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen\_finetuned\_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |

**Key Findings:**

* **Strict prompting is critical**: Adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" improved parseable rate from 93% to 100%
* **Fine-tuning improves accuracy**: Exact match increased from 9% (baseline) to 15% (fine-tuned), a **67% improvement**
* **Quantization trade-offs**: INT8 maintains accuracy (20% exact match, best across all models) with 50% memory reduction but shows 5x latency increase
* **Competitive with APIs**: The fine-tuned model achieves nearly 4x higher exact match than GPT-4o-mini (15% vs. 4%) while running faster (0.43 s vs. 1.62 s mean latency)

## Results Visualization

![Model Comparison](https://github.com/aravula7/qwen-sql-finetuning/raw/main/images/results_comparison.png)

*Parseable SQL rate and exact match accuracy comparison across all 7 models.*

## How to load

### Load the merged FP16 model (recommended)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto",
)
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id, 
    subfolder="int8",
    device_map="auto"
)
```

### Load base model + LoRA adapter

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

## Example inference

Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id, 
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs, 
        max_new_tokens=256, 
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = out[0][inputs["input_ids"].shape[1]:]
sql = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(sql)
```

## License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling

**Links:**
* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

## Citation

```bibtex
@misc{qwen-sql-finetuning-2026,
  author = {Anirudh Reddy Ravula},
  title = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```