---
tags:
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for translating natural-language questions into PostgreSQL queries.

Artifacts are organized under a single Hub repo using subfolders:

* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)

## Intended use

**Use cases**

* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

* Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
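As a concrete illustration of the validation called for above, here is a minimal pre-execution guard. This is a sketch, not part of this repo; the statement pattern and the table allow-list (`customers`, `orders`, etc.) are assumptions you would replace with your own schema.

```python
import re

# Example allow-list; substitute the tables your application actually exposes.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def is_safe_select(sql: str, allowed_tables=ALLOWED_TABLES) -> bool:
    """Reject anything that is not a single read-only SELECT over allow-listed tables."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    # Crude check: every identifier after FROM/JOIN must be on the allow-list.
    for tbl in re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped):
        if tbl.lower() not in allowed_tables:
            return False
    return True
```

A guard like this belongs in the application layer, in front of (not instead of) database-side permissions.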

## Training summary

| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged_adamw_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of 3B total) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |

## Evaluation

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen_finetuned_fp16_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
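The latency columns can be computed from per-request timings; one plausible way (the exact method used for the table above is not documented, and the sample timings here are made up) is the standard-library `statistics` module:

```python
from statistics import mean, quantiles

def latency_summary(latencies_s):
    """Mean / P50 / P95 over per-request latencies, matching the table columns."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    pct = quantiles(sorted(latencies_s), n=100)
    return {
        "mean": round(mean(latencies_s), 3),
        "p50": round(pct[49], 3),  # 50th percentile
        "p95": round(pct[94], 3),  # 95th percentile
    }
```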

**Key Findings:**

* **Strict prompting is critical**: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" improved the parseable rate from 93% to 100%.
* **Fine-tuning improves accuracy**: exact match increased from 9% (baseline) to 15% (fine-tuned), a **67% relative improvement**.
* **Quantization trade-offs**: INT8 maintains accuracy (20% exact match, the best across all models) with a ~50% memory reduction, but at roughly a 5x latency increase.
* **Competitive with APIs**: the fine-tuned model achieves roughly **4x better exact match** than GPT-4o-mini at comparable speed.
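Without the strict prompt, outputs often arrive wrapped in markdown fences, which is exactly what drags the parseable rate down to 93%. A small normalizer (a sketch, not part of this repo) can recover the bare SQL before parsing:

```python
import re

def strip_sql_fences(text: str) -> str:
    """Extract SQL from a surrounding ```sql ... ``` markdown fence, if present."""
    m = re.search(r"```(?:sql)?\s*(.*?)```", text, flags=re.S | re.I)
    if m:
        return m.group(1).strip()
    return text.strip()
```

Strict prompting makes this post-processing largely unnecessary, but it is a cheap safety net.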

## Results Visualization

![Model Comparison](model_comparison.png)

*Parseable SQL rate and exact match accuracy comparison across all 7 models.*

## How to load

### Load the merged FP16 model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="int8",
    device_map="auto"
)
```

### Load base model + LoRA adapter

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

## Example inference

Below is a minimal example that encourages **SQL-only** output (critical for the 100% parseable rate).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "List the emails of customers in California who placed an order in 2024."  # example question; replace with your own

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (greedy decoding echoes the prompt otherwise)
sql = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
print(sql)
```

## License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling

**Links:**

* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

## Citation

```bibtex
@misc{qwen-sql-finetuning-2025,
  author       = {Anirudh Reddy Ravula},
  title        = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note         = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```