aravula7 committed (verified) · Commit 042ca6f · Parent(s): 04995b2

Update README.md

Files changed (1): README.md (+94, −30)
README.md CHANGED
@@ -12,6 +12,8 @@ tags:
 - quantization
 base_model: Qwen/Qwen2.5-3B-Instruct
 license: mit
 ---
 
 # Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned
@@ -22,30 +24,32 @@ This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model special
 
 Artifacts are organized under a single Hub repo using subfolders:
 
- - `fp16/` — merged FP16 model (recommended)
- - `int8/` — quantized INT8 checkpoint (smaller footprint)
- - `lora_adapter/` — LoRA adapter only (for further tuning / research)
 
 ## Intended use
 
 **Use cases**
 
- - Convert natural language questions into PostgreSQL queries.
- - Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).
 
 **Not for**
 
- - Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
- - Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
 
 ## Training summary
 
 | Item | Value |
- |---|---|
 | Base model | Qwen/Qwen2.5-3B-Instruct |
 | Fine-tuning method | QLoRA (4-bit) |
- | Optimizer | paged_adamw_8bit |
 | Epochs | 4 |
 | Decoding | Greedy |
 | Tracking | MLflow (DagsHub) |
 
@@ -55,18 +59,27 @@ Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
 Secondary metric: **exact match** (strict string match vs. reference SQL).
 
 | Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
- |---|---:|---:|---:|---:|---:|
- | qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
- | qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
- | qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
- | qwen_finetuned_fp16_strict | 1.00 | 0.15 | 0.433 | 0.427 | 0.736 |
- | qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
 | gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
 | claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
 
- Notes:
- - The “strict” variants used a stricter system instruction to return **SQL only** (no prose, no markdown), which improved reliability.
- - INT8 reduced memory usage but was slower in this specific GPU evaluation setup.
 
 ## How to load
 
@@ -78,7 +91,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
- model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="fp16")
 ```
 
 ### Load the INT8 model
@@ -89,7 +107,11 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
- model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="int8")
 ```
 
 ### Load base model + LoRA adapter
@@ -97,19 +119,24 @@ model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="int8")
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
 
 base_id = "Qwen/Qwen2.5-3B-Instruct"
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(base_id)
- base = AutoModelForCausalLM.from_pretrained(base_id)
 
 model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
 ```
 
 ## Example inference
 
- Below is a minimal example that encourages **SQL-only** output.
 
 ```python
 import torch
@@ -117,7 +144,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 
 repo_id = "aravula7/qwen-sql-finetuning"
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
- model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="fp16")
 
 system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
 schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
@@ -132,18 +164,50 @@ Request:
 {request}
 """
 
- inputs = tokenizer(prompt, return_tensors="pt")
 with torch.no_grad():
-     out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
- 
- print(tokenizer.decode(out[0], skip_special_tokens=True))
 ```
 
 ## License
 
- This repository is a fine-tuned derivative of the base model listed in the metadata. Please follow the licensing terms of the base model and any dataset constraints used for training. Available at Github.
- GitHub: https://www.github.com/aravula7/qwen-sql-finetuning/
 
 ## Reproducibility
 
- Training and evaluation were tracked with MLflow on DagsHub. The associated GitHub/DagsHub repository contains the notebook, data splits, and logged runs.
 - quantization
 base_model: Qwen/Qwen2.5-3B-Instruct
 license: mit
+ metrics:
+ - accuracy
 ---
 
 # Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned
 
 
 Artifacts are organized under a single Hub repo using subfolders:
 
+ * `fp16/` — merged FP16 model (recommended)
+ * `int8/` — quantized INT8 checkpoint (smaller footprint)
+ * `lora_adapter/` — LoRA adapter only (for further tuning / research)
 
 ## Intended use
 
 **Use cases**
 
+ * Convert natural language questions into PostgreSQL queries.
+ * Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).
 
 **Not for**
 
+ * Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
+ * Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
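The validation caveat above can be made concrete with a small pre-execution guardrail. The sketch below is illustrative and not part of this repo: it rejects anything that is not a single read-only SELECT over an allow-listed set of tables. The allow-list and the regex-based table extraction are assumptions; a production gate should parse the statement with a real parser such as `sqlglot` rather than regexes.

```python
import re

# Illustrative allow-list matching the e-commerce schema described above
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def is_safe_select(sql: str) -> bool:
    """Reject anything that is not a single SELECT over allow-listed tables."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?is)^\s*select\b", stripped):
        return False
    # Crude table extraction: bare identifiers following FROM/JOIN
    tables = re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped)
    return bool(tables) and all(t.lower() in ALLOWED_TABLES for t in tables)
```

Run the check before any generated query touches a database, ideally in a sandboxed, read-only role.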
 
 ## Training summary
 
 | Item | Value |
+ | --- | --- |
 | Base model | Qwen/Qwen2.5-3B-Instruct |
 | Fine-tuning method | QLoRA (4-bit) |
+ | Optimizer | paged_adamw_8bit |
 | Epochs | 4 |
+ | Training time | ~4 minutes (A100) |
+ | Trainable params | 29.9M (1.73% of 3B total) |
 | Decoding | Greedy |
 | Tracking | MLflow (DagsHub) |
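The table's QLoRA setup can be expressed as the usual transformers/peft configuration. A sketch under stated assumptions: only 4-bit QLoRA, `paged_adamw_8bit`, and 4 epochs are documented above; the rank, alpha, target modules, quantization type, and output directory below are typical values, not ones reported here.

```python
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(          # 4-bit base weights — the "Q" in QLoRA
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed; not stated in the card
    bnb_4bit_compute_dtype="bfloat16",    # assumed compute dtype
)

lora_config = LoraConfig(                 # illustrative adapter hyperparameters
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen-sql-qlora",          # hypothetical path
    num_train_epochs=4,                   # documented above
    optim="paged_adamw_8bit",             # documented above
)
```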
 
 
 Secondary metric: **exact match** (strict string match vs. reference SQL).
 
 | Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
+ |---|---:|---:|---:|---:|---:|
+ | **qwen_finetuned_fp16_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
+ | qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
+ | qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
+ | qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
+ | qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
 | gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
 | claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
 
+ **Key Findings:**
+ 
+ * **Strict prompting is critical**: adding “Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary” raised the parseable-SQL rate from 93% to 100%.
+ * **Fine-tuning improves accuracy**: exact match rose from 9% (baseline) to 15% (fine-tuned, strict prompting), a ~67% relative improvement.
+ * **Quantization trade-offs**: INT8 preserves accuracy (20% exact match, the best of all models) with roughly 50% memory reduction, but showed a ~5x latency increase in this evaluation setup.
+ * **Competitive with APIs**: the fine-tuned model achieves roughly 4x the exact-match rate of GPT-4o-mini (0.15 vs. 0.04) at comparable or better speed.
 
+ ## Results Visualization
+ 
+ ![Model Comparison](https://github.com/aravula7/qwen-sql-finetuning/raw/main/images/results_comparison.png)
+ 
+ *Parseable SQL rate and exact-match accuracy comparison across all 7 models.*
 
 ## How to load
 
+ import torch
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
+ model = AutoModelForCausalLM.from_pretrained(
+     repo_id,
+     subfolder="fp16",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
 ```
 
 ### Load the INT8 model
 
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
+ model = AutoModelForCausalLM.from_pretrained(
+     repo_id,
+     subfolder="int8",
+     device_map="auto",
+ )
 ```
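A quick way to see where the ~50% memory saving claimed above comes from: INT8 stores one byte per weight instead of FP16's two. Back-of-envelope only — the ~3.1B parameter count is approximate, and activations plus KV cache add overhead on top of the weights:

```python
params = 3.1e9        # approximate parameter count of Qwen2.5-3B (assumption)
bytes_fp16 = 2        # 16-bit weights
bytes_int8 = 1        # 8-bit weights

fp16_gb = params * bytes_fp16 / 1024**3
int8_gb = params * bytes_int8 / 1024**3
print(f"FP16 weights ≈ {fp16_gb:.1f} GiB, INT8 weights ≈ {int8_gb:.1f} GiB")
```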
 
 ### Load base model + LoRA adapter
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
+ import torch
 
 base_id = "Qwen/Qwen2.5-3B-Instruct"
 repo_id = "aravula7/qwen-sql-finetuning"
 
 tokenizer = AutoTokenizer.from_pretrained(base_id)
+ base = AutoModelForCausalLM.from_pretrained(
+     base_id,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
 
 model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
 ```
 
 ## Example inference
 
+ Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).
 
 ```python
 import torch
 
 repo_id = "aravula7/qwen-sql-finetuning"
 tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
+ model = AutoModelForCausalLM.from_pretrained(
+     repo_id,
+     subfolder="fp16",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
 
 system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
 schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
 
 {request}
 """
 
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 with torch.no_grad():
+     out = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=False,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+ 
+ # Decode only the newly generated tokens so the prompt is not echoed back
+ sql = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
+ print(sql)
 ```
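Even with the strict system prompt, a defensive post-processing step costs little before the `sqlglot` validity check. A small helper — hypothetical, not part of the repo — that strips markdown fences and trailing prose if the model emits them anyway:

```python
import re

def clean_sql(text: str) -> str:
    """Strip code fences and keep only the first statement of a model response."""
    text = text.strip()
    # Remove ```sql ... ``` fences if present despite the SQL-only instruction
    fenced = re.search(r"```(?:sql)?\s*(.*?)```", text, flags=re.S | re.I)
    if fenced:
        text = fenced.group(1).strip()
    # Drop any commentary that follows the first complete statement
    if ";" in text:
        text = text.split(";", 1)[0] + ";"
    return text
```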
 
 ## License
 
+ This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.
+ 
+ **Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
 
 ## Reproducibility
 
+ Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:
+ 
+ * Complete Colab notebook with training and evaluation code
+ * Dataset (500 examples: 350 train, 50 val, 100 test)
+ * Visualization scripts for 3D performance analysis
+ * Production-ready inference code with error handling
+ 
+ **Links:**
+ 
+ * [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
+ * [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
+ * [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+ 
+ ## Citation
+ 
+ ```bibtex
+ @misc{qwen-sql-finetuning-2025,
+   author       = {Anirudh Reddy Ravula},
+   title        = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
+   year         = {2025},
+   publisher    = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
+   note         = {Fine-tuned with QLoRA for e-commerce SQL generation}
+ }
+ ```