---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: other
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

## Overview

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** over a realistic e-commerce + subscriptions analytics schema.

Artifacts are organized under a single Hub repo using subfolders:

- `fp16/` — merged FP16 model (recommended)
- `int8/` — quantized INT8 checkpoint (smaller footprint)
- `lora_adapter/` — LoRA adapter only (for further tuning / research)

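If you only need one variant, you can pull just that subfolder instead of the whole repo. The snippet below is a minimal sketch using `huggingface_hub`; the `allow_patterns` filter simply matches the folder names listed above.

```python
from huggingface_hub import snapshot_download

# Download only the merged FP16 variant; swap the pattern for "int8/*" or "lora_adapter/*".
local_dir = snapshot_download(
    repo_id="aravula7/qwen-sql-finetuning",
    allow_patterns=["fp16/*"],
)
print(local_dir)  # local cache path containing the fp16/ subfolder
```
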
## Intended use

**Use cases**

- Converting natural language questions into PostgreSQL queries.
- Generating analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

- Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution); a minimal parse-check sketch follows this list.
- Security-critical contexts (SQL injection prevention and access control must be handled outside the model).

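One lightweight guard, sketched here rather than prescribed, is to parse the model's output with `sqlglot` (the same library used for the parseability metric below) and reject anything that is not a single `SELECT` statement before it reaches a database. Access control, allow-lists, and sandboxing still belong outside the model.

```python
import sqlglot
from sqlglot import exp

def is_single_select(sql: str) -> bool:
    """Return True only if `sql` parses as exactly one PostgreSQL SELECT statement."""
    try:
        statements = sqlglot.parse(sql, read="postgres")
    except sqlglot.errors.ParseError:
        return False
    return len(statements) == 1 and isinstance(statements[0], exp.Select)

print(is_single_select("SELECT count(*) FROM orders"))  # True
print(is_single_select("DROP TABLE customers"))         # False
```
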
+
## Training summary
|
| 42 |
+
|
| 43 |
+
| Item | Value |
|
| 44 |
+
|---|---|
|
| 45 |
+
| Base model | Qwen/Qwen2.5-3B-Instruct |
|
| 46 |
+
| Fine-tuning method | QLoRA (4-bit) |
|
| 47 |
+
| Optimizer | paged_adamw_8bit |
|
| 48 |
+
| Epochs | 4 |
|
| 49 |
+
| Decoding | Greedy |
|
| 50 |
+
| Tracking | MLflow (DagsHub) |
|
| 51 |
+
|
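For reference, a configuration along the lines below matches the table; the exact LoRA rank, alpha, target modules, and output path are illustrative assumptions (the logged MLflow runs hold the authoritative values).

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit (QLoRA) quantization of the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings; r, lora_alpha, and target_modules are assumed, not taken from the table.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Values from the table: 4 epochs, paged 8-bit AdamW.
training_args = TrainingArguments(
    output_dir="qwen-sql-qlora",
    num_train_epochs=4,
    optim="paged_adamw_8bit",
)
```
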
## Evaluation summary (100 test examples)

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
|---|---:|---:|---:|---:|---:|
| qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| qwen_finetuned_fp16_strict | 1.00 | 0.15 | 0.433 | 0.427 | 0.736 |
| qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |

Notes:
- The “strict” variants used a stricter system instruction to return **SQL only** (no prose, no markdown), which improved reliability.
- INT8 reduced memory usage but was slower in this specific GPU evaluation setup.

## How to load

### Load the merged FP16 model (recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="fp16")
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="int8")
```

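Depending on how the INT8 checkpoint was exported (an assumption, not something stated above), loading it may additionally require `bitsandbytes` and `accelerate` plus a CUDA device; passing `device_map="auto"` is a common way to place it:

```python
from transformers import AutoModelForCausalLM

# Variant of the INT8 load above; assumes a bitsandbytes-quantized checkpoint and a CUDA GPU.
repo_id = "aravula7/qwen-sql-finetuning"
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="int8", device_map="auto")
```
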
### Load base model + LoRA adapter

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

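Continuing from the snippet above, PEFT can fold the adapter weights back into the base model if you prefer a standalone checkpoint; the output path below is just an example.

```python
# Merge the LoRA weights into the base model and drop the PEFT wrappers.
merged = model.merge_and_unload()

# Save a plain Transformers checkpoint (illustrative path).
merged.save_pretrained("qwen2.5-3b-sql-merged")
tokenizer.save_pretrained("qwen2.5-3b-sql-merged")
```
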
## Example inference

Below is a minimal example that encourages **SQL-only** output.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="fp16")

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens so the printed text is the SQL, not the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

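Because Qwen2.5-3B-Instruct is a chat model, you may get more consistent SQL-only output by routing the same strings through the tokenizer's chat template instead of a raw prompt; this sketch reuses the variables from the example above.

```python
# Build the prompt with the model's chat template rather than manual string formatting.
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": f"Schema:\n{schema}\n\nRequest:\n{request}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
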
## License

This repository is a fine-tuned derivative of the base model listed in the metadata. Please follow the licensing terms of the base model and any dataset constraints used for training.

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The associated GitHub/DagsHub repository contains the notebook, data splits, and logged runs.