---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

## Overview

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** for a realistic e-commerce + subscriptions analytics schema.

Artifacts are organized under a single Hub repo using subfolders:

* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)

## Intended use

**Use cases**

* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

* Direct execution on sensitive or production databases without validation such as schema checks, allow-lists, and sandboxed execution (a minimal guard sketch follows below).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).

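A minimal pre-execution guard along these lines can reject malformed or out-of-scope queries before anything touches a database. This is only an illustrative sketch: the allow-list contents and the SELECT-only rule are assumptions, not part of this release.

```python
import sqlglot
from sqlglot import exp

# Example allow-list; replace with the tables your application actually exposes.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def is_safe_select(sql: str) -> bool:
    """Accept only SQL that parses as a single SELECT over allow-listed tables."""
    try:
        tree = sqlglot.parse_one(sql, read="postgres")
    except sqlglot.errors.ParseError:
        return False
    if not isinstance(tree, exp.Select):
        return False  # rejects INSERT/UPDATE/DELETE/DDL and other non-SELECT statements
    referenced = {t.name for t in tree.find_all(exp.Table)}
    return referenced <= ALLOWED_TABLES
```

Even with such a check in place, generated SQL should run under a read-only role in a sandboxed environment.
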
## Training summary

| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged_adamw_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of total parameters) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |

## Evaluation summary (100 test examples)

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen_finetuned_fp16_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |

**Key Findings:**

* **Strict prompting is critical**: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" raised the parseable rate from 93% to 100%.
* **Fine-tuning improves accuracy**: exact match rose from 9% (baseline) to 15% (fine-tuned), a **67% relative improvement**.
* **Quantization trade-offs**: INT8 preserves accuracy (20% exact match, the best of any model) at roughly half the memory footprint, but with about a 5x latency increase.
* **Competitive with APIs**: the fine-tuned model reaches roughly **4x the exact-match rate** of GPT-4o-mini (15% vs. 4%) at lower mean latency (0.43 s vs. 1.62 s).

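The two headline metrics are straightforward to recompute from saved model outputs. A minimal sketch (variable names are illustrative; exact match is the same strict string comparison described above):

```python
import sqlglot

def score(predictions: list[str], references: list[str]) -> dict:
    """Parseable-SQL rate (sqlglot, postgres dialect) and strict exact-match rate."""
    parseable, exact = 0, 0
    for pred, ref in zip(predictions, references):
        try:
            sqlglot.parse_one(pred, read="postgres")
            parseable += 1
        except sqlglot.errors.ParseError:
            pass
        if pred.strip() == ref.strip():
            exact += 1
    n = len(predictions)
    return {"parseable_sql": parseable / n, "exact_match": exact / n}
```
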
## Results Visualization



*Parseable SQL rate and exact match accuracy comparison across all 7 models.*

## How to load

### Load the merged FP16 model (recommended)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="int8",
    device_map="auto"
)
```

### Load base model + LoRA adapter

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

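If you continue tuning the adapter and later want a standalone checkpoint, PEFT can fold the LoRA weights back into the base model (the output directory name below is arbitrary):

```python
# Merge the LoRA weights into the base model and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("qwen-sql-merged-fp16")
tokenizer.save_pretrained("qwen-sql-merged-fp16")
```
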
## Example inference

Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (the prompt is echoed back otherwise)
generated = out[0][inputs["input_ids"].shape[-1]:]
sql = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(sql)
```

## License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling

**Links:**

* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

## Citation

```bibtex
@misc{qwen-sql-finetuning-2026,
  author       = {Anirudh Reddy Ravula},
  title        = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note         = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```