---
language: en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-to-sql
- sql
- postgresql
- qwen2.5
- qlora
- peft
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

## Overview

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for **Text-to-SQL** generation in **PostgreSQL** for a realistic e-commerce + subscriptions analytics schema.

Artifacts are organized under a single Hub repo using subfolders:

* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)

## Intended use

**Use cases**

* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

* Direct execution on sensitive or production databases without validation such as schema checks, allow-lists, and sandboxed execution (a minimal guard sketch follows below).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).

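A minimal pre-execution guard along these lines can reject malformed or out-of-scope queries before anything touches a database. This is only an illustrative sketch: the allow-list contents and the SELECT-only rule are assumptions, not part of this release.

```python
import sqlglot
from sqlglot import exp

# Example allow-list; replace with the tables your application actually exposes.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def is_safe_select(sql: str) -> bool:
    """Accept only SQL that parses as a single SELECT over allow-listed tables."""
    try:
        tree = sqlglot.parse_one(sql, read="postgres")
    except sqlglot.errors.ParseError:
        return False
    if not isinstance(tree, exp.Select):
        return False  # rejects INSERT/UPDATE/DELETE/DDL and other non-SELECT statements
    referenced = {t.name for t in tree.find_all(exp.Table)}
    return referenced <= ALLOWED_TABLES
```

Even with such a check in place, generated SQL should run under a read-only role in a sandboxed environment.
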
## Training summary

| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged_adamw_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of total parameters) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |

## Evaluation summary (100 test examples)

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen_finetuned_fp16_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |

**Key Findings:**

* **Strict prompting is critical**: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" raised the parseable rate from 93% to 100%.
* **Fine-tuning improves accuracy**: exact match rose from 9% (baseline) to 15% (fine-tuned), a **67% relative improvement**.
* **Quantization trade-offs**: INT8 preserves accuracy (20% exact match, the best of any model) at roughly half the memory footprint, but with about a 5x latency increase.
* **Competitive with APIs**: the fine-tuned model reaches roughly **4x the exact-match rate** of GPT-4o-mini (15% vs. 4%) at lower mean latency (0.43 s vs. 1.62 s).

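The two headline metrics are straightforward to recompute from saved model outputs. A minimal sketch (variable names are illustrative; exact match is the same strict string comparison described above):

```python
import sqlglot

def score(predictions: list[str], references: list[str]) -> dict:
    """Parseable-SQL rate (sqlglot, postgres dialect) and strict exact-match rate."""
    parseable, exact = 0, 0
    for pred, ref in zip(predictions, references):
        try:
            sqlglot.parse_one(pred, read="postgres")
            parseable += 1
        except sqlglot.errors.ParseError:
            pass
        if pred.strip() == ref.strip():
            exact += 1
    n = len(predictions)
    return {"parseable_sql": parseable / n, "exact_match": exact / n}
```
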
## Results Visualization



*Parseable SQL rate and exact match accuracy comparison across all 7 models.*

## How to load

### Load the merged FP16 model (recommended)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="int8",
    device_map="auto"
)
```

### Load base model + LoRA adapter

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

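If you continue tuning the adapter and later want a standalone checkpoint, PEFT can fold the LoRA weights back into the base model (the output directory name below is arbitrary):

```python
# Merge the LoRA weights into the base model and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("qwen-sql-merged-fp16")
tokenizer.save_pretrained("qwen-sql-merged-fp16")
```
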
## Example inference

Below is a minimal example that encourages **SQL-only** output (critical for 100% parseability).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (the prompt is echoed back otherwise)
generated = out[0][inputs["input_ids"].shape[-1]:]
sql = tokenizer.decode(generated, skip_special_tokens=True).strip()
print(sql)
```

## License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling

**Links:**

* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

## Citation

```bibtex
@misc{qwen-sql-finetuning-2026,
  author       = {Anirudh Reddy Ravula},
  title        = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note         = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```