---
tags:
- quantization
base_model: Qwen/Qwen2.5-3B-Instruct
license: mit
metrics:
- accuracy
---

# Qwen2.5-3B Text-to-SQL (PostgreSQL) — Fine-Tuned

This repository contains a fine-tuned **Qwen/Qwen2.5-3B-Instruct** model specialized for translating natural-language questions into PostgreSQL queries.

Artifacts are organized under a single Hub repo using subfolders:

* `fp16/` — merged FP16 model (recommended)
* `int8/` — quantized INT8 checkpoint (smaller footprint)
* `lora_adapter/` — LoRA adapter only (for further tuning / research)

## Intended use

**Use cases**

* Convert natural language questions into PostgreSQL queries.
* Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

**Not for**

* Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
* Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
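As a concrete illustration of the validation called for above, here is a minimal pre-execution guard. This is a sketch, not part of this repo; the statement pattern and the table allow-list (`customers`, `orders`, etc.) are assumptions you would replace with your own schema.

```python
import re

# Example allow-list; substitute the tables your application actually exposes.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}

def is_safe_select(sql: str, allowed_tables=ALLOWED_TABLES) -> bool:
    """Reject anything that is not a single read-only SELECT over allow-listed tables."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    # Crude check: every identifier after FROM/JOIN must be on the allow-list.
    for tbl in re.findall(r"(?is)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped):
        if tbl.lower() not in allowed_tables:
            return False
    return True
```

A guard like this belongs in the application layer, in front of (not instead of) database-side permissions.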

## Training summary

| Item | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit) |
| Optimizer | paged_adamw_8bit |
| Epochs | 4 |
| Training time | ~4 minutes (A100) |
| Trainable params | 29.9M (1.73% of 3B total) |
| Decoding | Greedy |
| Tracking | MLflow (DagsHub) |

## Evaluation

Primary metric: **parseable PostgreSQL SQL** (validated with `sqlglot`).
Secondary metric: **exact match** (strict string match vs. reference SQL).

| Model | Parseable SQL | Exact match | Mean latency (s) | P50 (s) | P95 (s) |
| --- | --- | --- | --- | --- | --- |
| **qwen_finetuned_fp16_strict** | **1.00** | **0.15** | **0.433** | 0.427 | 0.736 |
| qwen_finetuned_int8_strict | 0.99 | 0.20 | 2.152 | 2.541 | 3.610 |
| qwen_baseline_fp16 | 1.00 | 0.09 | 0.405 | 0.422 | 0.624 |
| qwen_finetuned_fp16 | 0.93 | 0.13 | 0.527 | 0.711 | 0.739 |
| qwen_finetuned_int8 | 0.93 | 0.13 | 2.672 | 3.454 | 3.623 |
| gpt-4o-mini | 1.00 | 0.04 | 1.616 | 1.551 | 2.820 |
| claude-3.5-haiku | 0.99 | 0.07 | 1.735 | 1.541 | 2.697 |
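The latency columns can be computed from per-request timings; one plausible way (the exact method used for the table above is not documented, and the sample timings here are made up) is the standard-library `statistics` module:

```python
from statistics import mean, quantiles

def latency_summary(latencies_s):
    """Mean / P50 / P95 over per-request latencies, matching the table columns."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    pct = quantiles(sorted(latencies_s), n=100)
    return {
        "mean": round(mean(latencies_s), 3),
        "p50": round(pct[49], 3),  # 50th percentile
        "p95": round(pct[94], 3),  # 95th percentile
    }
```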

**Key Findings:**

* **Strict prompting is critical**: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" improved the parseable rate from 93% to 100%.
* **Fine-tuning improves accuracy**: exact match increased from 9% (baseline) to 15% (fine-tuned), a **67% relative improvement**.
* **Quantization trade-offs**: INT8 maintains accuracy (20% exact match, the best across all models) with a ~50% memory reduction, but at roughly a 5x latency increase.
* **Competitive with APIs**: the fine-tuned model achieves roughly **4x better exact match** than GPT-4o-mini at comparable speed.
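Without the strict prompt, outputs often arrive wrapped in markdown fences, which is exactly what drags the parseable rate down to 93%. A small normalizer (a sketch, not part of this repo) can recover the bare SQL before parsing:

```python
import re

def strip_sql_fences(text: str) -> str:
    """Extract SQL from a surrounding ```sql ... ``` markdown fence, if present."""
    m = re.search(r"```(?:sql)?\s*(.*?)```", text, flags=re.S | re.I)
    if m:
        return m.group(1).strip()
    return text.strip()
```

Strict prompting makes this post-processing largely unnecessary, but it is a cheap safety net.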

## Results Visualization

![Model Comparison](model_comparison.png)

*Parseable SQL rate and exact match accuracy comparison across all 7 models.*

## How to load

### Load the merged FP16 model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)
```

### Load the INT8 model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="int8",
    device_map="auto"
)
```

### Load base model + LoRA adapter

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")
```

## Example inference

Below is a minimal example that encourages **SQL-only** output (critical for the 100% parseable rate).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "List the emails of customers in California who placed an order in 2024."  # example question; replace with your own

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (greedy decoding echoes the prompt otherwise)
sql = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
print(sql)
```

## License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

**Full documentation and code:** [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)

## Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

* Complete Colab notebook with training and evaluation code
* Dataset (500 examples: 350 train, 50 val, 100 test)
* Visualization scripts for 3D performance analysis
* Production-ready inference code with error handling

**Links:**

* [GitHub Repository](https://github.com/aravula7/qwen-sql-finetuning)
* [MLflow Experiments](https://dagshub.com/aravula7/llm-finetuning)
* [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

## Citation

```bibtex
@misc{qwen-sql-finetuning-2025,
  author       = {Anirudh Reddy Ravula},
  title        = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note         = {Fine-tuned with QLoRA for e-commerce SQL generation}
}
```