SQL Generator – Qwen2.5-Coder-7B (QLoRA)
Fine-tuned Qwen2.5-Coder-7B-Instruct for text-to-SQL generation. Given a SQL schema and a natural-language question, the model produces a syntactically correct SQL query.
Trained as part of the SQL Agent LLMOps project – a multi-model SQL agent deployed on HuggingFace Spaces.
Model details
| | |
|---|---|
| Base model | unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit |
| Architecture | Qwen2 (7.6B params, 4-bit quantized base) |
| Fine-tuning method | QLoRA via Unsloth + TRL |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | |
| Language | English |
| License | Apache 2.0 |
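For reference, the adapter settings above correspond to a PEFT LoraConfig along these lines (a sketch; lora_dropout and bias are not reported on this card, so peft defaults are assumed):

from peft import LoraConfig

# Matches the adapter settings in the table above; dropout/bias are
# assumptions (peft defaults), as the card does not report them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)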
Training data
DanielRegaladoCardoso/text-to-sql-mix-v2 – a curated mix of 5 public text-to-SQL datasets:
- b-mc2/sql-create-context
- gretelai/synthetic_text_to_sql
- knowrohit07/know_sql
- NumbersStation/NSText2SQL
- Clinton/Text-to-sql-v1
Final training set: 672,949 examples after filtering out sequences longer than 1024 tokens (93.1% of the original 723,097 rows kept). With sequence packing, this compressed to 154,462 effective sequences of length 1024.
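A minimal sketch of the length filter, assuming the HF datasets API and that each row carries a fully rendered text column ("text" is an assumed column name; the actual schema of text-to-sql-mix-v2 is not documented here):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
ds = load_dataset("DanielRegaladoCardoso/text-to-sql-mix-v2", split="train")

# Drop rows whose rendered prompt exceeds the 1024-token cap used in training.
# "text" is an assumed column name for the rendered chat string.
ds = ds.filter(lambda row: len(tokenizer(row["text"]).input_ids) <= 1024)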
Training configuration
| Hyperparameter | Value |
|---|---|
| Hardware | 1× NVIDIA L40S (48 GB) |
| Epochs | 1 |
| Batch size (per device) | 16 |
| Gradient accumulation | 1 |
| Effective batch size | 16 |
| Max sequence length | 1024 |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Optimizer | adamw_8bit |
| Precision | bf16 |
| Sequence packing | Enabled |
| Total steps | 9,654 |
| Wall-clock time | 13.5 hours |
| Final training loss | 0.2658 |
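The hyperparameters above map onto a TRL SFTConfig roughly as follows (a sketch, not the exact training script; output_dir is a placeholder, and `model`/`ds` are the Unsloth-prepared model and the filtered dataset from the sketches above):

from trl import SFTConfig, SFTTrainer

# Mirrors the training table above. Note: max_seq_length is named
# max_length in some newer TRL releases.
config = SFTConfig(
    output_dir="sql-generator-qlora",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_8bit",
    bf16=True,
    max_seq_length=1024,
    packing=True,
)
SFTTrainer(model=model, args=config, train_dataset=ds).train()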
Prompt format
The model expects a chat-style prompt with a system message defining the SQL-expert role and a user message containing the schema and question:
<|im_start|>system
You are a SQL expert. Given a SQL schema and a natural-language question, generate a correct SQL query answering the question. Return only the SQL.
<|im_end|>
<|im_start|>user
### Schema
CREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);
### Question
List all players from Tampa, Florida.
<|im_end|>
<|im_start|>assistant
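For programmatic use, the user message can be assembled with a small helper like this (build_user_message is a hypothetical convenience function, not part of the model repo):

def build_user_message(schema: str, question: str) -> str:
    # Mirrors the "### Schema / ### Question" layout shown above.
    return f"### Schema\n{schema}\n\n### Question\n{question}"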
Usage
Option A – Load merged 16-bit model (recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
REPO = "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora"
model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(REPO)
messages = [
{"role": "system", "content": "You are a SQL expert. Given a SQL schema and a natural-language question, generate a correct SQL query answering the question. Return only the SQL."},
{"role": "user", "content": "### Schema\nCREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);\n\n### Question\nList all players from Tampa, Florida."},
]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
# -> SELECT * FROM players WHERE hometown = 'Tampa, Florida'
Option B – Load LoRA adapter on top of base model
Useful if you want to keep the base model in 4-bit for a lower VRAM footprint:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Quantize the base to 4-bit (NF4) so the adapter rides on a low-VRAM base.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora")
tokenizer = AutoTokenizer.from_pretrained("DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora")
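Generation then works exactly as in Option A. If you instead want your own merged checkpoint (equivalent to the 16-bit shards shipped in this repo), load the base in bf16 rather than 4-bit and fold the adapter in with PEFT's merge_and_unload (output path below is a placeholder):

merged = model.merge_and_unload()  # PeftModel -> plain transformers model
merged.save_pretrained("sql-generator-merged")  # placeholder output path
tokenizer.save_pretrained("sql-generator-merged")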
Option C – Inference with Unsloth (fastest)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
"DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora",
max_seq_length=1024,
load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
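After for_inference, generation uses the same chat-template flow as Option A (with the same `messages` list):

# Same prompting flow as Option A.
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))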
Files
| File | Size | Purpose |
|---|---|---|
| adapter_model.safetensors | 161 MB | LoRA adapter weights |
| adapter_config.json | 1 KB | LoRA configuration |
| model-0000{1..4}-of-00004.safetensors | 15.2 GB | Merged 16-bit model |
| tokenizer.json + tokenizer_config.json | 11 MB | Tokenizer |
| chat_template.jinja | 4 KB | Qwen chat template |
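If you only need the adapter and tokenizer (Option B), you can skip the 15.2 GB merged shards by filtering the download with huggingface_hub:

from huggingface_hub import snapshot_download

# Fetch only the adapter, tokenizer, and chat template; skip merged shards.
snapshot_download(
    "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora",
    allow_patterns=["adapter_*", "tokenizer*", "chat_template.jinja"],
)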
Limitations
- English only – training data is English; performance on other languages is not validated.
- Sequence length cap – examples requiring more than 1024 tokens (large schemas, complex multi-CTE queries) were filtered out during training. The model may underperform on inputs above this length.
- No execution validation – the model is trained to produce syntactically correct SQL, but generated queries are not guaranteed to execute or return correct results without manual review. Always sanity-check against your real database; a minimal dry-run check is sketched after this list.
- Dialect bias – training data mixes multiple SQL dialects (SQLite, ANSI, MySQL); the model may produce queries that lean toward one dialect over another.
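A minimal dry-run check using the standard-library sqlite3 module, assuming your schema is SQLite-compatible (this catches syntax and missing-column errors, not semantic mistakes):

import sqlite3

def dry_run(schema: str, sql: str) -> str | None:
    """Return an error message if `sql` fails to plan against `schema`, else None."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)                 # build empty tables from CREATE statements
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")  # plan the query without real data
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()

print(dry_run(
    "CREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);",
    "SELECT * FROM players WHERE hometown = 'Tampa, Florida'",
))  # -> None (query plans cleanly)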
Citation
If you use this model, please cite the SQL Agent LLMOps project.
@misc{regalado2026sqlagent,
author = {Daniel Regalado Cardoso},
title = {SQL Generator: Qwen2.5-Coder-7B fine-tuned for text-to-SQL},
year = {2026},
howpublished = {\url{https://huggingface.co/DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora}},
}
Acknowledgments
Fine-tuned with Unsloth and TRL on top of Qwen2.5-Coder-7B-Instruct by the Qwen team; base quantization via bitsandbytes.