SQL Generator - Qwen2.5-Coder-7B (QLoRA)

Fine-tuned Qwen2.5-Coder-7B-Instruct for text-to-SQL generation. Given a SQL schema and a natural-language question, the model produces a syntactically correct SQL query.

Trained as part of the SQL Agent LLMOps project - a multi-model SQL agent deployed on Hugging Face Spaces.

Model details

Base model          unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit
Architecture        Qwen2 (7.6B params, 4-bit quantized base)
Fine-tuning method  QLoRA via Unsloth + TRL
LoRA rank           16
LoRA alpha          32
Target modules      q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params    ~70M (0.9% of base)
Language            English
License             Apache 2.0
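
For reference, the adapter hyperparameters above correspond to a PEFT LoraConfig along the lines of the sketch below. This is illustrative only; the authoritative values ship in adapter_config.json, and the dropout and bias settings here are assumptions.

from peft import LoraConfig

# Sketch of the adapter configuration implied by the table above.
# lora_dropout and bias are assumptions; check adapter_config.json
# for the exact values shipped with this repo.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)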

Training data

DanielRegaladoCardoso/text-to-sql-mix-v2 - a curated mix of 5 public text-to-SQL datasets:

  • b-mc2/sql-create-context
  • gretelai/synthetic_text_to_sql
  • knowrohit07/know_sql
  • NumbersStation/NSText2SQL
  • Clinton/Text-to-sql-v1

Final training set: 672,949 examples (after filtering sequences > 1024 tokens; 93.1% of the original 723,097 rows kept). With sequence packing, this compressed to 154,462 effective sequences of length 1024.
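
The length filter is straightforward to reproduce; a minimal sketch, assuming the dataset exposes the formatted training example in a text column (the actual column name may differ):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit")
dataset = load_dataset("DanielRegaladoCardoso/text-to-sql-mix-v2", split="train")

# Keep only examples that fit within the 1024-token training cap.
# The "text" column name is an assumption; adjust to the dataset's schema.
dataset = dataset.filter(lambda ex: len(tokenizer(ex["text"]).input_ids) <= 1024)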

Training configuration

Hyperparameter             Value
Hardware                   1× NVIDIA L40S (48 GB)
Epochs                     1
Batch size (per device)    16
Gradient accumulation      1
Effective batch size       16
Max sequence length        1024
Learning rate              1e-4
LR scheduler               Cosine
Warmup ratio               0.03
Optimizer                  adamw_8bit
Precision                  bf16
Sequence packing           Enabled
Total steps                9,654
Wall-clock time            13.5 hours
Final training loss        0.2658
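
As a rough guide, these hyperparameters map onto TRL's SFTConfig as in the sketch below. This is not the original training script (the run used Unsloth's wrappers), and argument names can shift between TRL versions.

from trl import SFTConfig

# Sketch of the training arguments implied by the table above.
config = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    max_seq_length=1024,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_8bit",
    bf16=True,
    packing=True,
)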

Prompt format

The model expects a chat-style prompt with a system message defining the SQL-expert role and a user message containing the schema and question:

<|im_start|>system
You are a SQL expert. Given a SQL schema and a natural-language question, generate a correct SQL query answering the question. Return only the SQL.
<|im_end|>
<|im_start|>user
### Schema
CREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);

### Question
List all players from Tampa, Florida.
<|im_end|>
<|im_start|>assistant

Usage

Option A - Load merged 16-bit model (recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora"

model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(REPO)

messages = [
    {"role": "system", "content": "You are a SQL expert. Given a SQL schema and a natural-language question, generate a correct SQL query answering the question. Return only the SQL."},
    {"role": "user", "content": "### Schema\nCREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);\n\n### Question\nList all players from Tampa, Florida."},
]

# Build the chat prompt, generate greedily, and decode only the new tokens.
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
# -> SELECT * FROM players WHERE hometown = 'Tampa, Florida'

Option B - Load LoRA adapter on top of base model

Useful if you want to keep the base model in 4-bit (lower VRAM footprint).

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load the base model in 4-bit so the adapter sits on top of quantized
# weights instead of a full 16-bit copy.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora")
tokenizer = AutoTokenizer.from_pretrained("DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora")
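
Prompting and generation then work exactly as in Option A.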

Option C - Inference with Unsloth (fastest)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
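
The returned tokenizer is a standard Hugging Face tokenizer, so generation follows the same pattern as Option A; a minimal sketch, reusing the messages list from above:

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))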

Files

File                                     Size     Purpose
adapter_model.safetensors                161 MB   LoRA adapter weights
adapter_config.json                      1 KB     LoRA configuration
model-0000{1..4}-of-00004.safetensors    15.2 GB  Merged 16-bit model (4 shards)
tokenizer.json + tokenizer_config.json   11 MB    Tokenizer
chat_template.jinja                      4 KB     Qwen chat template

Limitations

  • English only - training data is English; performance on other languages is not validated.
  • Sequence length cap - examples requiring more than 1024 tokens (large schemas, complex multi-CTE queries) were filtered out during training. The model may underperform on inputs above this length.
  • No execution validation - the model is trained to produce syntactically correct SQL, but generated queries are not guaranteed to execute or return correct results. Always sanity-check against your real database (a minimal validation sketch follows this list).
  • Dialect bias - the training data mixes multiple SQL dialects (SQLite, ANSI, MySQL); the model may produce queries that lean toward one dialect over another.
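
A cheap way to catch malformed output is to ask a database engine to plan the query without executing it. A minimal sketch for SQLite (other engines have equivalents, e.g. PostgreSQL's EXPLAIN); the helper name here is illustrative:

import sqlite3

def is_valid_sqlite(schema: str, sql: str) -> bool:
    """Return True if `sql` parses and plans against `schema` in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)      # build the schema in a scratch database
        conn.execute(f"EXPLAIN {sql}")  # plan the query without running it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);"
print(is_valid_sqlite(schema, "SELECT * FROM players WHERE hometown = 'Tampa, Florida'"))  # True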

Citation

If you use this model, please cite the SQL Agent LLMOps project.

@misc{regalado2026sqlagent,
  author       = {Daniel Regalado Cardoso},
  title        = {SQL Generator: Qwen2.5-Coder-7B fine-tuned for text-to-SQL},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora}},
}

Acknowledgments

  • Unsloth - 2× faster QLoRA training
  • TRL - SFTTrainer
  • Qwen team - Qwen2.5-Coder-7B base model
  • All authors of the source datasets cited above
