
Qwen2.5-1.5B-SQL-Assistant-Full (Merged)

📖 Model Overview

Qwen2.5-SQL-Assistant-Full is a standalone fine-tuned language model optimized for text-to-SQL generation.

This model is the merged version of the SQL-Assistant-Prod adapter: the LoRA adapter weights have been permanently folded into the base model weights, so the model can be loaded directly with transformers, vLLM, or TGI, or converted to GGUF for local use (e.g., with Ollama), without requiring PEFT.

Key Features

Architecture: Qwen2.5 (1.5 billion parameters).
Specialization: Generates only SQL queries, based on the database schema provided in the prompt.
Deployment: Can be served as a standard model by high-performance inference engines and platforms (e.g., vLLM, Groq, Together AI).
Efficiency: Lightweight enough to run in under 4 GB of VRAM in FP16, and small enough for edge devices and CPU-only environments.
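The under-4 GB figure follows from simple arithmetic: the weights alone need roughly parameter count times bytes per parameter. The sketch below assumes about 1.54 billion parameters (the size published for Qwen2.5-1.5B; this card only says 1.5 billion) and uses an illustrative helper, weight_memory_gb. Activations and the KV cache come on top of these numbers.

```python
# Rough weight-memory estimate for a dense model: parameters x bytes per parameter.
# Assumption: ~1.54e9 parameters (published size of Qwen2.5-1.5B).
# Activations and the KV cache are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n_params = 1.54e9  # assumed parameter count
    for dtype in ("fp32", "fp16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gb(n_params, dtype):.2f} GB")
```

Under that assumption the FP16 weights come to about 3.1 GB, leaving some headroom below 4 GB for short prompts, while FP32 (about 6.2 GB) would not fit.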

💻 How to Use

Because this is a merged model, it loads like any standard checkpoint. You do not need peft.

Using Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the Model (Standard Loading)
model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16 # or float32 for CPU
)

# 2. Define Context & Question
schema = "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)"
question = "Show me the top 3 earners in the Sales department."

# 3. Format Input (Chat Template)
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": f"{schema}\nQuestion: {question}"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=150)

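# 5. Decode only the newly generated tokens (the prompt tokens come first),
#    which is more robust than splitting the full text on the word "assistant"
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())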
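Because the model depends on the CREATE TABLE context, it can help to wrap step 3 in a helper that checks the schema before prompting. A minimal sketch: build_messages and SYSTEM_PROMPT are hypothetical names, the message layout simply mirrors the example above (it is not confirmed to be the exact training template), and the schema check uses Python's built-in sqlite3 as a stand-in parser.

```python
import sqlite3

# Hypothetical constant: same system message as the usage example above.
SYSTEM_PROMPT = "You are a SQL expert."


def build_messages(schema: str, question: str) -> list[dict]:
    """Build chat messages in the same layout as the usage example (hypothetical helper).

    The CREATE TABLE context is first executed against a throwaway in-memory
    SQLite database so that an obviously malformed schema fails early.
    SQLite is lenient about column types, so passing this check does not
    guarantee the schema is valid for another database engine.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)  # raises sqlite3.Error on malformed SQL
    finally:
        conn.close()
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{schema}\nQuestion: {question}"},
    ]
```

The returned list can be passed straight to tokenizer.apply_chat_template as in step 3.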

📊 Performance & Evaluation

The model was evaluated using normalized exact-match accuracy on a held-out test set from the b-mc2/sql-create-context dataset.

| Metric | Score | Notes |
|---|---|---|
| Exact Match | ~78% | High fidelity to schema constraints. |
| Hallucination | < 1% | Rarely invents columns not present in the CREATE TABLE context. |
| Format | 100% | Consistently outputs raw SQL without conversational filler. |
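The card does not specify how SQL is normalized before comparison. A minimal sketch of a normalized exact-match metric, assuming a common choice (trim, drop trailing semicolons, collapse whitespace, lowercase); the function names are illustrative, and this is not necessarily the procedure behind the ~78% figure.

```python
import re


def normalize_sql(sql: str) -> str:
    """Assumed normalization: trim, drop trailing semicolons, collapse whitespace, lowercase.

    Note: lowercasing also changes string literals ('Sales' vs 'sales'),
    which a stricter metric might want to preserve.
    """
    sql = sql.strip().rstrip(";").strip()
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose normalized form equals the normalized reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact match is strict: a semantically equivalent query with a different column order or alias counts as a miss, so it tends to understate functional accuracy.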

🛠️ Training Details

Original Base Model: Qwen/Qwen2.5-1.5B-Instruct
Fine-Tuning Method: QLoRA (Rank 16, Alpha 16).
Merge Method: merge_and_unload() via PEFT.
Precision: The merged weights are saved in standard precision (FP32/FP16), allowing for further quantization (e.g., AWQ, GPTQ, GGUF) if desired.
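merge_and_unload() folds each LoRA update into the weight it adapts, which is why the merged model needs no PEFT at inference time. The dependency-free sketch below shows the underlying arithmetic, W' = W + (alpha / r) · B · A, assuming the standard LoRA scaling; it is not PEFT's implementation, and it uses toy dimensions rather than the card's r = 16, alpha = 16 (for which the scale alpha / r is exactly 1).

```python
# Pure-Python illustration of merging a LoRA adapter (toy sizes, standard alpha / r scaling).


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into the base weight: W' = W + (alpha / r) * B @ A."""
    return add(W, matmul(B, A), scale=alpha / r)


# Base weight W (out=3, in=4); LoRA factors A (r x in) and B (out x r) with r = 2.
W = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 0.0], [0.0, -2.0, 1.0, 3.0]]
A = [[0.1, 0.2, 0.0, -0.1], [0.0, 0.3, 0.1, 0.2]]
B = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
alpha, r = 4, 2
x = [[1.0], [2.0], [-1.0], [0.5]]  # input as a column vector

# Adapter kept separate: W x + (alpha / r) * B (A x)
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / r)
# Adapter merged into the weight: (W + (alpha / r) B A) x
merged = matmul(merge_lora(W, A, B, alpha, r), x)
print(unmerged, merged)
```

Since merging adds no extra layers, the merged model runs at the same cost as the original base architecture.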

⚠️ Limitations & Bias

Context Required: The model is optimized for Context-Dependent SQL generation. It relies on receiving a valid CREATE TABLE statement in the prompt to function correctly.
Read-Only Focus: While it can generate INSERT/UPDATE queries, it is primarily optimized for data retrieval (SELECT).
Safety: Always validate and sanitize LLM-generated SQL before executing it on a production database; a maliciously worded question can yield destructive or unintended queries, much like SQL injection.
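One way to enforce the read-only focus and the safety advice in code is to run generated SQL on a connection that refuses writes. A minimal sketch using SQLite's PRAGMA query_only through Python's sqlite3 module: run_read_only is a hypothetical helper and the sample rows are invented. On a production engine the equivalent would be a database role with read-only privileges, and this guard does not limit expensive or data-exposing SELECT queries.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL on a connection that refuses writes (hypothetical helper).

    PRAGMA query_only makes SQLite reject INSERT/UPDATE/DELETE and schema
    changes on this connection with an OperationalError, so a destructive
    query fails instead of modifying data.
    """
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(sql).fetchall()


# Demo database using the schema from the usage example (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Sales", 90), (2, "Ben", "Sales", 70), (3, "Cy", "IT", 95)],
)
conn.commit()

# A read query runs normally.
rows = run_read_only(
    conn, "SELECT name FROM employees WHERE dept = 'Sales' ORDER BY salary DESC LIMIT 3"
)
print(rows)

# A write query is rejected by SQLite.
try:
    run_read_only(conn, "DELETE FROM employees")
except sqlite3.OperationalError as exc:
    print("blocked:", exc)
```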

📜 License

This project is licensed under the MIT License.


Format: Safetensors
Model size: 2B params
Tensor type: F16

Model tree for manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full
Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-1.5B-Instruct