Qwen2.5-1.5B-SQL-Assistant-Full (Merged)
📖 Model Overview
Qwen2.5-SQL-Assistant-Full is a standalone, fine-tuned language model optimized for Text-to-SQL generation.
It is the merged version of the SQL-Assistant-Prod adapter: the LoRA adapter weights have been permanently folded into the base model weights, so the model can be loaded directly with transformers, vLLM, or TGI, or converted to GGUF for local use (Ollama), without requiring PEFT.
Key Features
- Architecture: Qwen 2.5 (1.5 Billion Parameters).
- Specialization: Strictly generates SQL queries based on provided database schemas.
- Deployment: Ready for high-performance inference servers (vLLM, Groq, Together AI) as a standard model.
- Efficiency: Extremely lightweight (requires < 4GB VRAM in FP16), making it suitable for edge devices and CPU-only environments.
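The FP16 memory figure above can be sanity-checked with simple arithmetic. The following is a rough sketch that counts weight storage only and assumes a round 1.5 billion parameters (the figure quoted above); activations, the KV cache, and framework overhead come on top, and the helper name is illustrative.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 1.5e9  # assumption: the rounded parameter count from the model card

fp16_gib = estimate_weight_memory_gib(NUM_PARAMS, bytes_per_param=2)
fp32_gib = estimate_weight_memory_gib(NUM_PARAMS, bytes_per_param=4)

print(f"FP16 weights: {fp16_gib:.2f} GiB")  # FP16 weights: 2.79 GiB
print(f"FP32 weights: {fp32_gib:.2f} GiB")  # FP32 weights: 5.59 GiB
```

The FP16 estimate leaves roughly 1 GiB of headroom under the 4 GB figure for activations and cache, while FP32 on CPU needs about twice the weight memory.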
💻 How to Use
Because this is a merged model, usage is standard and simple. You do not need peft.
Using Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the model (standard loading, no PEFT required)
model_id = "manuelaschrittwieser/Qwen2.5-SQL-Assistant-Full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # requires the accelerate package
    torch_dtype=torch.float16,  # use torch.float32 on CPU
)

# 2. Define context and question
schema = "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)"
question = "Show me the top 3 earners in the Sales department."

# 3. Format the input with the chat template
messages = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": f"{schema}\nQuestion: {question}"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# 4. Generate
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=150)

# 5. Decode only the newly generated tokens, so the echoed prompt is dropped
#    (splitting the full text on "assistant" breaks if that word appears in the query)
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip())
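When the same prompt format is needed for many questions, the message construction can be factored into a small helper. This is a minimal sketch: `build_messages` is a hypothetical name, and joining several CREATE TABLE statements with newlines is an assumption rather than a documented requirement of the model.

```python
SYSTEM_PROMPT = "You are a SQL expert."


def build_messages(schema_statements, question):
    """Build chat messages in the '<schema>\\nQuestion: <question>' layout shown above."""
    if isinstance(schema_statements, str):
        schema_statements = [schema_statements]
    # Assumption: multiple tables are passed as separate statements, one per line.
    schema = "\n".join(s.strip() for s in schema_statements if s.strip())
    if not schema:
        raise ValueError("at least one CREATE TABLE statement is required")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{schema}\nQuestion: {question.strip()}"},
    ]


messages = build_messages(
    "CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)",
    "Show me the top 3 earners in the Sales department.",
)
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as in the example above.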
📊 Performance & Evaluation
The model was evaluated using Normalized Exact Match Accuracy against a held-out test set from the b-mc2/sql-create-context dataset.
| Metric | Score | Notes |
|---|---|---|
| Exact Match | ~78% | High fidelity to schema constraints. |
| Hallucination | < 1% | Rarely invents columns not present in the CREATE TABLE context. |
| Format | 100% | Consistently outputs raw SQL without conversational filler. |
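The exact normalization used for the Exact Match score is not specified here. As an illustration only, the sketch below assumes three common rules (lowercasing, collapsing whitespace, and dropping a trailing semicolon) and computes the fraction of predictions that match their reference after normalization. The function names are hypothetical.

```python
import re


def normalize_sql(sql: str) -> str:
    """Normalize a query under assumed rules: trim, drop trailing ';', collapse spaces, lowercase."""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower().strip()


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


predictions = [
    "SELECT name FROM employees;",
    "select  COUNT(*) from employees",
    "SELECT id FROM employees",
]
references = [
    "select name from employees",
    "SELECT COUNT(*) FROM employees",
    "SELECT name FROM employees",
]
print(f"{exact_match_accuracy(predictions, references):.2f}")  # 0.67
```

Note that lowercasing also changes string literals, so a stricter evaluation might preserve case inside quotes. Exact match also counts semantically equivalent queries with different wording as misses.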
🛠️ Training Details
- Original Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Fine-Tuning Method: QLoRA (Rank 16, Alpha 16).
- Merge Method: merge_and_unload() via PEFT.
- Precision: The merged weights are saved in standard precision (FP32/FP16), allowing for further quantization (e.g., AWQ, GPTQ, GGUF) if desired.
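For reference, merging a LoRA adapter amounts to adding the scaled low-rank update to each adapted weight matrix. The expression below assumes the standard LoRA scaling factor of alpha divided by rank (not a rank-stabilized variant); the symbols are introduced here for illustration.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},
\qquad \frac{\alpha}{r} = \frac{16}{16} = 1
```

With rank 16 and alpha 16 the scaling factor is 1, so under this assumption the merged weight is simply the base weight plus the product of the two adapter matrices. After the merge, the adapter matrices are no longer needed at inference time.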
⚠️ Limitations & Bias
- Context Required: The model is optimized for context-dependent SQL generation. It relies on receiving a valid CREATE TABLE statement in the prompt to function correctly.
- Read-Only Focus: While it can generate INSERT/UPDATE queries, it is primarily optimized for data retrieval (SELECT).
- Safety: Always validate and sanitize SQL queries generated by LLMs before executing them on production databases, to guard against destructive statements and SQL injection.
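One way to enforce the safety advice above is to run generated queries on a connection that can only read. The sketch below uses SQLite from the Python standard library purely as an illustration: the in-memory database, the sample rows, and the function names are assumptions, and a production database would rely on its own mechanisms such as a read-only role.

```python
import sqlite3

# Actions a plain SELECT needs: the statement itself, column reads, and SQL functions.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow read-only actions and deny everything else (INSERT, UPDATE, DROP, ...)."""
    return sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


def run_generated_query(conn, sql):
    """Execute model-generated SQL on a connection restricted to reads."""
    conn.set_authorizer(read_only_authorizer)
    return conn.execute(sql).fetchall()


# Illustrative in-memory database; it is populated before the restriction is applied.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name VARCHAR, dept VARCHAR, salary INT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Ana", "Sales", 90000), (2, "Ben", "Sales", 70000), (3, "Cy", "Sales", 80000),
     (4, "Di", "Sales", 60000), (5, "Eve", "HR", 95000)],
)
conn.commit()

top_earners = run_generated_query(
    conn, "SELECT name FROM employees WHERE dept = 'Sales' ORDER BY salary DESC LIMIT 3"
)
print(top_earners)

try:
    run_generated_query(conn, "DROP TABLE employees")
except sqlite3.Error as exc:
    print("rejected:", exc)
```

The check happens inside the database engine when the statement is prepared, which is harder to bypass than keyword matching on the generated text.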
📜 License
This project is licensed under the MIT License.