nnul/sqlchat: A Conversational AI for SQL Generation
This repository contains sqlchat, a powerful and efficient language model designed specifically for Text-to-SQL tasks. It interprets natural language questions together with the provided database schema and generates accurate SQL queries, including Data Definition Language (DDL) statements for creating and managing tables.
This model is provided as a standalone 4-bit quantized model, optimized for easy deployment and high-performance, low-resource inference. It was built using the Unsloth library to ensure maximum speed and memory efficiency.
Model Capabilities
- Natural Language to SQL: Translates complex English questions into executable SQL queries.
- Schema-Aware: Understands `CREATE TABLE` contexts provided in the prompt to generate relevant queries.
- DDL Generation: Capable of generating `CREATE TABLE` statements, including constraints like `PRIMARY KEY` and `FOREIGN KEY` relationships.
- Complex Query Logic: Successfully handles `JOIN`s, aggregations (`COUNT`, `MAX`), and sorting (`ORDER BY ... LIMIT`); see the illustrative example below.
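As an illustration of the JOIN-and-aggregation case (a hand-written example of the intended behavior, not a captured model transcript), a prompt of this shape should map to a grouped join:

Context: CREATE TABLE departments (id INTEGER PRIMARY KEY, name VARCHAR); CREATE TABLE employees (id INTEGER PRIMARY KEY, name VARCHAR, dept_id INTEGER)
Instruction: How many employees work in each department?
Target query: SELECT d.name, COUNT(e.id) FROM departments d JOIN employees e ON e.dept_id = d.id GROUP BY d.name;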
How to Use
The easiest way to use sqlchat is with the Unsloth library, which will ensure you get the best performance.
Prerequisites
First, install the necessary libraries.
pip install unsloth
pip install "torch>=2.3.1"
Running Inference
Here is a simple, reusable Python script to run inference with the model.
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer
# Load the sqlchat model from the Hugging Face Hub
# This is a standalone 4-bit model, so we load it as such.
print("Loading sqlchat model...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nnul/sqlchat",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)
print("Model loaded successfully.")
# This call optimizes the model for the fastest possible inference.
FastLanguageModel.for_inference(model)
def generate_sql(instruction: str, context: str = ""):
"""
A helper function to generate SQL from a natural language prompt.
"""
prompt = tokenizer.apply_chat_template(
[
{"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
{"role": "user", "content": f"### Instruction:\n{instruction}\n\n### Context:\n{context}"},
],
tokenize=False,
add_generation_prompt=True,
enable_thinking=False, # Ensures direct SQL output
)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer, skip_prompt=True, clean_up_tokenization_spaces=True)
print(f"User Instruction: {instruction}")
print("\nModel Output:")
print("---------------------------------")
_ = model.generate(
**inputs,
streamer=text_streamer,
max_new_tokens=256,
do_sample=False, # Use greedy decoding for deterministic output
use_cache=True,
)
print("---------------------------------\n")
# --- Example 1: Querying Data ---
generate_sql(
    instruction="Which department has the most number of employees?",
    context="CREATE TABLE department (name VARCHAR, num_employees INTEGER)"
)
# --- Example 2: Creating a Table (DDL) ---
generate_sql(
    instruction="We need a table to manage student enrollments in courses. This table should link the 'students' table and the 'courses' table using their respective IDs.",
    context="""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name VARCHAR(255));
CREATE TABLE courses (course_id INTEGER PRIMARY KEY, course_title VARCHAR(255));
"""
)
Expected Output
User Instruction: Which department has the most number of employees?
Model Output:
---------------------------------
SELECT name FROM department ORDER BY num_employees DESC LIMIT 1;
---------------------------------
User Instruction: We need a table to manage student enrollments in courses. This table should link the 'students' table and the 'courses' table using their respective IDs.
Model Output:
---------------------------------
CREATE TABLE student_enrollment (student_id INTEGER, course_id INTEGER, PRIMARY KEY (student_id, course_id), FOREIGN KEY (student_id) REFERENCES students(student_id), FOREIGN KEY (course_id) REFERENCES courses(course_id));
---------------------------------
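The script above streams tokens to the console. If you instead need the generated SQL back as a Python string (for example, to execute it against a database), a minimal non-streaming variant that reuses the loaded model and tokenizer looks like this; generate_sql_text is a hypothetical helper added here for illustration:

def generate_sql_text(instruction: str, context: str = "") -> str:
    # Hypothetical non-streaming helper: returns the generated SQL as a string.
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
            {"role": "user", "content": f"### Instruction:\n{instruction}\n\n### Context:\n{context}"},
        ],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False, use_cache=True)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]  # keep only the completion
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

sql = generate_sql_text(
    "List the names of all departments.",
    "CREATE TABLE department (name VARCHAR, num_employees INTEGER)",
)
print(sql)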
Performance
The model was benchmarked on an NVIDIA A40 GPU. In a batch-processing scenario it achieves a throughput of roughly 55-70 tokens/second, and single-prompt latency is low enough for interactive, real-time applications.
- Peak VRAM Usage (Inference): ~6.1 GB
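The throughput figure above corresponds to generating for several prompts in a single generate call. Below is a minimal sketch of batched inference, assuming the model and tokenizer loaded in the script above; it is not the exact benchmark harness, and left padding plus a pad token are needed for decoder-only batching:

# Batched inference sketch (illustrative; assumes `model` and `tokenizer` from the loading script above).
questions = [
    ("How many employees are there in total?",
     "CREATE TABLE department (name VARCHAR, num_employees INTEGER)"),
    ("Which department has the most number of employees?",
     "CREATE TABLE department (name VARCHAR, num_employees INTEGER)"),
]

tokenizer.padding_side = "left"  # left-pad so every prompt ends right where generation starts
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
            {"role": "user", "content": f"### Instruction:\n{q}\n\n### Context:\n{ctx}"},
        ],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    for q, ctx in questions
]

batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
outputs = model.generate(**batch, max_new_tokens=256, do_sample=False, use_cache=True)
for row in outputs:
    completion = row[batch["input_ids"].shape[1]:]  # drop the (padded) prompt tokens
    print(tokenizer.decode(completion, skip_special_tokens=True).strip())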
Prompt Template
To get the best results, your prompts should follow this structure:
<|im_start|>system
You are a helpful assistant that generates SQL queries based on natural language questions and database schemas.<|im_end|>
<|im_start|>user
### Instruction:
{Your natural language question}

### Context:
{The CREATE TABLE statements for the relevant tables}<|im_end|>
<|im_start|>assistant
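tokenizer.apply_chat_template produces this format for you. If you need to assemble the prompt by hand (for example, when sending raw strings to a serving backend), here is a sketch that mirrors the template above; build_prompt is a hypothetical helper, and the special tokens assume the ChatML-style template shown here:

def build_prompt(instruction: str, context: str) -> str:
    # Hand-built ChatML-style prompt mirroring the template above; apply_chat_template remains the safer option.
    system = ("You are a helpful assistant that generates SQL queries "
              "based on natural language questions and database schemas.")
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Context:\n{context}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )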