---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B
tags:
- text2sql
- sql
- nlp
- distillation
- qwen3
datasets:
- distil-labs/text2sql-synthetic
language:
- en
pipeline_tag: text-generation
---

# Distil-Qwen3-4B-Text2SQL

A fine-tuned Qwen3-4B model that converts natural language questions into SQL queries. Distilled from DeepSeek-V3, this 4B-parameter model matches the teacher's accuracy while being small enough to run locally.

## Results

| Metric | DeepSeek-V3 (Teacher) | Qwen3-4B (Base) | **This Model** |
|--------|:---------------------:|:---------------:|:--------------:|
| LLM-as-a-Judge | 80% | 62% | **80%** |
| Exact Match | 48% | 16% | **60%** |
| ROUGE | 87.6% | 84.2% | **89.5%** |
| METEOR | 85.1% | 87.3% | 86.1% |

The fine-tuned model **matches the 685B-parameter teacher** on LLM-as-a-Judge accuracy and **exceeds it** on exact match and ROUGE.

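Exact match penalizes queries that are textually different but semantically identical, which is part of why it trails the judge score. A sketch of an execution-based comparison (illustrative only, not necessarily the judge used in this evaluation) makes the distinction concrete:

```python
import sqlite3


def execution_match(pred_sql: str, gold_sql: str, schema: str, seed_rows=()) -> bool:
    """Return True if both queries produce identical results on the same database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    for stmt in seed_rows:
        conn.execute(stmt)
    try:
        return conn.execute(pred_sql).fetchall() == conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        # Invalid SQL or references to missing columns count as a mismatch
        return False


schema = "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);"
seed = [
    "INSERT INTO employees VALUES (1, 'Ada', 90000)",
    "INSERT INTO employees VALUES (2, 'Bob', 40000)",
]

# Textually different, semantically equivalent -> counted as a match
print(execution_match(
    "SELECT COUNT(*) FROM employees WHERE salary > 50000",
    "SELECT COUNT(id) FROM employees WHERE salary > 50000",
    schema, seed))  # True
```
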
## Quick Start

### Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distil-labs/distil-qwen3-4b-text2sql")
tokenizer = AutoTokenizer.from_pretrained("distil-labs/distil-qwen3-4b-text2sql")

schema = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT,
    salary INTEGER
);"""

question = "How many employees earn more than 50000?"

messages = [
    {
        "role": "system",
        "content": """You are a problem solving model working on task_description XML block:
<task_description>You are given a database schema and a natural language question. Generate the SQL query that answers the question.

Input:
- Schema: One or two table definitions in SQL DDL format
- Question: Natural language question about the data

Output:
- A single SQL query that answers the question
- No explanations, comments, or additional text

Rules:
- Use only tables and columns from the provided schema
- Use uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- Use SQLite-compatible syntax</task_description>
You will be given a single task in the question XML block
Solve only the task in question block.
Generate only the answer, do not generate anything else"""
    },
    {
        "role": "user",
        "content": f"""Now for the real task, solve the task in question block.
Generate only the solution, do not generate anything else
<question>Schema:
{schema}

Question: {question}</question>"""
    }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
# Greedy decoding: transformers rejects temperature=0, so disable sampling instead
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Using Ollama (GGUF version)

For local inference, use the GGUF exports:

- [distil-qwen3-4b-text2sql-gguf](https://huggingface.co/distil-labs/distil-qwen3-4b-text2sql-gguf) - Full-precision GGUF
- [distil-qwen3-4b-text2sql-gguf-4bit](https://huggingface.co/distil-labs/distil-qwen3-4b-text2sql-gguf-4bit) - 4-bit quantized (~2.5GB)

```bash
# Download and create Ollama model
ollama create distil-qwen3-4b-text2sql -f Modelfile

# Run inference
ollama run distil-qwen3-4b-text2sql
```
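A minimal Modelfile for the `ollama create` step above might look like the following sketch. The GGUF filename here is an assumption; point `FROM` at whichever file you downloaded from the GGUF repo:

```
# Hypothetical filename - match it to the downloaded GGUF
FROM ./distil-qwen3-4b-text2sql-q4.gguf

# Deterministic output suits SQL generation
PARAMETER temperature 0
```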
## Model Details

| Property | Value |
|----------|-------|
| Base Model | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) |
| Parameters | 4 billion |
| Architecture | Qwen3ForCausalLM |
| Context Length | 262,144 tokens |
| Precision | bfloat16 |
| Training Data | ~10,000 synthetic examples |
| Teacher Model | DeepSeek-V3 |

## Training

This model was trained using the [Distil Labs](https://distillabs.ai) platform:

1. **Seed Data**: 50 hand-validated Text2SQL examples covering various SQL complexities
2. **Synthetic Generation**: Expanded to ~10,000 examples using DeepSeek-V3
3. **Fine-tuning**: 4 epochs on the synthetic dataset
4. **Evaluation**: LLM-as-a-Judge with semantic equivalence checking

### Training Hyperparameters

- Epochs: 4
- Learning Rate: 5e-5 (cosine schedule)
- Batch Size: 1 (with gradient accumulation)
- Total Steps: ~40,000

## Task Format

### Input Format

```
Schema:
CREATE TABLE table_name (
    column_name DATA_TYPE [CONSTRAINTS],
    ...
);

Question: Natural language question about the data
```

### Output Format

A single SQL query with:

- Uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- SQLite-compatible syntax
- No explanations or additional text

### Supported SQL Features

- **Simple**: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
- **Medium**: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
- **Complex**: Subqueries, multiple JOINs, UNION

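As an illustration of the medium tier (JOIN + GROUP BY), a hypothetical schema/question pair and its expected query might look like this; the tables and question below are invented for the example, not taken from the training data:

```python
import sqlite3

# Hypothetical two-table schema, matching the card's 1-2 table guidance
schema = """
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                        department_id INTEGER, salary INTEGER);
"""
question = "What is the average salary in each department?"
expected_sql = (
    "SELECT departments.name, AVG(employees.salary) "
    "FROM employees JOIN departments ON employees.department_id = departments.id "
    "GROUP BY departments.name"
)

# Sanity-check that the query executes against the schema
conn = sqlite3.connect(":memory:")
conn.executescript(schema)
print(conn.execute(expected_sql).fetchall())  # [] on an empty database
```
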
## Use Cases

- Natural language interfaces to databases
- SQL query assistance and autocompletion
- Database chatbots and conversational BI
- Educational tools for learning SQL

## Limitations

- Optimized for SQLite syntax
- Best with 1-2 table schemas
- May struggle with highly complex nested subqueries
- Trained on English questions only

## License

This model is released under the Apache 2.0 license.

## Links

- [Distil Labs Website](https://distillabs.ai)
- [GitHub](https://github.com/distil-labs)
- [Hugging Face](https://huggingface.co/distil-labs)

## Citation

```bibtex
@misc{distil-qwen3-4b-text2sql,
  author = {Distil Labs},
  title = {Distil-Qwen3-4B-Text2SQL: A Fine-tuned Model for Natural Language to SQL},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/distil-labs/distil-qwen3-4b-text2sql}
}
```