---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-to-sql
- sql-generation
- code-generation
- llama
- fine-tuned
- lora
- text2sql
- natural-language-to-sql
datasets:
- chrisjcc/text-to-sql-spider-dataset
base_model: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---
# Llama-3.1-8B-Instruct-text-to-sql-adapter
This is a **LoRA Adapter** fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) for **text-to-SQL** generation tasks.
## πŸ“‹ Model Description
- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Model Type**: LoRA Adapter
- **Fine-tuning Method**: QLoRA (4-bit quantization with LoRA adapters)
- **Training Dataset**: chrisjcc/text-to-sql-spider-dataset
- **Task**: Convert natural language questions into SQL queries
- **Language**: English
- **License**: apache-2.0
## 🎯 Intended Use
This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:
1. A database schema (CREATE TABLE statements)
2. A natural language question about the data
## πŸš€ Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.1-8B-Instruct",
device_map="auto",
torch_dtype="auto"
)
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
# For inference, merge adapter for better performance (optional)
model = model.merge_and_unload()
# Create text generation pipeline
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_new_tokens=256,
do_sample=False,
)
# Example usage
schema = """
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
created_at TIMESTAMP
);
"""
question = "Show me all users who registered in the last 7 days"
messages = [
{
"role": "system",
"content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
},
{"role": "user", "content": question}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]['generated_text'][len(prompt):].strip()
print("Generated SQL:", sql_query)
```
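If you plan to reuse the merged model, you can save it once and reload it later with `AutoModelForCausalLM`, skipping the adapter-loading step (the output directory below is just a placeholder):

```python
# Persist the merged weights so future loads do not need PEFT
model.save_pretrained("llama31-text-to-sql-merged")      # placeholder path
tokenizer.save_pretrained("llama31-text-to-sql-merged")
```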
## βš™οΈ Training Configuration
### Model Architecture
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- **Max Sequence Length**: 2048
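
For reference, these settings correspond to a `peft.LoraConfig` roughly like the sketch below (the `bias` and `task_type` arguments are assumed defaults, not values taken from the actual training run):

```python
from peft import LoraConfig

# Approximate adapter configuration reconstructed from the values above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumed default
    task_type="CAUSAL_LM",  # assumed default
)
```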
### Training Hyperparameters
- **Number of Epochs**: 5
- **Per-Device Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Effective Batch Size**: 8
- **Learning Rate**: 5e-05
- **Learning Rate Scheduler**: Constant
- **Optimizer**: AdamW (torch fused)
- **Weight Decay**: 0
- **Warmup Ratio**: 0.03
- **Max Gradient Norm**: 1.0
- **Precision**: bfloat16
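
Expressed as a TRL `SFTConfig`, the hyperparameters above would look roughly like this (the `output_dir` is a placeholder, and exact argument names can vary slightly between TRL versions):

```python
from trl import SFTConfig

# Approximate training arguments mirroring the list above
training_args = SFTConfig(
    output_dir="llama31-text-to-sql-adapter",  # placeholder
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,             # effective batch size of 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    max_seq_length=2048,
)
```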
### Training Infrastructure
- **Hardware**: NVIDIA GPU with bfloat16 support
- **Framework**: Transformers + PEFT + TRL
- **Gradient Checkpointing**: Enabled
- **Flash Attention**: Enabled
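
A QLoRA-style base-model load consistent with this setup might look like the following sketch (the 4-bit quantization details are illustrative assumptions; the exact settings used for training are not published here):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit (QLoRA) loading with Flash Attention 2
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # assumed
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,          # assumed
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```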
## πŸ“Š Training Details
The model was fine-tuned using Supervised Fine-Tuning (SFT) with the following approach:
1. **Dataset Format**: Chat template with system/user/assistant roles
2. **System Prompt**: Includes database schema for context
3. **User Prompt**: Natural language question
4. **Assistant Response**: SQL query
### Example Training Sample
```json
{
"messages": [
{
"role": "system",
"content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
},
{
"role": "user",
"content": "Show me all customers from New York"
},
{
"role": "assistant",
"content": "SELECT * FROM customers WHERE city = 'New York';"
}
]
}
```
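Putting the pieces together, a minimal TRL training loop over this dataset could look like the sketch below. It reuses the hypothetical `base_model`, `peft_config`, and `training_args` objects from the earlier snippets and assumes the dataset exposes a `messages` column that `SFTTrainer` formats with the tokenizer's chat template:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import SFTTrainer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
dataset = load_dataset("chrisjcc/text-to-sql-spider-dataset", split="train")  # split name assumed

# base_model, peft_config, and training_args are defined in the sketches above
trainer = SFTTrainer(
    model=base_model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,   # called processing_class in newer TRL versions
)
trainer.train()
```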
## πŸŽ“ Model Performance
The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:
- Complexity of the database schema
- Ambiguity in the natural language question
- Similarity to training data
## ⚠️ Limitations
- **Schema Knowledge**: The model must be provided with the database schema at inference time
- **SQL Dialect**: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
- **Complex Queries**: Performance may degrade on very complex multi-join queries or advanced SQL features
- **Ambiguity**: May struggle with ambiguous natural language questions
- **Context Length**: Limited to 2048 tokens (including schema + question)
## πŸ”„ Version History
- **v1.0**: Initial release with 5 epochs of training
## πŸ“š Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
author = {Christian Contreras Campana},
title = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}
```
## πŸ“„ License
This model is released under the **Apache 2.0** license. The base model [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) is subject to its own license terms (the Llama 3.1 Community License).
## πŸ™ Acknowledgments
- Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- Training framework: Hugging Face Transformers, PEFT, TRL
- Dataset: chrisjcc/text-to-sql-spider-dataset
## 🀝 Contact
For questions or feedback, please open an issue on the model repository.
---
- **Model Type**: LoRA adapter weights
- **Training Date**: 2025
- **Base Model Size**: ~8B parameters (this repository contains only the adapter weights)