---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-to-sql
- sql-generation
- code-generation
- llama
- fine-tuned
- lora
- text2sql
- natural-language-to-sql
datasets:
- chrisjcc/text-to-sql-spider-dataset
base_model: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct-text-to-sql-adapter

This is a **LoRA adapter** fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) for **text-to-SQL** generation.

## Model Description

- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Model Type**: LoRA adapter
- **Fine-tuning Method**: QLoRA (4-bit quantization with LoRA adapters)
- **Training Dataset**: chrisjcc/text-to-sql-spider-dataset
- **Task**: Convert natural language questions into SQL queries
- **Language**: English
- **License**: apache-2.0

## Intended Use

This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:

1. A database schema (CREATE TABLE statements)
2. A natural language question about the data

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

# For inference, optionally merge the adapter into the base weights for better performance
model = model.merge_and_unload()

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,
)

# Example usage
schema = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP
);
"""

question = "Show me all users who registered in the last 7 days"

messages = [
    {
        "role": "system",
        "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
    },
    {"role": "user", "content": question},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]["generated_text"][len(prompt):].strip()

print("Generated SQL:", sql_query)
```
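
Since the adapter was trained with QLoRA, you can also run it against a 4-bit quantized base model to cut memory use. Below is a minimal sketch assuming the common NF4 settings (the exact quantization config used in training is not recorded in this card); it requires the `bitsandbytes` package.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Assumed NF4 settings; the training-time quantization config is not recorded here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the adapter attached: merge_and_unload() is not supported on 4-bit weights.
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
```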

## Training Configuration

### Model Architecture

- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: all linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj); see the `LoraConfig` sketch below
- **Max Sequence Length**: 2048
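
For reference, these settings correspond to a PEFT `LoraConfig` along the following lines. This is a sketch: `bias` and `task_type` are assumptions, since the card does not record them.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",           # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
```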

### Training Hyperparameters

- **Number of Epochs**: 5
- **Per-Device Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Effective Batch Size**: 8
- **Learning Rate**: 5e-05
- **Learning Rate Scheduler**: constant
- **Optimizer**: AdamW (torch fused)
- **Weight Decay**: 0
- **Warmup Ratio**: 0.03
- **Max Gradient Norm**: 1.0
- **Precision**: bfloat16 (these settings are mapped onto `TrainingArguments` in the sketch below)
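
Mapped onto Transformers `TrainingArguments`, the hyperparameters above look roughly like this sketch; `output_dir` is a placeholder, and logging/saving settings are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama31-text2sql-lora",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,       # effective batch size: 1 x 8 = 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
)
```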

### Training Infrastructure

- **Hardware**: NVIDIA GPU with bfloat16 support
- **Framework**: Transformers + PEFT + TRL
- **Gradient Checkpointing**: Enabled
- **Flash Attention**: Enabled

## Training Details

The model was fine-tuned using supervised fine-tuning (SFT) with the following approach:

1. **Dataset Format**: Chat template with system/user/assistant roles
2. **System Prompt**: Includes the database schema for context
3. **User Prompt**: Natural language question
4. **Assistant Response**: SQL query

### Example Training Sample

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
    },
    {
      "role": "user",
      "content": "Show me all customers from New York"
    },
    {
      "role": "assistant",
      "content": "SELECT * FROM customers WHERE city = 'New York';"
    }
  ]
}
```
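
To see exactly what the trainer consumes, such a sample can be rendered through the Llama 3.1 chat template. A minimal sketch follows; the schema string is shortened for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

sample = {
    "messages": [
        {"role": "system", "content": "You are a text to SQL translator.\n\nSCHEMA:\nCREATE TABLE customers (id INTEGER, city VARCHAR(50));"},
        {"role": "user", "content": "Show me all customers from New York"},
        {"role": "assistant", "content": "SELECT * FROM customers WHERE city = 'New York';"},
    ]
}

# Render the full conversation, including the assistant's SQL answer,
# into the single training string seen during SFT.
text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
print(text)
```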

## Model Performance

The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:

- Complexity of the database schema
- Ambiguity in the natural language question
- Similarity to the training data

## Limitations

- **Schema Knowledge**: The model must be provided with the database schema at inference time
- **SQL Dialect**: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
- **Complex Queries**: Performance may degrade on very complex multi-join queries or advanced SQL features
- **Ambiguity**: May struggle with ambiguous natural language questions
- **Context Length**: Limited to 2048 tokens, including schema and question (a quick length check is sketched below)
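
Because the schema counts against the 2048-token budget, a quick pre-flight check can help. A minimal sketch; the 256-token reserve matches the `max_new_tokens` used in the usage example above.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

prompt = "SCHEMA:\nCREATE TABLE users (id INTEGER PRIMARY KEY);\n\nHow many users are there?"

# Reserve headroom for the generated SQL (matches max_new_tokens above).
budget = 2048 - 256
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > budget:
    print(f"Prompt is {n_tokens} tokens; trim the schema to fit the context window.")
```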

## Version History

- **v1.0**: Initial release with 5 epochs of training

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
  author = {Christian Contreras Campana},
  title = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}
```

## License

This model is released under the **Apache-2.0** license. The base model, [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), has its own license terms.

## Acknowledgments

- Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- Training frameworks: Hugging Face Transformers, PEFT, TRL
- Dataset: chrisjcc/text-to-sql-spider-dataset

## Contact

For questions or feedback, please open an issue on the model repository.

---

**Model Type**: LoRA adapter weights
**Training Date**: 2025
**Model Size**: ~8B parameters (base model; the adapter weights themselves are much smaller)
|