---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-to-sql
- sql-generation
- code-generation
- llama
- fine-tuned
- lora
- text2sql
- natural-language-to-sql
datasets:
- chrisjcc/text-to-sql-spider-dataset
base_model: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct-text-to-sql-adapter

This is a **LoRA Adapter** fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) for **text-to-SQL** generation tasks.

## πŸ“‹ Model Description

- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Model Type**: LoRA Adapter
- **Fine-tuning Method**: QLoRA (4-bit quantization with LoRA adapters)
- **Training Dataset**: chrisjcc/text-to-sql-spider-dataset
- **Task**: Convert natural language questions into SQL queries
- **Language**: English
- **License**: apache-2.0

## 🎯 Intended Use

This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:
1. A database schema (CREATE TABLE statements)
2. A natural language question about the data

## πŸš€ Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

# Optionally merge the adapter into the base model for faster inference
model = model.merge_and_unload()

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,
)

# Example usage
schema = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP
);
"""

question = "Show me all users who registered in the last 7 days"

messages = [
    {
        "role": "system",
        "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
    },
    {"role": "user", "content": question}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]['generated_text'][len(prompt):].strip()

print("Generated SQL:", sql_query)
```
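
If GPU memory is limited, the base model can also be loaded in 4-bit, mirroring the QLoRA setup used during training. The sketch below uses bitsandbytes with common NF4 defaults; these quantization parameters are an assumption, not the verified training configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization (assumed defaults, in the spirit of QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
# Keep the adapter attached: merging into a 4-bit base is generally not recommended.
```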

## βš™οΈ Training Configuration

### Model Architecture
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- **Max Sequence Length**: 2048
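
For reference, a PEFT `LoraConfig` matching the settings above might look like this (a sketch; `task_type` is inferred, not taken from the original training script):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,         # scaling factor (alpha / r = 2.0)
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```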

### Training Hyperparameters
- **Number of Epochs**: 5
- **Per-Device Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Effective Batch Size**: 8
- **Learning Rate**: 5e-05
- **Learning Rate Scheduler**: Constant
- **Optimizer**: AdamW (torch fused)
- **Weight Decay**: 0
- **Warmup Ratio**: 0.03
- **Max Gradient Norm**: 1.0
- **Precision**: bfloat16
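
Expressed as a TRL `SFTConfig`, these hyperparameters roughly translate to the sketch below (illustrative only; `output_dir` is a placeholder and the actual training script may differ):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="llama-3.1-8b-text-to-sql",  # hypothetical path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,          # effective batch size: 1 x 8 = 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
)
```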

### Training Infrastructure
- **Hardware**: NVIDIA GPU with bfloat16 support
- **Framework**: Transformers + PEFT + TRL
- **Gradient Checkpointing**: Enabled
- **Flash Attention**: Enabled

## πŸ“Š Training Details

The model was trained with Supervised Fine-Tuning (SFT) using the following approach:

1. **Dataset Format**: Chat template with system/user/assistant roles
2. **System Prompt**: Includes database schema for context
3. **User Prompt**: Natural language question
4. **Assistant Response**: SQL query

### Example Training Sample

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
    },
    {
      "role": "user",
      "content": "Show me all customers from New York"
    },
    {
      "role": "assistant",
      "content": "SELECT * FROM customers WHERE city = 'New York';"
    }
  ]
}
```
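
During SFT, each sample like the one above is rendered into a single training string with the tokenizer's chat template. A minimal sketch of that step (assuming the tokenizer shipped with this repository):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

sample = {
    "messages": [
        {"role": "system", "content": "You are a text to SQL translator.\n\nSCHEMA:\nCREATE TABLE ..."},
        {"role": "user", "content": "Show me all customers from New York"},
        {"role": "assistant", "content": "SELECT * FROM customers WHERE city = 'New York';"},
    ]
}

# Render the full conversation, including the assistant target, as one string
text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
print(text)
```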

## πŸŽ“ Model Performance

The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:
- Complexity of the database schema
- Ambiguity in the natural language question
- Similarity to training data

## ⚠️ Limitations

- **Schema Knowledge**: The model must be provided with the database schema at inference time
- **SQL Dialect**: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
- **Complex Queries**: Performance may degrade on very complex multi-join queries or advanced SQL features
- **Ambiguity**: May struggle with ambiguous natural language questions
- **Context Length**: Limited to 2048 tokens (including schema + question)
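
Because of the 2048-token window, it can help to check the rendered prompt length before generating. A simple sketch, reusing `tokenizer` and `messages` from the Usage section:

```python
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > 2048 - 256:  # leave headroom for max_new_tokens
    print(f"Warning: prompt is {n_tokens} tokens; consider trimming the schema.")
```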

## πŸ”„ Version History

- **v1.0**: Initial release with 5 epochs of training

## πŸ“š Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
  author = {Christian Contreras Campana},
  title = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}
```

## πŸ“„ License

This model is released under the **Apache-2.0** license. The base model [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) is distributed under its own terms (the Llama 3.1 Community License).

## πŸ™ Acknowledgments

- Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- Training framework: Hugging Face Transformers, PEFT, TRL
- Dataset: chrisjcc/text-to-sql-spider-dataset

## 🀝 Contact

For questions or feedback, please open an issue on the model repository.

---

**Model Type**: LoRA adapter weights
**Training Date**: 2025
**Base Model Size**: ~8B parameters (the adapter itself is a small fraction of that)