---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
pipeline_tag: text-generation
tags:
- text-generation
- sql-generation
- llama
- lora
- peft
- unsloth
- transformers
license: apache-2.0
language:
- en
---

# SQL-Genie (LLaMA-3.1-8B Fine-Tuned)

## 🧠 Model Overview

**SQL-Genie** is a fine-tuned version of **LLaMA-3.1-8B**, specialized for converting **natural language questions into SQL queries**.

The model was trained with **parameter-efficient fine-tuning (LoRA)** on a structured SQL instruction dataset, yielding strong SQL generation while keeping training lightweight enough to run on limited compute such as a Google Colab GPU.

- **Developed by:** dhashu  
- **Base model:** `unsloth/meta-llama-3.1-8b-bnb-4bit`  
- **License:** Apache-2.0  
- **Training stack:** Unsloth + Hugging Face TRL  

---

## ⚙️ Training Methodology

This model was trained using **LoRA (Low-Rank Adaptation)** via the **PEFT** framework.

### Key Details
- Base model loaded in **4-bit quantization** for memory efficiency
- **Base weights frozen**
- **LoRA adapters** applied to:
  - Attention layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
  - Feed-forward layers (`gate_proj`, `up_proj`, `down_proj`)
- Fine-tuned using **Supervised Fine-Tuning (SFT)**

This approach allows efficient specialization without full model retraining.

---

## 📊 Dataset

The model was trained on a subset of the **`b-mc2/sql-create-context`** dataset, which includes:

- Natural language questions
- Database schema / context
- Corresponding SQL queries

Each sample was formatted as an **instruction-style prompt** to improve reasoning and structured output.
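For illustration, such a prompt could be assembled with a small helper like the sketch below. The template mirrors the inference prompt shown later in this card; the helper itself and its argument names are assumptions, not code from the training run.

```python
def format_example(question: str, context: str, answer: str = "") -> str:
    """Assemble one instruction-style prompt.

    For training samples, the gold SQL query is appended after the
    response tag; for inference, `answer` is left empty.
    """
    return (
        "Below is an input question, context is given to help. "
        "Generate a SQL response.\n"
        f"### Input: {question}\n"
        f"### Context: {context}\n"
        f"### SQL Response:\n{answer}"
    )

prompt = format_example(
    "List all employees hired after 2020",
    "CREATE TABLE employees(id, name, hire_date)",
)
```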

---

## 🚀 Performance & Efficiency

- 🚀 **2× faster fine-tuning** using Unsloth
- 💾 **Low VRAM usage** via 4-bit quantization
- 🧠 Improved SQL syntax and schema understanding
- ⚡ Suitable for real-time inference and lightweight deployments

---

## 🧩 Model Variants

This repository contains a **merged model**:

### 🔹 Merged 4-bit Model
- LoRA adapters merged into base weights
- No PEFT required at inference time
- Ready-to-use single checkpoint
- Optimized for easy deployment

---

## ▶️ How to Use (Inference)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "dhashu/sql-genie-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # Recent transformers versions expect quantization options via a config
    # object rather than the deprecated `load_in_4bit=True` argument.
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

prompt = """Below is an input question, context is given to help. Generate a SQL response.
### Input: List all employees hired after 2020
### Context: CREATE TABLE employees(id, name, hire_date)
### SQL Response:
"""

# Use the model's own device instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,   # required for `temperature` to take effect
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```