---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
pipeline_tag: text-generation
tags:
- text-generation
- sql-generation
- llama
- lora
- peft
- unsloth
- transformers
license: apache-2.0
language:
- en
---

# SQL-Genie (LLaMA-3.1-8B Fine-Tuned)

## 🧠 Model Overview

**SQL-Genie** is a fine-tuned version of **LLaMA-3.1-8B**, specialized for converting **natural language questions into SQL queries**.

The model was trained using **parameter-efficient fine-tuning (LoRA)** on a structured SQL instruction dataset, enabling strong SQL generation performance while remaining lightweight and affordable to train on limited compute (Google Colab).

- **Developed by:** dhashu
- **Base model:** `unsloth/meta-llama-3.1-8b-bnb-4bit`
- **License:** Apache-2.0
- **Training stack:** Unsloth + Hugging Face TRL

---

## ⚙️ Training Methodology

This model was trained using **LoRA (Low-Rank Adaptation)** via the **PEFT** framework.

### Key Details

- Base model loaded in **4-bit quantization** for memory efficiency
- **Base weights frozen**
- **LoRA adapters** applied to:
  - Attention layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
  - Feed-forward layers (`gate_proj`, `up_proj`, `down_proj`)
- Fine-tuned using **Supervised Fine-Tuning (SFT)**

This approach allows efficient specialization without full model retraining.
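
As a rough illustration, the adapter setup described above could be expressed with PEFT's `LoraConfig` as follows. The rank, alpha, and dropout values here are assumed placeholders, not the exact training hyperparameters:

```python
from peft import LoraConfig

# Illustrative LoRA configuration targeting the layers listed above.
# r, lora_alpha, and lora_dropout are assumed example values,
# NOT the actual settings used to train this checkpoint.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # feed-forward projections
    ],
)
```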

---

## 📊 Dataset

The model was trained on a subset of the **`b-mc2/sql-create-context`** dataset, which includes:

- Natural language questions
- Database schema / context
- Corresponding SQL queries

Each sample was formatted as an **instruction-style prompt** to improve reasoning and structured output.
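
A minimal helper sketching this instruction-style formatting. The field labels mirror the inference prompt shown later in this card; `format_sql_prompt` is a hypothetical name, not part of the training code:

```python
def format_sql_prompt(question: str, context: str) -> str:
    # Instruction-style prompt; the section labels match the
    # inference example in this card.
    return (
        "Below is an input question, context is given to help. "
        "Generate a SQL response.\n"
        f"### Input: {question}\n"
        f"### Context: {context}\n"
        "### SQL Response:\n"
    )

prompt = format_sql_prompt(
    "List all employees hired after 2020",
    "CREATE TABLE employees(id, name, hire_date)",
)
print(prompt)
```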

---

## 🚀 Performance & Efficiency

- 🚀 **2× faster fine-tuning** using Unsloth
- 💾 **Low VRAM usage** via 4-bit quantization
- 🧠 Improved SQL syntax and schema understanding
- ⚡ Suitable for real-time inference and lightweight deployments

---

## 🧩 Model Variants

This repository contains a **merged model**:

### 🔹 Merged 4-bit Model

- LoRA adapters merged into base weights
- No PEFT required at inference time
- Ready-to-use single checkpoint
- Optimized for easy deployment
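
Conceptually, merging folds each adapter's low-rank update into the frozen base weight: W' = W + (α/r)·BA. A minimal NumPy sketch with illustrative shapes (not the real model's dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16                  # illustrative sizes, not the model's

W = rng.standard_normal((d, d))         # frozen base weight
A = rng.standard_normal((r, d))         # LoRA down-projection
B = np.zeros((d, r))                    # LoRA up-projection (zero-initialized)

W_merged = W + (alpha / r) * (B @ A)    # adapter folded into the base weight

# With B still zero-initialized, the merge is a no-op; only after
# training updates B does the merged weight differ from the base.
print(np.allclose(W_merged, W))  # True
```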

---

## ▶️ How to Use (Inference)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "dhashu/sql-genie-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # `load_in_4bit=True` is deprecated as a direct kwarg;
    # pass a quantization config instead.
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

prompt = """Below is an input question, context is given to help. Generate a SQL response.
### Input: List all employees hired after 2020
### Context: CREATE TABLE employees(id, name, hire_date)
### SQL Response:
"""

# Use the model's own device rather than hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,      # temperature only takes effect when sampling is enabled
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
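
Because the decoded output includes the prompt itself, a small post-processing step can isolate the generated SQL. `extract_sql` below is a hypothetical helper, demonstrated on a hard-coded example string:

```python
def extract_sql(decoded: str) -> str:
    # Keep only the text after the last "### SQL Response:" marker.
    marker = "### SQL Response:"
    return decoded.rsplit(marker, 1)[-1].strip()

example = (
    "### Input: List all employees hired after 2020\n"
    "### SQL Response:\n"
    "SELECT name FROM employees WHERE hire_date > '2020-12-31';"
)
print(extract_sql(example))
# SELECT name FROM employees WHERE hire_date > '2020-12-31';
```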