File size: 6,112 Bytes
3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d 9588063 3b47a8d | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 | ---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- b-mc2/sql-create-context
tags:
- peft
- qlora
- text-to-sql
- phi-3
---
# Enhanced QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant (2 Epochs)
This repository contains an improved, high-performance QLoRA adapter for the `microsoft/Phi-3-mini-4k-instruct` model.
This version has been fine-tuned for **two full epochs** on a Text-to-SQL task, resulting in enhanced performance and reliability compared to single-epoch versions.
The model is designed to function as a technical assistant, capable of generating accurate SQL queries from natural language questions based on a provided database schema.
This project was developed for an engineering and deployment course, with a focus on creating a robust, reproducible, and practical AI artifact.
## Key Improvements in This Version
- **Enhanced Reliability:** Training for two epochs has significantly improved the model's ability to consistently adhere to the required chat template format, reducing parsing errors in production.
- **Maintained Accuracy:** The model maintains its high accuracy in generating syntactically correct and logically sound SQL queries.
- **Robust Loading:** The usage instructions below follow best practices to ensure reliable loading across different environments.
## How to Use
First, ensure you have a compatible environment by installing these specific library versions:
```bash
pip install transformers==4.38.2 peft==0.10.0 accelerate==0.28.0 bitsandbytes==0.43.0 torch
```
The following code provides the most robust method for loading and running inference with this adapter.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch
# --- 1. Configuration ---
base_model_id = "microsoft/Phi-3-mini-4k-instruct"
# IMPORTANT: Replace with your new model's ID on the Hugging Face Hub
adapter_id = "YourUsername/YourNewModelName"
# --- 2. Load the Quantized Base Model ---
# This is required to fit the model in memory-constrained environments like Colab
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
base_model_id,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# --- 3. Load and Apply the LoRA Adapter ---
model = PeftModel.from_pretrained(base_model, adapter_id)
print("Successfully loaded quantized base model and applied adapter.")
# --- 4. Prepare for Inference ---
context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"
prompt = f"""<|user|>
Given the database schema:
{context}
Generate the SQL query for the following request:
{question}<|end|>
<|assistant|>
"""
# --- 5. Generate the Response ---
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=False)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
generated_sql = generated_text.split("<|assistant|>")[-1].strip()
print(f"\nGenerated SQL: {generated_sql}")
# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000
```
## Training Procedure
### Dataset
The model was fine-tuned on a 10,000-sample subset of the b-mc2/sql-create-context dataset, split 90/10 for training and validation.
### Fine-tuning Configuration (QLoRA)
* **Quantization:** 4-bit NormalFloat (NF4) with `bfloat16` compute dtype.
* **LoRA Rank (`r`):** 8
* **LoRA Alpha (`lora_alpha`):** 16
* **Target Modules:** All linear layers in the Phi-3 architecture (`q_proj`, `k_proj`, `v_proj`, `o_proj`, etc.).
### Training Hyperparameters
* **Learning Rate:** 2e-4
* **Epochs: 2**
* **Effective Batch Size:** 8
* **Optimizer:** Paged AdamW (32-bit)
* **LR Scheduler:** Cosine
## Evaluation and Results
Qualitative evaluation on a held-out test set confirms that the model consistently generates correct SQL queries.
The extended training to two epochs has successfully addressed the primary limitation of the single-epoch version: inconsistent formatting.
This model now reliably generates the `<|assistant|>` token, making it more suitable for automated parsing and deployment.
## Deployment & Optimization Considerations
For deployment in a production environment, consider the following optimizations:
1. **Merge Adapter Weights:** Before deploying, merge the adapter weights into the base model to create a single, solid model.
This eliminates the overhead of dynamically applying the adapter during inference and can improve performance.
```python
# After loading the model and adapter:
merged_model = model.merge_and_unload()
# Use 'merged_model' for all subsequent 'generate' calls.
```
2. **Further Quantization:** For CPU-based deployment or even more efficient GPU usage, the merged model can be further quantized into formats like **GGUF** (for use with `llama.cpp`) or **AWQ/GPTQ**.
3. **API Serving:** Wrap the model in a high-performance web server like FastAPI or use a dedicated LLM serving framework like vLLM for optimal throughput and batching.
## Limitations and Responsible AI
* **Generalization:** The model is specialized for the Text-to-SQL task and the schema styles seen in its training data. It may not perform well on highly complex or esoteric SQL dialects.
* **Security:** This model is a proof-of-concept and has not been hardened against SQL injection attacks. All generated SQL should be treated as untrusted input and must be sanitized or executed in a sandboxed, read-only environment.
* **Bias:** The training data is the source of the model's knowledge. Any biases present in the sql-create-context dataset may be reflected in the model's outputs.
|