File size: 6,112 Bytes
3b47a8d
9588063
 
 
 
 
 
 
 
 
3b47a8d
 
9588063
3b47a8d
9588063
 
3b47a8d
9588063
 
3b47a8d
9588063
3b47a8d
9588063
 
 
3b47a8d
9588063
3b47a8d
9588063
 
 
 
3b47a8d
9588063
3b47a8d
9588063
 
 
 
3b47a8d
9588063
 
 
 
3b47a8d
9588063
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3b47a8d
9588063
 
 
3b47a8d
9588063
 
 
3b47a8d
9588063
 
 
3b47a8d
9588063
 
 
 
3b47a8d
9588063
 
 
 
 
3b47a8d
9588063
 
3b47a8d
9588063
3b47a8d
9588063
 
3b47a8d
9588063
3b47a8d
9588063
3b47a8d
9588063
 
 
 
3b47a8d
9588063
3b47a8d
9588063
 
 
 
 
3b47a8d
 
9588063
3b47a8d
9588063
 
 
3b47a8d
9588063
3b47a8d
9588063
3b47a8d
9588063
 
3b47a8d
9588063
 
 
 
 
3b47a8d
9588063
 
3b47a8d
 
9588063
3b47a8d
9588063
 
 
3b47a8d
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- b-mc2/sql-create-context
tags:
- peft
- qlora
- text-to-sql
- phi-3
---

# Enhanced QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant (2 Epochs)

This repository contains an improved, high-performance QLoRA adapter for the `microsoft/Phi-3-mini-4k-instruct` model.
This version has been fine-tuned for **two full epochs** on a Text-to-SQL task, resulting in enhanced performance and reliability compared to single-epoch versions.

The model is designed to function as a technical assistant, capable of generating accurate SQL queries from natural language questions based on a provided database schema.
This project was developed for an engineering and deployment course, with a focus on creating a robust, reproducible, and practical AI artifact.

## Key Improvements in This Version

-   **Enhanced Reliability:** Training for two epochs has significantly improved the model's ability to consistently adhere to the required chat template format, reducing parsing errors in production.
-   **Maintained Accuracy:** The model maintains its high accuracy in generating syntactically correct and logically sound SQL queries.
-   **Robust Loading:** The usage instructions below follow best practices to ensure reliable loading across different environments.

## How to Use

First, ensure you have a compatible environment by installing these specific library versions:
```bash
pip install transformers==4.38.2 peft==0.10.0 accelerate==0.28.0 bitsandbytes==0.43.0 torch
```

The following code provides the most robust method for loading and running inference with this adapter.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# --- 1. Configuration ---
base_model_id = "microsoft/Phi-3-mini-4k-instruct"
# IMPORTANT: Replace with your new model's ID on the Hugging Face Hub
adapter_id = "YourUsername/YourNewModelName" 

# --- 2. Load the Quantized Base Model ---
# This is required to fit the model in memory-constrained environments like Colab
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# --- 3. Load and Apply the LoRA Adapter ---
model = PeftModel.from_pretrained(base_model, adapter_id)
print("Successfully loaded quantized base model and applied adapter.")

# --- 4. Prepare for Inference ---
context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"

prompt = f"""<|user|>
Given the database schema:
{context}

Generate the SQL query for the following request:
{question}<|end|>
<|assistant|>
"""

# --- 5. Generate the Response ---
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=False)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
generated_sql = generated_text.split("<|assistant|>")[-1].strip()

print(f"\nGenerated SQL: {generated_sql}")
# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000

```

## Training Procedure
### Dataset

The model was fine-tuned on a 10,000-sample subset of the b-mc2/sql-create-context dataset, split 90/10 for training and validation.

### Fine-tuning Configuration (QLoRA)

* **Quantization:** 4-bit NormalFloat (NF4) with `bfloat16` compute dtype.
* **LoRA Rank (`r`):** 8
* **LoRA Alpha (`lora_alpha`):** 16
* **Target Modules:** All linear layers in the Phi-3 architecture (`q_proj`, `k_proj`, `v_proj`, `o_proj`, etc.).

### Training Hyperparameters

* **Learning Rate:** 2e-4
* **Epochs: 2**
* **Effective Batch Size:** 8
* **Optimizer:** Paged AdamW (32-bit)
* **LR Scheduler:** Cosine


## Evaluation and Results

Qualitative evaluation on a held-out test set confirms that the model consistently generates correct SQL queries.
The extended training to two epochs has successfully addressed the primary limitation of the single-epoch version: inconsistent formatting.
This model now reliably generates the `<|assistant|>` token, making it more suitable for automated parsing and deployment.

## Deployment & Optimization Considerations

For deployment in a production environment, consider the following optimizations:

1. **Merge Adapter Weights:** Before deploying, merge the adapter weights into the base model to create a single, solid model.
This eliminates the overhead of dynamically applying the adapter during inference and can improve performance.

```python
# After loading the model and adapter:
merged_model = model.merge_and_unload()
# Use 'merged_model' for all subsequent 'generate' calls.
```

2. **Further Quantization:** For CPU-based deployment or even more efficient GPU usage, the merged model can be further quantized into formats like **GGUF** (for use with `llama.cpp`) or **AWQ/GPTQ**.
3. **API Serving:** Wrap the model in a high-performance web server like FastAPI or use a dedicated LLM serving framework like vLLM for optimal throughput and batching.


## Limitations and Responsible AI

* **Generalization:** The model is specialized for the Text-to-SQL task and the schema styles seen in its training data. It may not perform well on highly complex or esoteric SQL dialects.
* **Security:** This model is a proof-of-concept and has not been hardened against SQL injection attacks. All generated SQL should be treated as untrusted input and must be sanitized or executed in a sandboxed, read-only environment.
* **Bias:** The training data is the source of the model's knowledge. Any biases present in the sql-create-context dataset may be reflected in the model's outputs.