File size: 4,359 Bytes
f9e8616
f287a27
 
 
 
 
 
 
 
 
 
 
 
f9e8616
f287a27
 
 
 
 
f9e8616
 
f287a27
f9e8616
f287a27
 
f9e8616
f287a27
f9e8616
f287a27
 
 
 
 
f9e8616
f287a27
f9e8616
f287a27
f9e8616
f287a27
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f9e8616
 
f287a27
 
 
 
 
 
 
 
f9e8616
f287a27
f9e8616
f287a27
 
 
 
f9e8616
f287a27
f9e8616
f287a27
 
 
 
f9e8616
f287a27
f9e8616
f287a27
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
---
language:
- en
license: mit
tags:
- text-to-sql
- sql
- nlp
- fine-tuning
- qlora
- lora
- phi-3
- peft
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- b-mc2/sql-create-context
metrics:
- bleu
pipeline_tag: text-generation
---

# QueryCraft — Phi-3 Mini Fine-Tuned for Text-to-SQL

Fine-tuned **Phi-3 Mini 3.8B** using **QLoRA** on 76,000 Text-to-SQL examples.
Converts natural language questions into valid SQL queries.

## Evaluation Results

| Metric | Base Model | Fine-Tuned |
|---|---|---|
| Exact Match | 0.0% | 82.0% |
| Execution Accuracy | 84.0% | 96.0% |
| BLEU Score | 55.79 | 96.42 |

Evaluated on 50 held-out validation examples not seen during training.

## Model Details

| Property | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct (3.8B params) |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA) |
| LoRA rank | r=16, alpha=32 |
| Trainable parameters | 8,912,896 (0.23%) |
| Training examples | 76,577 |
| Training hardware | NVIDIA RTX 5060 Ti 8GB |
| Training time | 3 hours 2 minutes |
| Final train loss | 0.5677 |
| Max sequence length | 256 tokens |

## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
adapter    = "Sid9797/querycraft-phi3-sql"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="cuda:0",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

prompt = '''### System:
You are a SQL expert. Given a database schema and a natural language question, generate a valid SQL query that answers the question. Output only the SQL query with no explanation.

### Schema:
CREATE TABLE employees (id INTEGER, name VARCHAR, department VARCHAR, salary FLOAT)

### Question:
What is the average salary by department?

### SQL:
'''

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

sql = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(sql.strip().split("\n")[0])
# SELECT AVG(salary) FROM employees GROUP BY department
```

## Prompt Format

The model was trained on the Alpaca instruction format:
System:
You are a SQL expert. Given a database schema and a natural language question,
generate a valid SQL query that answers the question.
Output only the SQL query with no explanation.
Schema:
{CREATE TABLE statements}
Question:
{natural language question}
SQL:
{model generates SQL here}
## Training Details

- **Dataset:** [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
  — 78,577 examples with inline CREATE TABLE schemas
- **Train/Val split:** 76,577 train / 2,000 validation (seeded shuffle)
- **Quantization:** 4-bit NF4 with double quantization (bitsandbytes)
- **LoRA target modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Optimizer:** AdamW with cosine LR schedule, warmup_ratio=0.05
- **Effective batch size:** 16 (batch_size=4, gradient_accumulation=4)
- **Packing:** Enabled — short examples concatenated to fill 256-token sequences

## Why the Base Model Scored 0% Exact Match

The base Phi-3 Mini, without fine-tuning, consistently wrapped SQL output
in markdown code fences (`` ```sql ... ``` ``) and appended semicolons.
This formatting breaks exact match evaluation even when the SQL logic is correct.
Fine-tuning on consistently formatted examples eliminated this entirely.

## Limitations

- Optimised for single-table and simple multi-table queries
- Schema must be provided as CREATE TABLE SQL statements
- Best results on English-language questions
- May struggle with highly complex nested subqueries

## Links

- **GitHub:** https://github.com/Siddhesh-Ai9797/querycraft
- **Base Model:** https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- **Training Dataset:** https://huggingface.co/datasets/b-mc2/sql-create-context