---
license: mit
base_model: microsoft/phi-3-mini-4k-instruct
tags:
- llm
- code-generation
- bug-fixing
- lora
- peft
- python
datasets:
- mbpp
metrics:
- exact_match
- similarity
---

# DebugGPT LoRA Adapter for Phi-3 Mini

A lightweight LoRA adapter for Phi-3 Mini, fine-tuned on synthetic Python bug-fixing tasks derived from the MBPP dataset. It improves the base model's ability to detect and correct common Python syntax errors while preserving its general language capabilities.

---

## Model Description

- **Base Model:** microsoft/phi-3-mini-4k-instruct
- **Fine-Tuning Method:** QLoRA (Low-Rank Adaptation with 4-bit quantization)
- **Task:** Automated Python bug fixing

The model takes buggy Python code as input and generates the corrected version.

---

## Intended Use

This model is designed for:

- Python debugging assistance
- Educational coding tools
- AI-assisted code correction
- Research experiments in code repair

### Out-of-Scope Use

- Production-critical systems
- Security-sensitive applications
- Complex multi-file debugging

---

## Dataset

We use the **MBPP (Mostly Basic Python Problems)** dataset. Since MBPP contains correct code, we generate a bug-fixing dataset by injecting synthetic bugs.

### Data Format

Each example follows an instruction-tuning format:

```json
{
  "instruction": "Fix the bug in the following Python code",
  "input": "<buggy code>",
  "output": "<correct code>"
}
```
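
As a rough illustration of how a record in this format could be turned into a single training string, the sketch below serializes one record as a JSON line and renders it with a simple template. The `format_example` helper and the `### ...` section markers are hypothetical; the card does not state the actual prompt template used for training.

```python
import json

def format_example(record: dict) -> str:
    """Join instruction, buggy input, and fixed output into one string.

    The "### ..." markers are a hypothetical template, not the card's own.
    """
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Input:\n{record['input']}\n\n"
        f"### Output:\n{record['output']}"
    )

record = {
    "instruction": "Fix the bug in the following Python code",
    "input": "def add(a, b):\n    return a - b",
    "output": "def add(a, b):\n    return a + b",
}

# One JSON object per line (JSONL) is a common on-disk layout for such records.
jsonl_line = json.dumps(record)
text = format_example(json.loads(jsonl_line))
print(text)
```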

### Bug Injection Strategy

We introduce controlled bugs such as:

- Operator replacement (`+` → `-`)
- Comparison changes (`>` → `<`)
- Removal of return statements
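
The three bug types listed above could be injected with simple text transformations. The following is a minimal sketch, not the project's actual generation script: the function names are hypothetical, each mutator only touches the first match, and "removal of return statements" is interpreted here as dropping the `return` keyword so the code stays syntactically valid.

```python
import re
from typing import Dict, List, Optional

INSTRUCTION = "Fix the bug in the following Python code"

def swap_operator(code: str) -> Optional[str]:
    """Replace the first ` + ` with ` - `; None if there is nothing to replace."""
    return code.replace(" + ", " - ", 1) if " + " in code else None

def flip_comparison(code: str) -> Optional[str]:
    """Replace the first ` > ` with ` < `; None if there is nothing to replace."""
    return code.replace(" > ", " < ", 1) if " > " in code else None

def drop_return(code: str) -> Optional[str]:
    """Strip the keyword from the first `return <expr>` line, keeping the expression."""
    mutated, count = re.subn(
        r"^([ \t]*)return[ \t]+", r"\1", code, count=1, flags=re.MULTILINE
    )
    return mutated if count else None

MUTATORS = [swap_operator, flip_comparison, drop_return]

def inject_bugs(correct_code: str) -> List[Dict[str, str]]:
    """Build one instruction-format record per mutator that applies."""
    records = []
    for mutate in MUTATORS:
        buggy = mutate(correct_code)
        if buggy is not None and buggy != correct_code:
            records.append(
                {"instruction": INSTRUCTION, "input": buggy, "output": correct_code}
            )
    return records

solution = (
    "def bigger_sum(a, b, limit):\n"
    "    if a + b > limit:\n"
    "        return a + b\n"
    "    return limit"
)
records = inject_bugs(solution)
for rec in records:
    print(rec["input"], end="\n\n")
```

Bugs of this kind change behavior without breaking the syntax, so a syntax check alone cannot tell the buggy and fixed versions apart; running the original MBPP test cases would be needed to confirm that a mutation really introduces a failure.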

### Dataset Size

| Split      | Samples |
|------------|---------|
| Train      | ~374    |
| Validation | ~90     |
| Test       | ~500    |

---

## Training Procedure

### Method: QLoRA

To enable efficient training on limited hardware:

- Base model loaded in 4-bit precision (NF4)
- Base weights frozen
- Only LoRA adapters trained

### LoRA Configuration

| Parameter       | Value                              |
|-----------------|------------------------------------|
| Rank (r)        | 16                                 |
| Alpha           | 32                                 |
| Dropout         | 0.05                               |
| Target Modules  | q_proj, k_proj, v_proj, o_proj     |
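
To give a feel for what these values imply, the sketch below computes the standard LoRA scaling factor (`alpha / r`) and the number of trainable parameters added to a single projection matrix. The 3072 x 3072 matrix size is an illustrative assumption, not a figure taken from this card, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (r x d_in) and B (d_out x r); the frozen weight is untouched.
    """
    return r * d_in + d_out * r

R, ALPHA = 16, 32   # rank and alpha from the table above
HIDDEN = 3072       # assumption: illustrative square projection size

scaling = ALPHA / R  # the low-rank update B @ A is multiplied by alpha / r
per_matrix = lora_param_count(HIDDEN, HIDDEN, R)
full_matrix = HIDDEN * HIDDEN

print(f"scaling factor: {scaling}")
print(f"LoRA params per matrix: {per_matrix:,}")
print(f"fraction of the full matrix: {per_matrix / full_matrix:.2%}")
```

Under these assumptions each adapted matrix gains about 1% of the parameters of the frozen weight it sits next to, which is why only the adapter needs to be stored and shared.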

### Training Configuration

| Parameter              | Value   |
|------------------------|---------|
| Epochs                 | 3       |
| Learning Rate          | 2e-4    |
| Batch Size             | 1       |
| Gradient Accumulation  | 8       |
| Precision              | FP16    |
| Optimizer              | AdamW   |
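
The batch size and gradient accumulation settings combine into an effective batch size, which in turn determines roughly how many optimizer steps the run takes. The arithmetic below is a sketch that assumes a single GPU, about 374 training examples (the approximate train split size from the Dataset section), and a final partial accumulation window that still counts as a step; some trainer versions round down instead, which would give 46 steps per epoch.

```python
import math

train_samples = 374   # approximate train split size from the Dataset section
epochs = 3
per_device_batch = 1
grad_accum = 8

# Gradients from 8 single-example batches are summed before each optimizer step.
effective_batch = per_device_batch * grad_accum

# Assumption: the last, partially filled accumulation window still triggers a step.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```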

---

## Hardware & Frameworks

- **GPU:** NVIDIA Tesla T4
- **Frameworks:** Hugging Face Transformers, PEFT (LoRA), TRL (SFTTrainer), Weights & Biases

---

## Evaluation Results

### Performance Summary

| Metric                  | Base Model    | Fine-Tuned Model   |
|-------------------------|---------------|--------------------|
| Syntax Fix Accuracy     | Low           | Noticeably Higher  |
| Indentation Correction  | Inconsistent  | Reliable           |
| Variable Error Fixing   | Occasional    | Improved           |
| Complex Logic Bugs      | Limited       | Limited (unchanged)|
| Instruction Adherence   | Moderate      | High               |

> **Note:** The ratings above are qualitative. Quantitative metrics (e.g., exact match accuracy, CodeBLEU) were not computed due to dataset and tooling constraints.
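
The card metadata lists `exact_match` and `similarity` as metrics. One possible way to compute them in a future evaluation is sketched below using only the standard library. The helper names are hypothetical, the character-level `difflib` ratio is a stand-in for whatever similarity measure is eventually chosen, and the two sample pairs are made up for illustration rather than taken from model output.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def normalize(code: str) -> str:
    """Drop trailing whitespace per line and leading/trailing blank lines."""
    return "\n".join(line.rstrip() for line in code.strip("\n").splitlines())

def exact_match(prediction: str, reference: str) -> bool:
    """True when the two snippets are identical after normalization."""
    return normalize(prediction) == normalize(reference)

def similarity(prediction: str, reference: str) -> float:
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, normalize(prediction), normalize(reference)).ratio()

def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average both metrics over (prediction, reference) pairs."""
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "similarity": sum(similarity(p, r) for p, r in pairs) / n,
    }

reference = "for i in range(5):\n    print(i)"
pairs = [
    ("for i in range(5):\n    print(i)\n", reference),  # correct fix
    ("for i in range(5)\n    print(i)", reference),     # bug left in place
]
scores = evaluate(pairs)
print(scores)
```

Text-level metrics like these penalize fixes that are correct but formatted differently from the reference, so they would ideally be paired with running the MBPP test cases against each prediction.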

---

## Example

### Input — Buggy Code

```python
for i in range(5)
    print(i)
```

### Output — Fixed Code

```python
for i in range(5):
    print(i)
```
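
For syntax-level bugs like this one, a generated fix can be sanity-checked by asking Python to parse it. The sketch below uses the standard `ast` module; the `parses` helper is a hypothetical name. A successful parse only shows that the output is syntactically valid, not that it is logically correct.

```python
import ast

def parses(code: str) -> bool:
    """Return True if the snippet is syntactically valid Python."""
    try:
        ast.parse(code)
    except SyntaxError:  # IndentationError is a subclass, so it is covered too
        return False
    return True

buggy = "for i in range(5)\n    print(i)\n"
fixed = "for i in range(5):\n    print(i)\n"

print(parses(buggy), parses(fixed))  # prints: False True
```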

---

## Limitations

- Small dataset size limits generalization
- Focused primarily on syntax-level bugs
- Limited performance on complex logical errors
- Not evaluated on large-scale real-world codebases

---

## Discussion

### What Worked Well

- QLoRA enabled efficient fine-tuning on limited hardware
- Significant improvement in syntax correction tasks
- Strong adherence to instruction format

### Challenges

- Limited dataset size
- Lack of quantitative evaluation metrics
- Difficulty handling complex multi-line logic bugs

### Ethical Considerations

- The model may generate incorrect fixes for complex bugs
- Should be used as an assistive tool, not a final authority
- Users should validate outputs before deployment

---

## How to Use

```python
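from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "microsoft/phi-3-mini-4k-instruct"

# Load the base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(
    base_model,
    "Sud1212/phi3-debug-llm-lora"
)
model.eval()  # inference mode: disables dropout

# Buggy input: the `for` statement is missing its trailing colon.
prompt = "Fix the bug:\nfor i in range(5)\n    print(i)"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# The decoded text contains the prompt followed by the generated fix.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))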
```

---

## Resources

- **GitHub Repository:** [Phi3-debugLLM-LoRA](https://github.com/suddhumaddi/Phi3-debugLLM-LoRA)
- **Weights & Biases Dashboard:** [W&B Project](https://wandb.ai/suddhumaddi-woxsen-university/huggingface)
- **Dataset (MBPP):** [Hugging Face Datasets](https://huggingface.co/datasets/mbpp)

---

## Author

**Sudarshan Maddi**, Woxsen University

---

## License

MIT License