---
license: mit
base_model: microsoft/phi-3-mini-4k-instruct
tags:
- llm
- code-generation
- bug-fixing
- lora
- peft
- python
datasets:
- mbpp
metrics:
- exact_match
- similarity
---
# DebugGPT LoRA Adapter for Phi-3 Mini
A lightweight LoRA adapter for Phi-3 Mini, fine-tuned on synthetic Python bug-fixing tasks derived from the MBPP dataset. It is intended to improve Phi-3 Mini's ability to detect and correct common Python syntax errors while preserving its general language capabilities.
---
## Model Description
- **Base Model:** microsoft/phi-3-mini-4k-instruct
- **Fine-Tuning Method:** QLoRA (Low-Rank Adaptation with 4-bit quantization)
- **Task:** Automated Python bug fixing
The model takes buggy Python code as input and generates the corrected version.
---
## Intended Use
This model is designed for:
- Python debugging assistance
- Educational coding tools
- AI-assisted code correction
- Research experiments in code repair
### Out-of-Scope Use
- Production-critical systems
- Security-sensitive applications
- Complex multi-file debugging
---
## Dataset
We use the **MBPP (Mostly Basic Python Problems)** dataset. Since MBPP contains correct code, we generate a bug-fixing dataset by injecting synthetic bugs.
### Data Format
Each example follows an instruction-tuning format:
```json
{
"instruction": "Fix the bug in the following Python code",
"input": "<buggy code>",
"output": "<correct code>"
}
```
### Bug Injection Strategy
We introduce controlled bugs such as:
- Operator replacement (`+` → `-`)
- Comparison changes (`>` → `<`)
- Removal of return statements
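
The three bug types above can be sketched as simple string transformations. This is a minimal illustration, not the project's actual injection code: the helper names (`swap_operator`, `flip_comparison`, `drop_return`, `make_example`) are hypothetical, and the choice to mutate only the first match is an assumption.

```python
import re

INSTRUCTION = "Fix the bug in the following Python code"

def swap_operator(code: str) -> str:
    """Replace the first ' + ' with ' - ' (hypothetical helper)."""
    return code.replace(" + ", " - ", 1)

def flip_comparison(code: str) -> str:
    """Replace the first ' > ' with ' < ' (hypothetical helper)."""
    return code.replace(" > ", " < ", 1)

def drop_return(code: str) -> str:
    """Delete the first line that is a return statement (hypothetical helper)."""
    lines = code.splitlines()
    for i, line in enumerate(lines):
        if re.match(r"\s*return\b", line):
            del lines[i]
            break
    return "\n".join(lines)

def make_example(correct_code: str, inject):
    """Build one record in the instruction format, or None if nothing changed."""
    buggy = inject(correct_code)
    if buggy == correct_code:
        return None  # the injector found nothing to mutate
    return {"instruction": INSTRUCTION, "input": buggy, "output": correct_code}

correct = "def add(a, b):\n    return a + b"
example = make_example(correct, swap_operator)
print(example["input"])
```

Returning `None` when an injector has no effect keeps unchanged code out of the dataset, where it would otherwise teach the model to copy its input.
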
### Dataset Size
| Split | Samples |
|------------|---------|
| Train | ~374 |
| Validation | ~90 |
| Test | ~500 |
---
## Training Procedure
### Method: QLoRA
To enable efficient training on limited hardware:
- Base model loaded in 4-bit precision (NF4)
- Base weights frozen
- Only LoRA adapters trained
### LoRA Configuration
| Parameter | Value |
|-----------------|------------------------------------|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
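
The QLoRA setup and the LoRA values in the table could be expressed roughly as follows with Transformers and PEFT. This is a sketch of the configuration objects only. The FP16 compute dtype, `bias="none"`, and `task_type` are assumptions that the card does not state.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: matches FP16 training
)

# LoRA adapter settings taken from the table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: standard for causal LM fine-tuning
)
```

PEFT matches `target_modules` against the module names of the loaded model, so it is worth listing `model.named_modules()` to confirm that each name is present before training.
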
### Training Configuration
| Parameter | Value |
|------------------------|---------|
| Epochs | 3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 8 |
| Precision | FP16 |
| Optimizer | AdamW |
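
As a quick sanity check on these settings, the effective batch size and approximate number of optimizer steps can be worked out as below. The calculation assumes a single GPU and uses the approximate training-set size from the Dataset Size table. The exact step count depends on how the trainer handles the final partial accumulation window.

```python
import math

train_samples = 374   # approximate, from the Dataset Size table
per_device_batch = 1  # Batch Size
grad_accum = 8        # Gradient Accumulation
epochs = 3

# One optimizer update consumes batch_size * accumulation_steps samples.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 47 141
```
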
---
## Hardware & Frameworks
- **GPU:** NVIDIA Tesla T4
- **Frameworks:** Hugging Face Transformers, PEFT (LoRA), TRL (SFTTrainer), Weights & Biases
---
## Evaluation Results
### Performance Summary
| Metric | Base Model | Fine-Tuned Model |
|-------------------------|---------------|--------------------|
| Syntax Fix Accuracy | Low | Noticeably Higher |
| Indentation Correction | Inconsistent | Reliable |
| Variable Error Fixing | Occasional | Improved |
| Complex Logic Bugs | Limited | Limited (unchanged)|
| Instruction Adherence | Moderate | High |
> **Note:** Quantitative metrics (e.g., exact match accuracy, CodeBLEU) were not computed due to dataset and tooling constraints.
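
The card's metadata lists `exact_match` and `similarity` as metrics. A minimal sketch of how they could be computed for a prediction and its reference fix is shown below. It is not an evaluation that was run for this model. The whitespace normalization and the use of `difflib.SequenceMatcher` as the similarity score are assumptions.

```python
from difflib import SequenceMatcher

def normalize(code: str) -> str:
    """Drop leading/trailing blank lines and trailing spaces on each line."""
    return "\n".join(line.rstrip() for line in code.strip("\n").splitlines())

def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized prediction equals the normalized reference."""
    return normalize(prediction) == normalize(reference)

def similarity(prediction: str, reference: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, normalize(prediction), normalize(reference)).ratio()

reference = "for i in range(5):\n    print(i)"
prediction = "for i in range(5):\n    print(i)\n"
print(exact_match(prediction, reference), similarity(prediction, reference))
```
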
---
## Example
### Input — Buggy Code
```python
for i in range(5)
    print(i)
```
### Output — Fixed Code
```python
for i in range(5):
    print(i)
```
---
## Limitations
- Small dataset size limits generalization
- Focused primarily on syntax-level bugs
- Limited performance on complex logical errors
- Not evaluated on large-scale real-world codebases
---
## Discussion
### What Worked Well
- QLoRA enabled efficient fine-tuning on limited hardware
- Noticeable qualitative improvement on syntax correction tasks
- Strong adherence to instruction format
### Challenges
- Limited dataset size
- Lack of quantitative evaluation metrics
- Difficulty handling complex multi-line logic bugs
### Ethical Considerations
- The model may generate incorrect fixes for complex bugs
- Should be used as an assistive tool, not a final authority
- Users should validate outputs before deployment
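
One inexpensive first validation step is to confirm that a generated fix at least parses as Python. The sketch below uses the standard library's `ast` module. The function name `is_valid_python` is hypothetical. A successful parse says nothing about logical correctness, so tests are still needed.

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if the code parses without a syntax error."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # IndentationError is a subclass of SyntaxError.
        return False
    return True

fixed = "for i in range(5):\n    print(i)"
buggy = "for i in range(5)\n    print(i)"
print(is_valid_python(fixed), is_valid_python(buggy))
```
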
---
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "microsoft/phi-3-mini-4k-instruct"

# Load the base model and tokenizer, then attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(base_model, "Sud1212/phi3-debug-llm-lora")

# Buggy snippet: the `for` statement is missing its colon.
prompt = "Fix the bug:\nfor i in range(5)\n    print(i)"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate the fix; the decoded text includes the prompt followed by the output.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
## Resources
- **GitHub Repository:** [Phi3-debugLLM-LoRA](https://github.com/suddhumaddi/Phi3-debugLLM-LoRA)
- **Weights & Biases Dashboard:** [W&B Project](https://wandb.ai/suddhumaddi-woxsen-university/huggingface)
- **Dataset (MBPP):** [Hugging Face Datasets](https://huggingface.co/datasets/mbpp)
---
## Author
**Sudarshan Maddi**
Woxsen University
---
## License
MIT License