--- license: mit base_model: microsoft/phi-3-mini-4k-instruct tags: - llm - code-generation - bug-fixing - lora - peft - python datasets: - mbpp metrics: - exact_match - similarity --- # DebugGPT LoRA Adapter for Phi-3 Mini A lightweight LoRA adapter fine-tuned on synthetic Python bug-fixing tasks using the MBPP dataset. This model enhances the ability of Phi-3 Mini to detect and correct common Python syntax errors while preserving general language capabilities. --- ## Model Description - **Base Model:** microsoft/phi-3-mini-4k-instruct - **Fine-Tuning Method:** QLoRA (Low-Rank Adaptation with 4-bit quantization) - **Task:** Automated Python bug fixing The model takes buggy Python code as input and generates the corrected version. --- ## Intended Use This model is designed for: - Python debugging assistance - Educational coding tools - AI-assisted code correction - Research experiments in code repair ### Out-of-Scope Use - Production-critical systems - Security-sensitive applications - Complex multi-file debugging --- ## Dataset We use the **MBPP (Mostly Basic Python Problems)** dataset. Since MBPP contains correct code, we generate a bug-fixing dataset by injecting synthetic bugs. ### Data Format Each example follows an instruction-tuning format: ```json { "instruction": "Fix the bug in the following Python code", "input": "", "output": "" } ``` ### Bug Injection Strategy We introduce controlled bugs such as: - Operator replacement (`+` → `-`) - Comparison changes (`>` → `<`) - Removal of return statements ### Dataset Size | Split | Samples | |------------|---------| | Train | ~374 | | Validation | ~90 | | Test | ~500 | --- ## Training Procedure ### Method: QLoRA To enable efficient training on limited hardware: - Base model loaded in 4-bit precision (NF4) - Base weights frozen - Only LoRA adapters trained ### LoRA Configuration | Parameter | Value | |-----------------|------------------------------------| | Rank (r) | 16 | | Alpha | 32 | | Dropout | 0.05 | | Target Modules | q_proj, k_proj, v_proj, o_proj | ### Training Configuration | Parameter | Value | |------------------------|---------| | Epochs | 3 | | Learning Rate | 2e-4 | | Batch Size | 1 | | Gradient Accumulation | 8 | | Precision | FP16 | | Optimizer | AdamW | --- ## Hardware & Frameworks - **GPU:** NVIDIA Tesla T4 - **Frameworks:** Hugging Face Transformers, PEFT (LoRA), TRL (SFTTrainer), Weights & Biases --- ## Evaluation Results ### Performance Summary | Metric | Base Model | Fine-Tuned Model | |-------------------------|---------------|--------------------| | Syntax Fix Accuracy | Low | Noticeably Higher | | Indentation Correction | Inconsistent | Reliable | | Variable Error Fixing | Occasional | Improved | | Complex Logic Bugs | Limited | Limited (unchanged)| | Instruction Adherence | Moderate | High | > **Note:** Quantitative metrics (e.g., exact match accuracy, CodeBLEU) were not computed due to dataset and tooling constraints. --- ## Example ### Input — Buggy Code ```python for i in range(5) print(i) ``` ### Output — Fixed Code ```python for i in range(5): print(i) ``` --- ## Limitations - Small dataset size limits generalization - Focused primarily on syntax-level bugs - Limited performance on complex logical errors - Not evaluated on large-scale real-world codebases --- ## Discussion ### What Worked Well - QLoRA enabled efficient fine-tuning on limited hardware - Significant improvement in syntax correction tasks - Strong adherence to instruction format ### Challenges - Limited dataset size - Lack of quantitative evaluation metrics - Difficulty handling complex multi-line logic bugs ### Ethical Considerations - The model may generate incorrect fixes for complex bugs - Should be used as an assistive tool, not a final authority - Users should validate outputs before deployment --- ## How to Use ```python from transformers import AutoModelForCausalLM, AutoTokenizer from peft import PeftModel base_model = AutoModelForCausalLM.from_pretrained( "microsoft/phi-3-mini-4k-instruct" ) tokenizer = AutoTokenizer.from_pretrained( "microsoft/phi-3-mini-4k-instruct" ) model = PeftModel.from_pretrained( base_model, "Sud1212/phi3-debug-llm-lora" ) prompt = "Fix the bug:\nfor i in range(5)\n print(i)" inputs = tokenizer(prompt, return_tensors="pt") outputs = model.generate(**inputs, max_new_tokens=100) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) ``` --- ## Resources - **GitHub Repository:** [Phi3-debugLLM-LoRA](https://github.com/suddhumaddi/Phi3-debugLLM-LoRA) - **Weights & Biases Dashboard:** [W&B Project](https://wandb.ai/suddhumaddi-woxsen-university/huggingface) - **Dataset (MBPP):** [Hugging Face Datasets](https://huggingface.co/datasets/mbpp) --- ## Author **Sudarshan Maddi** Woxsen University --- ## License MIT License