---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
datasets:
- mbpp
- google/code_x_glue_ct_code_to_text
language:
- en
tags:
- code-generation
- docstring-generation
- code-review
- bilora
---

# Phi-3 BiLoRA Code Review

This model is a fine-tuned version of `microsoft/Phi-3-mini-4k-instruct` using BiLoRA (Dual-Adapter LoRA) for code review tasks, specifically code generation and docstring generation.

## Model Details

- **Model Type:** Causal Language Model with multiple LoRA adapters
- **Base Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Adapters:**
  - `task_1`: Code Generation (fine-tuned on MBPP)
  - `task_2`: Docstring Generation (fine-tuned on CodeXGLUE)
- **Language(s):** Python

## Intended Use

This model is intended for code review assistance, including:

- Generating Python code from natural language prompts.
- Generating descriptive docstrings for existing Python functions.

## Training Details

### Datasets

- **Task 1:** [MBPP](https://huggingface.co/datasets/mbpp) (Mostly Basic Python Problems)
- **Task 2:** [CodeXGLUE (ct-code-to-text)](https://huggingface.co/datasets/google/code_x_glue_ct_code_to_text) (Python subset)

### BiLoRA Configuration

- **Rank (r):** 4
- **Alpha:** 8
- **Dropout:** 0.1
- **Target Modules:**
  - `task_1`: `qkv_proj`, `o_proj`
  - `task_2`: `gate_up_proj`, `down_proj`

### Hyperparameters

- **Learning Rate:** 2e-4
- **Batch Size:** 1 (Gradient Accumulation Steps: 16)
- **Epochs:** 1
- **Optimizer:** AdamW
- **LR Scheduler:** Linear

## Benchmark Results

Evaluation was performed on a custom benchmark of 20 samples (10 code generation, 10 docstring generation).
| Model | Bug Detection (Pass@1) | Localization (BLEU) | Fix Quality (1-5) | Avg. Latency (ms) |
|-------|------------------------|---------------------|-------------------|-------------------|
| BiLoRA (this model) | 94.17% | 0.0259 | 3.7/5 | 33,499 |
| Phi-3 base | 70.0% | 0.0536 | 3.6/5 | 24,561 |
| GPT-4 (Groq) | 100.0% | 0.1255 | 4.4/5 | 433 |

*Note: Bug Detection is proxied by the code generation pass rate, Localization is proxied by the docstring BLEU score, and Fix Quality is an average quality score (1-5).*

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "aniketp2009gmail/phi3-bilora-code-review")
tokenizer = AutoTokenizer.from_pretrained("aniketp2009gmail/phi3-bilora-code-review")

# For Code Generation (Task 1)
model.set_adapter("task_1")
prompt = "Generate code: Write a function to find the sum of even numbers in a list\nCode:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For Docstring Generation (Task 2)
model.set_adapter("task_2")
prompt = "Generate docstring: def sum_even(lst):\n    return sum(x for x in lst if x % 2 == 0)\nDocstring:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- The model is optimized for Python.
- Performance may vary on complex or niche libraries.
- Latency is higher than quantized or distilled models.