---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
datasets:
- mbpp
- google/code_x_glue_ct_code_to_text
language:
- en
tags:
- code-generation
- docstring-generation
- code-review
- bilora
---

# Phi-3 BiLoRA Code Review

This model is a fine-tuned version of `microsoft/Phi-3-mini-4k-instruct` using BiLoRA (Dual-Adapter LoRA) for code review tasks, specifically code generation and docstring generation.

## Model Details

- **Model Type:** Causal language model with multiple LoRA adapters
- **Base Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Adapters:**
  - `task_1`: code generation (fine-tuned on MBPP)
  - `task_2`: docstring generation (fine-tuned on CodeXGLUE)
- **Language(s):** Python

## Intended Use

This model is intended for code review assistance, including:
- Generating Python code from natural-language prompts.
- Generating descriptive docstrings for existing Python functions.

## Training Details

### Datasets
- **Task 1:** [MBPP](https://huggingface.co/datasets/mbpp) (Mostly Basic Python Problems)
- **Task 2:** [CodeXGLUE (code-to-text)](https://huggingface.co/datasets/google/code_x_glue_ct_code_to_text) (Python subset)

### BiLoRA Configuration
- **Rank (r):** 4
- **Alpha:** 8
- **Dropout:** 0.1
- **Target Modules:**
  - `task_1`: `qkv_proj`, `o_proj`
  - `task_2`: `gate_up_proj`, `down_proj`

### Hyperparameters
- **Learning Rate:** 2e-4
- **Batch Size:** 1 (gradient accumulation steps: 16)
- **Epochs:** 1
- **Optimizer:** AdamW
- **LR Scheduler:** Linear

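As a rough sketch, these hyperparameters map onto `transformers.TrainingArguments` as follows. The output directory is hypothetical, and the actual training script is not published with this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the card's hyperparameters.
args = TrainingArguments(
    output_dir="phi3-bilora-checkpoints",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,        # effective batch size: 1 * 16 = 16
    num_train_epochs=1,
    optim="adamw_torch",                   # AdamW
    lr_scheduler_type="linear",
)
```

Gradient accumulation over 16 steps gives an effective batch size of 16 while keeping per-step memory at a single sample, which is what makes fine-tuning a 3.8B-parameter base model feasible on modest hardware.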
## Benchmark Results

Evaluation was performed on a custom benchmark of 20 samples (10 code generation, 10 docstring generation).

| Model | Bug Detection (Pass@1) | Localization (BLEU) | Fix Quality (1-5) | Avg. Latency |
|-------|------------------------|---------------------|-------------------|--------------|
| BiLoRA (this model) | 94.17% | 0.0259 | 3.7/5 | 33,499 ms |
| Phi-3 base | 70.0% | 0.0536 | 3.6/5 | 24,561 ms |
| GPT-4 (Groq) | 100.0% | 0.1255 | 4.4/5 | 433 ms |

*Note: Bug Detection is proxied by the code-generation pass rate, Localization by the docstring BLEU score, and Fix Quality is an average 1-5 quality rating.*

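The card does not state which BLEU implementation produced the localization scores. As one illustration, NLTK's sentence-level BLEU with smoothing can score a generated docstring against a reference (the sentence pair below is hypothetical):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference/hypothesis pair, for illustration only.
reference = "Return the sum of the even numbers in the list .".split()
hypothesis = "Return the sum of even numbers in a list .".split()

# Smoothing avoids zero scores when a higher-order n-gram has no match,
# which matters for short docstrings.
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"{score:.4f}")
```

Low absolute BLEU values like those in the table are common for docstring generation, since many semantically correct docstrings share few exact n-grams with the reference.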
## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "aniketp2009gmail/phi3-bilora-code-review")

tokenizer = AutoTokenizer.from_pretrained("aniketp2009gmail/phi3-bilora-code-review")

# Code generation (task_1 adapter)
model.set_adapter("task_1")
prompt = "Generate code: Write a function to find the sum of even numbers in a list\nCode:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Docstring generation (task_2 adapter)
model.set_adapter("task_2")
prompt = "Generate docstring: def sum_even(lst):\n    return sum(x for x in lst if x % 2 == 0)\nDocstring:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- The model is optimized for Python.
- Performance may vary on complex code or niche libraries.
- Latency is higher than that of quantized or distilled models.