---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
datasets:
- mbpp
- google/code_x_glue_ct_code_to_text
language:
- en
tags:
- code-generation
- docstring-generation
- code-review
- bilora
---

# Phi-3 BiLoRA Code Review

This model is a fine-tuned version of `microsoft/Phi-3-mini-4k-instruct` using BiLoRA (Dual-Adapter LoRA) for code review tasks, specifically code generation and docstring generation.

## Model Details

- **Model Type:** Causal Language Model with multiple LoRA adapters
- **Base Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Adapters:**
  - `task_1`: Code Generation (fine-tuned on MBPP)
  - `task_2`: Docstring Generation (fine-tuned on CodeXGLUE)
- **Language(s):** Python

## Intended Use

This model is intended for code review assistance, including:

- Generating Python code from natural language prompts.
- Generating descriptive docstrings for existing Python functions.

## Training Details

### Datasets

- **Task 1:** [MBPP](https://huggingface.co/datasets/mbpp) (Mostly Basic Python Problems)
- **Task 2:** [CodeXGLUE (ct-code-to-text)](https://huggingface.co/datasets/google/code_x_glue_ct_code_to_text) (Python subset)

### BiLoRA Configuration

- **Rank (r):** 4
- **Alpha:** 8
- **Dropout:** 0.1
- **Target Modules:**
  - `task_1`: `qkv_proj`, `o_proj`
  - `task_2`: `gate_up_proj`, `down_proj`

### Hyperparameters

- **Learning Rate:** 2e-4
- **Batch Size:** 1 (Gradient Accumulation Steps: 16)
- **Epochs:** 1
- **Optimizer:** AdamW
- **LR Scheduler:** Linear

## Benchmark Results

Evaluation was performed on a custom benchmark of 20 samples (10 code generation, 10 docstring generation).
| Model | Bug Detection (Pass@1) | Localization (BLEU) | Fix Quality (1-5) | Avg. Latency (ms) |
|-------|------------------------|---------------------|-------------------|-------------------|
| BiLoRA (this model) | 94.17% | 0.0259 | 3.7/5 | 33,499 |
| Phi-3 base | 70.0% | 0.0536 | 3.6/5 | 24,561 |
| GPT-4 (Groq) | 100.0% | 0.1255 | 4.4/5 | 433 |

*Note: Bug Detection is proxied by the code generation pass rate, Localization is proxied by the docstring BLEU score, and Fix Quality is an average quality score (1-5).*

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "aniketp2009gmail/phi3-bilora-code-review")
tokenizer = AutoTokenizer.from_pretrained("aniketp2009gmail/phi3-bilora-code-review")

# For Code Generation (Task 1)
model.set_adapter("task_1")
prompt = "Generate code: Write a function to find the sum of even numbers in a list\nCode:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For Docstring Generation (Task 2)
model.set_adapter("task_2")
prompt = "Generate docstring: def sum_even(lst):\n    return sum(x for x in lst if x % 2 == 0)\nDocstring:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- The model is optimized for Python.
- Performance may vary on complex or niche libraries.
- Latency is higher than quantized or distilled models.