---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
datasets:
- mbpp
- google/code_x_glue_ct_code_to_text
language:
- en
tags:
- code-generation
- docstring-generation
- code-review
- bilora
---

# Phi-3 BiLoRA Code Review

This model is a fine-tuned version of `microsoft/Phi-3-mini-4k-instruct` using BiLoRA (Dual-Adapter LoRA) for code review tasks, specifically code generation and docstring generation.

## Model Details

- **Model Type:** Causal language model with multiple LoRA adapters
- **Base Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- **Adapters:**
  - `task_1`: code generation (fine-tuned on MBPP)
  - `task_2`: docstring generation (fine-tuned on CodeXGLUE)
- **Language(s):** Python

## Intended Use

This model is intended for code review assistance, including:
- Generating Python code from natural-language prompts.
- Generating descriptive docstrings for existing Python functions.

## Training Details

### Datasets
- **Task 1:** [MBPP](https://huggingface.co/datasets/mbpp) (Mostly Basic Python Problems)
- **Task 2:** [CodeXGLUE (code-to-text)](https://huggingface.co/datasets/google/code_x_glue_ct_code_to_text) (Python subset)

### BiLoRA Configuration
- **Rank (r):** 4
- **Alpha:** 8
- **Dropout:** 0.1
- **Target Modules:**
  - `task_1`: `qkv_proj`, `o_proj`
  - `task_2`: `gate_up_proj`, `down_proj`

### Hyperparameters
- **Learning Rate:** 2e-4
- **Batch Size:** 1 (gradient accumulation steps: 16)
- **Epochs:** 1
- **Optimizer:** AdamW
- **LR Scheduler:** Linear

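As a rough sketch, these hyperparameters map onto `transformers.TrainingArguments` as follows. The output directory is hypothetical, and the actual training script is not published with this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the card's hyperparameters.
args = TrainingArguments(
    output_dir="phi3-bilora-checkpoints",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,        # effective batch size: 1 * 16 = 16
    num_train_epochs=1,
    optim="adamw_torch",                   # AdamW
    lr_scheduler_type="linear",
)
```

Gradient accumulation over 16 steps gives an effective batch size of 16 while keeping per-step memory at a single sample, which is what makes fine-tuning a 3.8B-parameter base model feasible on modest hardware.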
## Benchmark Results

Evaluation was performed on a custom benchmark of 20 samples (10 code generation, 10 docstring generation).

| Model | Bug Detection (Pass@1) | Localization (BLEU) | Fix Quality (1-5) | Avg. Latency |
|-------|------------------------|---------------------|-------------------|--------------|
| BiLoRA (this model) | 94.17% | 0.0259 | 3.7/5 | 33,499 ms |
| Phi-3 base | 70.0% | 0.0536 | 3.6/5 | 24,561 ms |
| GPT-4 (Groq) | 100.0% | 0.1255 | 4.4/5 | 433 ms |

*Note: Bug Detection is proxied by the code-generation pass rate, Localization by the docstring BLEU score, and Fix Quality is an average 1-5 quality rating.*

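The card does not state which BLEU implementation produced the localization scores. As one illustration, NLTK's sentence-level BLEU with smoothing can score a generated docstring against a reference (the sentence pair below is hypothetical):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference/hypothesis pair, for illustration only.
reference = "Return the sum of the even numbers in the list .".split()
hypothesis = "Return the sum of even numbers in a list .".split()

# Smoothing avoids zero scores when a higher-order n-gram has no match,
# which matters for short docstrings.
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"{score:.4f}")
```

Low absolute BLEU values like those in the table are common for docstring generation, since many semantically correct docstrings share few exact n-grams with the reference.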
## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "aniketp2009gmail/phi3-bilora-code-review")

tokenizer = AutoTokenizer.from_pretrained("aniketp2009gmail/phi3-bilora-code-review")

# Code generation (task_1 adapter)
model.set_adapter("task_1")
prompt = "Generate code: Write a function to find the sum of even numbers in a list\nCode:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Docstring generation (task_2 adapter)
model.set_adapter("task_2")
prompt = "Generate docstring: def sum_even(lst):\n    return sum(x for x in lst if x % 2 == 0)\nDocstring:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- The model is optimized for Python.
- Performance may vary on complex code or niche libraries.
- Latency is higher than that of quantized or distilled models.