---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code-review
- python
- qwen2
- fine-tuned
datasets:
- custom
language:
- en
pipeline_tag: text-generation
---

# Code Review Critic

A fine-tuned Qwen2.5-Coder-7B-Instruct model for Python code review.

## Model Description

This model provides constructive, actionable feedback on Python code. It focuses on:
- Bug detection
- Potential issues
- Code quality improvements

- **Base Model:** Qwen/Qwen2.5-Coder-7B-Instruct
- **Fine-tuning Method:** QLoRA (4-bit quantization + LoRA adapters)
- **Training Data:** 8,275 real GitHub PR review comments from major Python projects

## Training Details

- **LoRA Rank:** 64
- **LoRA Alpha:** 64
- **Learning Rate:** 2e-4
- **Epochs:** 2
- **Final Eval Loss:** 0.8455

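
To make the LoRA settings above more concrete, here is a minimal sketch of the arithmetic they imply. It assumes the standard LoRA formulation, in which the low-rank update is scaled by `alpha / rank` and each adapted linear layer gains `rank * (d_in + d_out)` trainable parameters. The 3584 x 3584 layer shape and the helper names `lora_scaling` and `lora_param_count` are hypothetical, chosen only for illustration; this card does not state which modules were adapted.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 64   # from Training Details
ALPHA = 64  # from Training Details

# Hypothetical layer shape, used only to illustrate the calculation.
D_IN = D_OUT = 3584

scaling = lora_scaling(ALPHA, RANK)
adapter_params = lora_param_count(D_IN, D_OUT, RANK)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"adapter params per layer: {adapter_params:,}")
print(f"share of full layer weights: {adapter_params / full_params:.2%}")
```

With rank and alpha both set to 64, the scaling factor is 1.0, so the adapter output is added to the frozen layer's output without rescaling.
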
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace YOUR_USERNAME with the account that hosts the model.
model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/code-review-critic")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/code-review-critic")

messages = [
    {"role": "system", "content": "You are an expert code reviewer. Analyze the provided Python code and give constructive, specific feedback."},
    {"role": "user", "content": "Review this Python code:\n\n```python\ndef get_user(id):\n    return db.query(f'SELECT * FROM users WHERE id = {id}')\n```"},
]

# Render the chat into the model's prompt format, ending at the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Keep the input tensors on the same device as the model.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))