---
license: mit
tags:
  - codellama
  - linux
  - bugfix
  - lora
  - qlora
  - git-diff
base_model: codellama/CodeLlama-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
---
|
|
|
|
|
# CodeLLaMA-Linux-BugFix

A fine-tuned version of `CodeLLaMA-7B-Instruct` specialized for Linux kernel bug fixing, trained with QLoRA (Quantized Low-Rank Adaptation). Given buggy C code and a commit message, the model generates a Git-style diff patch.

---
|
|
|
|
|
## 🎯 Overview

This project targets automated Linux kernel bug fixing by:

- Mining real commit data from the kernel's Git history
- Training a QLoRA adapter to generate Git-style fixes
- Evaluating performance using BLEU and ROUGE
- Supporting integration into code review pipelines

---
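As a minimal sketch of the commit-mining step, the snippet below filters commit subjects with the bug-fix keywords described in the Dataset section. The function names and the exact `git log` invocation are illustrative assumptions, not the project's actual mining script.

```python
import subprocess

# Keywords used to heuristically identify bug-fix commits (from the Dataset section).
BUGFIX_KEYWORDS = ("fix", "null", "race", "panic")

def is_bugfix(subject: str) -> bool:
    """Return True if a commit subject line matches any bug-fix keyword."""
    subject = subject.lower()
    return any(kw in subject for kw in BUGFIX_KEYWORDS)

def mine_bugfix_hashes(repo_path: str, limit: int = 1000) -> list[str]:
    """List hashes of recent commits whose subjects pass the keyword filter."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--max-count={limit}", "--format=%H %s"],
        capture_output=True, text=True, check=True,
    ).stdout
    hashes = []
    for line in out.splitlines():
        commit_hash, _, subject = line.partition(" ")
        if is_bugfix(subject):
            hashes.append(commit_hash)
    return hashes
```

In practice a keyword filter like this trades recall for precision; it misses fixes with unusual subject lines and admits some non-fix commits.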
|
|
|
|
|
## 📊 Performance Results

**BLEU Score**: 33.87

**ROUGE Scores**:

- ROUGE-1: P=0.3775, R=0.7306, F1=0.4355
- ROUGE-2: P=0.2898, R=0.6096, F1=0.3457
- ROUGE-L: P=0.3023, R=0.6333, F1=0.3612

The high recall indicates the model recovers most of the content of the ground-truth patches, while the lower precision reflects a tendency to generate longer diffs than the reference.

---
|
|
|
|
|
## 🔧 Model Configuration

- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning**: QLoRA (LoRA r=64, α=16, dropout=0.1)
- **Quantization**: 4-bit NF4
- **Training**: 3 epochs, batch size 64, LR 2e-4
- **Precision**: bfloat16 with gradient checkpointing
- **Hardware**: 1× NVIDIA H200 (141 GB VRAM)

---
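The configuration above can be sketched with `peft` and `transformers` as follows. This is an illustrative reconstruction, not the exact training script; in particular, `target_modules` is an assumption (a common choice for Llama-family models).

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from this card; target_modules is assumed.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```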
|
|
|
|
|
## 🏗️ Dataset

- 100,000 samples from Linux kernel Git commits
- Format: JSONL with `"prompt"` and `"completion"` fields
- Content: C code segments + commit messages → Git diffs
- Source: bug-fix commits filtered by keywords like `fix`, `null`, `race`, `panic`

---
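For illustration, one JSONL record in the format above might look like the following. The field contents (file name, code, and diff) are hypothetical; the real samples are mined from kernel commits.

```python
import json

# A hypothetical training record: prompt holds the buggy code and instruction,
# completion holds the target Git-style diff.
sample = {
    "prompt": (
        "Given the following original C code:\n"
        "```c\n"
        "if (!file->filter)\n"
        "    return;\n"
        "```\n"
        "Instruction: Fix the null pointer dereference\n"
        "Return the diff that fixes it:\n"
    ),
    "completion": (
        "--- a/example.c\n"
        "+++ b/example.c\n"
        "@@ -1,2 +1,2 @@\n"
        "-if (!file->filter)\n"
        "+if (!file || !file->filter)\n"
        "     return;\n"
    ),
}

line = json.dumps(sample)          # one record per line in the JSONL file
assert json.loads(line) == sample  # round-trips cleanly
```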
|
|
|
|
|
## 🚀 Usage

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base = "codellama/CodeLlama-7b-Instruct-hf"
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
tokenizer = AutoTokenizer.from_pretrained(base)

prompt = '''
Given the following original C code:

```c
if (!file->filter)
    return;
```

Instruction: Fix the null pointer dereference

Return the diff that fixes it:
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.1)
fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fix)
````

---
|
|
|
|
|
## 📁 Structure

```
CodeLLaMA-Linux-BugFix/
├── dataset/            # Raw and processed JSONL files
├── dataset_builder/    # Scripts for mining & formatting commits
├── train/              # Training scripts & checkpoints
├── evaluate/           # Evaluation scripts & results
└── requirements.txt    # Dependencies
```

---
|
|
|
|
|
## 📈 Metrics

| Metric       | Score  |
|--------------|--------|
| BLEU         | 33.87  |
| ROUGE-1 (F1) | 0.4355 |
| ROUGE-2 (F1) | 0.3457 |
| ROUGE-L (F1) | 0.3612 |

---
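To make the ROUGE numbers concrete, ROUGE-1 is unigram-overlap precision, recall, and F1 between a generated diff and the reference. A minimal pure-Python sketch (the reported scores were computed with a standard ROUGE implementation, not this toy):

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> tuple[float, float, float]:
    """Whitespace-tokenized unigram ROUGE-1: (precision, recall, F1)."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)
```

The precision/recall split explains the table: recall rewards covering the reference patch, precision penalizes extra generated lines.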
|
|
|
|
|
## 🔬 Use Cases

- Kernel patch suggestion tools
- Code review assistants
- Bug localization and repair research
- Automated program repair (APR) benchmarks for kernel code

---
|
|
|
|
|
## 📄 License

MIT License

---
|
|
|
|
|
## 📚 References

- [Code Llama](https://arxiv.org/abs/2308.12950)
- [QLoRA](https://arxiv.org/abs/2305.14314)
- [LoRA](https://arxiv.org/abs/2106.09685)