---

license: mit
tags:
  - codellama
  - linux
  - bugfix
  - lora
  - qlora
  - git-diff
base_model: codellama/CodeLlama-7b-Instruct-hf
model_type: LlamaForCausalLM
library_name: peft
pipeline_tag: text-generation
---


# CodeLLaMA-Linux-BugFix

A QLoRA (Quantized Low-Rank Adaptation) fine-tune of `CodeLLaMA-7B-Instruct` for Linux kernel bug fixing. Given buggy C code and a commit message, the model generates a Git diff patch.

---

## 🎯 Overview

This project targets automated Linux kernel bug fixing by:

- **Mining real commit data** from the kernel Git history
- **Training a specialized QLoRA model** on diff-style fixes
- **Generating Git patches** in response to bug-prone code
- **Evaluating results** using BLEU, ROUGE, and human inspection

On the project's test samples, the model reaches a BLEU score of 33.87 and a ROUGE-1 F1 of 0.4355 against the reference fixes, which makes it a useful aid for automated code review and bug fixing.

---

## 📊 Performance Results

### Evaluation Metrics

✅ **BLEU Score**: 33.87

✅ **ROUGE Scores**:
- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612

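BLEU and ROUGE measure token overlap with the reference fixes, so they are indirect evidence of patch quality. The scores suggest that the model can:
- Generate patches in Git diff format
- Stay close to the wording of the reference fixes
- Cover most of the reference change: recall (0.61 to 0.73) is well above precision (0.29 to 0.38), so outputs tend to include the reference tokens plus additional ones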
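
The reported F1 values are lower than the harmonic mean of the reported precision and recall (for ROUGE-1, 2·P·R/(P+R) is about 0.498, not 0.4355). One plausible explanation, which is an assumption and not something confirmed by the evaluation script, is that P, R, and F1 are each averaged over samples. The sketch below uses a simplified whitespace-token ROUGE-1 and made-up sample pairs to show that the mean of per-sample F1 scores can sit below the F1 computed from the mean P and mean R.

```python
from collections import Counter


def harmonic(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def rouge1(candidate, reference):
    """Simplified ROUGE-1 on whitespace tokens: (precision, recall, F1)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    return p, r, harmonic(p, r)


# Hypothetical (generated, reference) pairs, for illustration only
pairs = [
    ("- return; + return -EINVAL; + extra context line", "- return; + return -EINVAL;"),
    ("+ kfree(buf); unrelated tokens here", "+ kfree(buf);"),
    ("+ mutex_unlock(&lock);", "+ mutex_unlock(&lock);"),
]

scores = [rouge1(cand, ref) for cand, ref in pairs]
mean_p = sum(s[0] for s in scores) / len(scores)
mean_r = sum(s[1] for s in scores) / len(scores)
macro_f1 = sum(s[2] for s in scores) / len(scores)  # mean of per-sample F1
f1_of_means = harmonic(mean_p, mean_r)              # F1 of the averaged P and R

print(f"mean P={mean_p:.4f}  mean R={mean_r:.4f}")
print(f"mean of per-sample F1={macro_f1:.4f}  F1 of mean P/R={f1_of_means:.4f}")
print(f"harmonic mean of the reported ROUGE-1 P/R={harmonic(0.3775, 0.7306):.4f}")
```

Because the harmonic mean is concave, the per-sample average can never exceed the F1 of the averaged values, which matches the direction of the gap in the table above.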

---

## 🧠 Model Configuration

- **Base model**: `CodeLLaMA-7B-Instruct`
- **Fine-tuning method**: QLoRA with 4-bit quantization
- **Training setup**:
  - LoRA r=64, alpha=16, dropout=0.1
  - Batch size: 64, LR: 2e-4, Epochs: 3
  - Mixed precision (bfloat16), gradient checkpointing
- **Hardware**: Optimized for NVIDIA H200 GPUs
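
The settings above map onto the standard `transformers` and `peft` configuration objects roughly as follows. This is a sketch under stated assumptions, not the project's training script: the NF4 quantization type is taken from the Technical Highlights section, and `target_modules` is a guess at which projections are adapted.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base model, with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training setup listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumption: the attention projections are adapted; the real script may differ.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

`bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` to `peft.get_peft_model`.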

---

## 📊 Dataset

Custom dataset extracted from Linux kernel Git history.

### Filtering Criteria
Commits are selected when they contain bug-fix keywords such as:
`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`.
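
A minimal sketch of such a keyword filter is shown below. The function name `looks_like_bugfix` and the matching rule (case-insensitive, keyword at the start of a word) are assumptions for illustration; the extraction script may match differently.

```python
import re

# Keywords from the list above, which is not exhaustive
BUGFIX_KEYWORDS = (
    "fix", "bug", "crash", "memory", "null",
    "panic", "overflow", "race", "corruption",
)

# Assumption: a keyword counts when it starts a word, so "Fixes" and "NULL"
# match, while "prefix" and "debug" do not.
_PATTERN = re.compile(r"\b(?:" + "|".join(BUGFIX_KEYWORDS) + r")", re.IGNORECASE)


def looks_like_bugfix(commit_message):
    """Return True if the commit message contains a bug-fix keyword."""
    return _PATTERN.search(commit_message) is not None


print(looks_like_bugfix("net: fix NULL pointer dereference in foo()"))
print(looks_like_bugfix("docs: add debug prefix description"))
```

Anchoring keywords at a word boundary is a design choice that trades some recall for fewer false positives from substrings.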

### Structure
- Language: C (`.c`, `.h`)
- Context: 10 lines before/after the change
- Format:

```json
{
  "input": {
    "original code": "C code snippet with bug",
    "instruction": "Commit message or fix description"
  },
  "output": {
    "diff codes": "Git diff showing the fix"
  }
}
```

* **File**: `training_data_100k.jsonl` (100,000 samples)
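
To make the structure concrete, the sketch below builds one record with 10 lines of context on each side of the changed line and serializes it as a JSONL line. The helpers `extract_context` and `make_record`, and the toy source file, are hypothetical; only the key names come from the schema above.

```python
import json


def extract_context(lines, first_changed, last_changed, context=10):
    """Return the changed lines plus `context` lines on each side.

    Indices are 0-based and inclusive; the window is clipped at file boundaries.
    """
    start = max(0, first_changed - context)
    end = min(len(lines), last_changed + context + 1)
    return "\n".join(lines[start:end])


def make_record(source_lines, first_changed, last_changed, commit_message, diff_text):
    """Build one training sample using the keys from the schema above."""
    return {
        "input": {
            "original code": extract_context(source_lines, first_changed, last_changed),
            "instruction": commit_message,
        },
        "output": {"diff codes": diff_text},
    }


# Toy 30-line file; line 15 (index 14) plays the role of the buggy line
source = [f"line {i};" for i in range(1, 31)]
record = make_record(
    source, 14, 14,
    "Fix the null pointer dereference",
    "@@ -15 +15 @@\n-line 15;\n+fixed line 15;",
)
jsonl_line = json.dumps(record)  # one JSON object per line in a .jsonl file
print(jsonl_line[:80])
```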

---

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- 50GB+ disk space

### Install dependencies

```bash
pip install -r requirements.txt
```

### 1. Build the Dataset

```bash
# Run from the repository root
cd dataset_builder
python extract_linux_bugfixes_parallel.py
python format_for_training.py
```

### 2. Fine-tune the Model

```bash
# Run from the repository root
cd train
python train_codellama_qlora_linux_bugfix.py
```

### 3. Run Evaluation

```bash
# Run from the repository root
cd evaluate
python evaluate_linux_bugfix_model.py
```

### 4. Use the Model

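````python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf"
ADAPTER_DIR = "train/output/qlora-codellama-bugfix"  # relative to the repository root

# Load the base model, then attach the fine-tuned LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_DIR)
model.eval()

# Generate a bug fix
prompt = """
Given the following original C code:
```c
if (!file->filter)
    return;
```
Instruction: Fix the null pointer dereference
Return the diff that fixes it:
"""

# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
fix = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(fix)
````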
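
Nothing guarantees that generated text is a well-formed patch, so it can help to run a structural check before trying to apply it. The sketch below verifies that each hunk body matches the line counts in its `@@` header. `hunks_are_consistent` is a hypothetical helper, not part of this repository, and it does not check that the patch applies to a real tree (`git apply --check` does that).

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunks_are_consistent(diff_text):
    """Check that every hunk body matches the line counts in its @@ header."""
    lines = diff_text.splitlines()
    i, found_hunk = 0, False
    while i < len(lines):
        match = HUNK_HEADER.match(lines[i])
        i += 1
        if not match:
            continue  # skip "diff --git", "---", "+++" and surrounding prose
        found_hunk = True
        old_left = int(match.group(2) or 1)  # an omitted count means 1
        new_left = int(match.group(4) or 1)
        while i < len(lines) and (old_left > 0 or new_left > 0):
            tag = lines[i][:1]
            if tag in (" ", ""):  # context line (blank lines may lose the space)
                old_left -= 1
                new_left -= 1
            elif tag == "-":
                old_left -= 1
            elif tag == "+":
                new_left -= 1
            elif tag != "\\":  # allow "\ No newline at end of file"
                return False
            i += 1
        if old_left != 0 or new_left != 0:
            return False
    return found_hunk


sample = (
    "--- a/f.c\n+++ b/f.c\n"
    "@@ -1,3 +1,4 @@\n ctx\n-old\n+new1\n+new2\n ctx2\n"
)
print(hunks_are_consistent(sample))
```

In the example above, the check would be called as `hunks_are_consistent(fix)` on the decoded model output.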



---



## 📁 Project Structure



```
CodeLLaMA-Linux-BugFix/
├── dataset_builder/
│   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
│   ├── format_for_training.py                # Format data for training
│   └── build_dataset.py                      # Main dataset builder
├── dataset/
│   ├── training_data_100k.jsonl              # 100K training samples
│   └── training_data_prompt_completion.jsonl # Formatted training data
├── train/
│   ├── train_codellama_qlora_linux_bugfix.py # Main training script
│   ├── train_codellama_qlora_simple.py       # Simplified training
│   ├── download_codellama_model.py           # Model download utility
│   └── output/
│       └── qlora-codellama-bugfix/           # Trained model checkpoints
├── evaluate/
│   ├── evaluate_linux_bugfix_model.py        # Evaluation script
│   ├── test_samples.jsonl                    # Test dataset
│   └── output/                               # Evaluation results
│       ├── eval_results.csv                  # Detailed results
│       └── eval_results.json                 # JSON format results
├── requirements.txt                          # Python dependencies
├── README.md                                 # This file
└── PROJECT_STRUCTURE.md                      # Detailed project overview
```



---



## 🧩 Features



* 🔧 **Efficient Fine-tuning**: QLoRA trains LoRA adapters on a 4-bit quantized base model, which sharply reduces GPU memory use

* 🧠 **Real-world commits**: From actual Linux kernel development

* 💡 **Context-aware**: Code context extraction around bug lines

* 💻 **Output-ready**: Generates valid Git-style diffs

* 📈 **Strong Performance**: BLEU score of 33.87 with good ROUGE metrics

* 🚀 **Production-ready**: Ships as a PEFT LoRA adapter that loads on top of the base model



---



## 📈 Evaluation Metrics



* **BLEU**: N-gram precision against the reference diffs, as used in machine translation

* **ROUGE**: N-gram and longest-common-subsequence overlap with the reference fix

* **Human Evaluation**: Subjective patch quality assessment
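
As a rough illustration of what BLEU measures, here is an unsmoothed, single-reference sentence-level BLEU on whitespace tokens. It is a simplified sketch and not the evaluation script's implementation; tokenization, smoothing, and corpus-level aggregation in a real BLEU library will give different numbers. The result is in [0, 1]; multiplying by 100 gives the scale used for the scores in this README.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU on whitespace tokens, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "- if (!file->filter) + if (!file || !file->filter)"
print(sentence_bleu(reference, reference))
print(sentence_bleu(reference + " return;", reference))
```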



### Current Performance

- **BLEU Score**: 33.87 on a 0 to 100 scale

- **ROUGE-1 F1**: 0.4355 (unigram overlap)

- **ROUGE-2 F1**: 0.3457 (bigram overlap)

- **ROUGE-L F1**: 0.3612 (longest common subsequence)



---



## 🧪 Use Cases



* **Automated kernel bug fixing**: Generate fixes for common kernel bugs

* **Code review assistance**: Help reviewers identify potential issues

* **Teaching/debugging kernel code**: Educational tool for kernel development

* **Research in automated program repair (APR)**: Academic research applications

* **CI/CD integration**: Automated testing and fixing in development pipelines



---



## 🔬 Technical Highlights



### Memory & Speed Optimizations



* 4-bit quantization (NF4)

* Gradient checkpointing

* Mixed precision (bfloat16)

* Gradient accumulation

* LoRA parameter efficiency
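
The parameter efficiency of LoRA can be estimated with simple arithmetic. The sketch below uses the dimensions of the 7B Llama architecture (hidden size 4096, 32 layers) and the `r` and `alpha` values from this README. Which projections are adapted, and how the batch size of 64 is split across gradient accumulation steps, are assumptions for illustration.

```python
HIDDEN_SIZE = 4096            # hidden size of the 7B Llama architecture
NUM_LAYERS = 32               # transformer layers in the 7B Llama architecture
LORA_R, LORA_ALPHA = 64, 16   # from the training setup in this README


def lora_params(d_in, d_out, r):
    """A LoRA adapter adds A (r x d_in) and B (d_out x r) next to a d_out x d_in weight."""
    return r * (d_in + d_out)


full_matrix = HIDDEN_SIZE * HIDDEN_SIZE
adapter = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)

# Assumption: q_proj and v_proj are adapted in every layer
trainable = 2 * NUM_LAYERS * adapter

# Scaling factor applied to the low-rank update
scaling = LORA_ALPHA / LORA_R

# Assumption: the batch size of 64 is reached through gradient accumulation
PER_DEVICE_BATCH, ACCUM_STEPS = 8, 8
effective_batch = PER_DEVICE_BATCH * ACCUM_STEPS

print(f"adapter params per 4096x4096 matrix: {adapter:,} ({adapter / full_matrix:.2%} of the matrix)")
print(f"trainable params under the assumption above: {trainable:,}")
print(f"LoRA scaling alpha/r: {scaling}")
print(f"effective batch size: {effective_batch}")
```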



### Training Efficiency



* **QLoRA**: Trains only the LoRA adapters while the base model stays frozen in 4-bit

* **4-bit quantization**: Cuts base-model weight memory by ~75% relative to 16-bit weights

* **Gradient checkpointing**: Trades compute for memory

* **Mixed precision**: Faster training with maintained accuracy
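
The ~75% figure follows from the bit widths alone, as the back-of-the-envelope calculation below shows. It assumes roughly 6.74 billion parameters for the 7B model and counts weights only; quantization constants, activations, LoRA adapters, and optimizer state are ignored, so real memory use will be higher.

```python
PARAMS = 6.74e9  # approximate parameter count of the 7B Llama architecture


def weight_gib(bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return PARAMS * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(16)
nf4_gib = weight_gib(4)
saving = 1 - nf4_gib / fp16_gib

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {nf4_gib:.2f} GiB")
print(f"reduction:      {saving:.0%}")
```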



---



## 🛠️ Advanced Usage



### Custom Training



```bash
# Train with custom parameters
python train_codellama_qlora_linux_bugfix.py \
    --learning_rate 1e-4 \
    --num_epochs 5 \
    --batch_size 32 \
    --lora_r 32 \
    --lora_alpha 16
```
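
For readers adapting the script, the flags above could be parsed along these lines. This is a hypothetical sketch: the flag names come from the command above and the defaults from the Model Configuration section, but the real script's argument handling may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser for the training flags shown above."""
    parser = argparse.ArgumentParser(description="QLoRA fine-tuning options (illustrative)")
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--num_epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--lora_r", type=int, default=64)
    parser.add_argument("--lora_alpha", type=int, default=16)
    return parser


# Same values as the command above
args = build_parser().parse_args([
    "--learning_rate", "1e-4",
    "--num_epochs", "5",
    "--batch_size", "32",
    "--lora_r", "32",
    "--lora_alpha", "16",
])

# LoRA scales its update by alpha / r, so changing r alone changes the scaling
scaling = args.lora_alpha / args.lora_r
print(args)
print(f"LoRA scaling alpha/r: {scaling}")
```

Note that halving `lora_r` to 32 while keeping `lora_alpha` at 16 doubles the LoRA scaling factor from 0.25 to 0.5, so the two settings are best tuned together.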



### Evaluation on Custom Data



```bash
# Evaluate on your own test set
python evaluate_linux_bugfix_model.py \
    --test_file your_test_data.jsonl \
    --output_dir custom_eval_results
```
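
Before running the evaluation on a custom file, it may be worth checking that every line matches the expected record layout. The sketch below assumes the test file uses the same schema as the training data described in the Dataset section, which this README does not confirm; `validate_jsonl` is a hypothetical helper.

```python
import json

# Assumption: test records use the same keys as the training records
REQUIRED = {"input": ("original code", "instruction"), "output": ("diff codes",)}


def validate_jsonl(text):
    """Return (line_number, problem) pairs for records that do not match the schema."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        for section, keys in REQUIRED.items():
            block = record.get(section) if isinstance(record, dict) else None
            if not isinstance(block, dict):
                problems.append((number, f"missing object '{section}'"))
                continue
            for key in keys:
                if not isinstance(block.get(key), str):
                    problems.append((number, f"'{section}.{key}' must be a string"))
    return problems


good = json.dumps({
    "input": {"original code": "if (!p) return;", "instruction": "Fix NULL check"},
    "output": {"diff codes": "@@ -1 +1 @@\n-a\n+b"},
})
bad = json.dumps({"input": {"original code": "x"}})
for line_number, problem in validate_jsonl(good + "\n" + bad + "\n{not json"):
    print(line_number, problem)
```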



---



## 🤝 Contributing



1. Fork this repo

2. Create a feature branch (`git checkout -b feature/amazing-feature`)

3. Commit your changes (`git commit -m 'Add amazing feature'`)

4. Push to the branch (`git push origin feature/amazing-feature`)

5. Open a Pull Request 🙌



### Development Guidelines



- Follow PEP 8 style guidelines

- Add tests for new features

- Update documentation for API changes

- Ensure all tests pass before submitting PR



---



## 📄 License



MIT License – see `LICENSE` file for details.



---



## 🙏 Acknowledgments



* **Meta** for the CodeLLaMA base model

* **Hugging Face** for the Transformers and PEFT libraries

* **The Linux kernel community** for open access to commit data

* **Microsoft** for introducing the LoRA technique

* **University of Washington** for QLoRA research



---



## 📚 References



* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)

* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)

* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)

* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)



---



## 📞 Support



For questions, issues, or contributions:

- Open an issue on GitHub

- Check the project documentation

- Review the evaluation results in `evaluate/output/`



---



## 🔄 Version History



- **v1.0.0**: Initial release with QLoRA training

- **v1.1.0**: Added parallel dataset extraction

- **v1.2.0**: Improved evaluation metrics and documentation