| --- | |
| license: mit | |
| tags: | |
| - codellama | |
| - linux | |
| - bugfix | |
| - lora | |
| - qlora | |
| - git-diff | |
| base_model: codellama/CodeLlama-7b-Instruct-hf | |
| model_type: LlamaForCausalLM | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| --- | |
| # CodeLLaMA-Linux-BugFix | |
| A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages. | |
| --- | |
| ## Overview | |
| This project targets automated Linux kernel bug fixing by: | |
| - **Mining real commit data** from the kernel Git history | |
| - **Training a specialized QLoRA model** on diff-style fixes | |
| - **Generating Git patches** in response to bug-prone code | |
| - **Evaluating results** using BLEU, ROUGE, and human inspection | |
| The model reaches a BLEU score of 33.87 against reference kernel fixes, making it a useful starting point for automated code review and patch suggestion. | |
| --- | |
| ## Performance Results | |
| ### Evaluation Metrics | |
| - **BLEU Score**: 33.87 | |
| - **ROUGE Scores**: | |
|   - **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355 | |
|   - **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457 | |
|   - **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612 | |
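| These scores measure token overlap with the reference fixes, not functional correctness. They suggest that the model tends to: | |
| - Reproduce most of the content of the reference patch (recall between 0.61 and 0.73) | |
| - Emit extra tokens that the reference does not contain (precision between 0.29 and 0.38) | |
| - Produce code changes in the Git diff format it was trained on | |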
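The precision, recall, and F1 figures above can be illustrated with a minimal ROUGE-N sketch. It uses plain whitespace tokenisation, which is an assumption; the tokeniser and ROUGE implementation used by the project's evaluation script are not described in this README.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Whitespace-tokenised ROUGE-N precision, recall and F1 for one pair."""

    def ngrams(text: str) -> Counter:
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative patch pair (not taken from the dataset)
reference = "- return;\n+ return -EINVAL;"
candidate = "- return;\n+ if (!ptr)\n+ return -EINVAL;"
scores = rouge_n(candidate, reference)
print(scores)
```

In this toy pair the candidate contains every reference token plus a few extra ones, so recall is higher than precision, the same pattern as in the reported scores.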
| --- | |
| ## Model Configuration | |
| - **Base model**: `CodeLLaMA-7B-Instruct` | |
| - **Fine-tuning method**: QLoRA with 4-bit quantization | |
| - **Training setup**: | |
|   - LoRA r=64, alpha=16, dropout=0.1 | |
|   - Batch size: 64, LR: 2e-4, Epochs: 3 | |
|   - Mixed precision (bfloat16), gradient checkpointing | |
| - **Hardware**: Optimized for NVIDIA H200 GPUs | |
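To give a feel for what the LoRA settings above imply, here is a small arithmetic sketch. The rank and alpha come from the configuration list; the choice of target modules (`q_proj` and `v_proj` in every layer) is an assumption made for illustration, since this README does not say which modules the adapters are attached to.

```python
def lora_scaling(alpha: int, r: int) -> float:
    """LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int, num_matrices: int) -> int:
    """Each adapted matrix adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out) * num_matrices


HIDDEN_SIZE = 4096        # hidden size of the 7B Llama architecture
NUM_LAYERS = 32           # transformer layers in the 7B model
TARGETS_PER_LAYER = 2     # ASSUMPTION: adapters on q_proj and v_proj only

scaling = lora_scaling(alpha=16, r=64)
trainable = lora_param_count(
    HIDDEN_SIZE, HIDDEN_SIZE, r=64, num_matrices=NUM_LAYERS * TARGETS_PER_LAYER
)
print(f"scaling={scaling}, trainable adapter parameters={trainable:,}")
```

Under these assumptions the adapters hold roughly 34 million trainable parameters, well under 1% of the 7B base model. Adding more target modules would raise that count proportionally.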
| --- | |
| ## Dataset | |
| Custom dataset extracted from Linux kernel Git history. | |
| ### Filtering Criteria | |
| Commits are treated as bug fixes if they contain keywords such as: | |
| `fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc. | |
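A keyword filter of this kind could be sketched as follows. The keyword list is taken from the criteria above; matching at the start of a word and ignoring case are assumptions, and `is_bugfix_commit` is a hypothetical helper rather than a function from the extraction script.

```python
import re

BUGFIX_KEYWORDS = (
    "fix", "bug", "crash", "memory", "null",
    "panic", "overflow", "race", "corruption",
)

# ASSUMPTION: match keywords at the start of a word, case-insensitively,
# so "fixes" and "NULL" match while "prefix" and "trace" do not.
_PATTERN = re.compile(r"\b(?:" + "|".join(BUGFIX_KEYWORDS) + r")", re.IGNORECASE)


def is_bugfix_commit(message: str) -> bool:
    """Return True if the commit message mentions any bug-fix keyword."""
    return _PATTERN.search(message) is not None


messages = [
    "mm: fix NULL pointer dereference in page cache",
    "docs: update prefix table",
    "net: avoid use-after-free race in socket teardown",
]
selected = [m for m in messages if is_bugfix_commit(m)]
print(selected)
```

A plain substring match would be simpler but would also select messages that mention words like "prefix" or "trace", which is why the sketch anchors on word starts.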
| ### Structure | |
| - Language: C (`.c`, `.h`) | |
| - Context: 10 lines before/after the change | |
| - Format: | |
| ```json | |
| { | |
|   "input": { | |
|     "original code": "C code snippet with bug", | |
|     "instruction": "Commit message or fix description" | |
|   }, | |
|   "output": { | |
|     "diff codes": "Git diff showing the fix" | |
|   } | |
| } | |
| ``` | |
| * **File**: `training_data_100k.jsonl` (100,000 samples) | |
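The structure above can be sketched in a few lines: take the changed lines plus 10 lines of context on each side, then wrap them in the record format. The helper names `context_window` and `make_record` are hypothetical, and the sample source lines and diff are made up for illustration.

```python
import json


def context_window(lines, first_changed, last_changed, context=10):
    """Return the changed lines plus `context` lines on each side.

    Indices are 0-based and inclusive; the window is clamped to the file.
    """
    start = max(first_changed - context, 0)
    end = min(last_changed + context, len(lines) - 1)
    return lines[start:end + 1]


def make_record(snippet_lines, commit_message, diff_text):
    """Build one training sample in the JSON layout shown above."""
    return {
        "input": {
            "original code": "\n".join(snippet_lines),
            "instruction": commit_message,
        },
        "output": {"diff codes": diff_text},
    }


source = [f"line {i}" for i in range(100)]           # stand-in for a C file
snippet = context_window(source, first_changed=40, last_changed=42)
record = make_record(
    snippet,
    "fix null pointer dereference",
    "@@ -41,1 +41,1 @@\n-line 40\n+line 40 fixed",
)
jsonl_line = json.dumps(record)                      # one line of the .jsonl file
print(len(snippet), "context lines")
```

Each call to `json.dumps` yields one line, so writing the results one per line produces a file in the same JSONL layout as `training_data_100k.jsonl`.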
| --- | |
| ## Quick Start | |
| ### Prerequisites | |
| - Python 3.8+ | |
| - CUDA-compatible GPU (recommended) | |
| - 16GB+ RAM | |
| - 50GB+ disk space | |
| ### Install dependencies | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ### 1. Build the Dataset | |
| ```bash | |
| cd dataset_builder | |
| python extract_linux_bugfixes_parallel.py | |
| python format_for_training.py | |
| ``` | |
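As a rough idea of what the formatting step might do, the sketch below turns records in the dataset format into prompt/completion pairs. The prompt template mirrors the one used in the usage example in this README; whether `format_for_training.py` uses exactly this template, or these output keys, is an assumption.

```python
import json

# ASSUMPTION: same wording as the inference prompt shown in this README.
PROMPT_TEMPLATE = (
    "Given the following original C code:\n{code}\n"
    "Instruction: {instruction}\n"
    "Return the diff that fixes it:\n"
)


def to_prompt_completion(record: dict) -> dict:
    """Map one dataset record to a prompt/completion pair."""
    return {
        "prompt": PROMPT_TEMPLATE.format(
            code=record["input"]["original code"],
            instruction=record["input"]["instruction"],
        ),
        "completion": record["output"]["diff codes"],
    }


def convert_lines(jsonl_lines):
    """Yield one converted JSON line per non-empty input line."""
    for line in jsonl_lines:
        line = line.strip()
        if line:
            yield json.dumps(to_prompt_completion(json.loads(line)))


sample = {
    "input": {
        "original code": "if (!p) {\n\treturn;\n}",
        "instruction": "Fix the null pointer dereference",
    },
    "output": {"diff codes": "@@ -1,3 +1,3 @@"},
}
converted = list(convert_lines([json.dumps(sample), "\n"]))
print(converted[0])
```

Keeping the training prompt identical to the inference prompt matters here: a fine-tuned model generally responds best to the format it saw during training.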
| ### 2. Fine-tune the Model | |
| ```bash | |
| cd train | |
| python train_codellama_qlora_linux_bugfix.py | |
| ``` | |
| ### 3. Run Evaluation | |
| ```bash | |
| cd evaluate | |
| python evaluate_linux_bugfix_model.py | |
| ``` | |
| ### 4. Use the Model | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| from peft import PeftModel | |
| BASE_MODEL = "codellama/CodeLlama-7b-Instruct-hf" | |
| # Load the base model (device_map="auto" requires the accelerate package) | |
| tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto" | |
| ) | |
| # Attach the fine-tuned LoRA adapter | |
| model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix") | |
| model.eval() | |
| # Generate a bug fix | |
| prompt = """ | |
| Given the following original C code: | |
| if (!file->filter) | |
|     return; | |
| Instruction: Fix the null pointer dereference | |
| Return the diff that fixes it: | |
| """ | |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) | |
| # temperature only takes effect when sampling is enabled | |
| outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1) | |
| fix = tokenizer.decode(outputs[0], skip_special_tokens=True) | |
| print(fix) | |
| ``` | |
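The decoded output of a causal language model normally includes the prompt itself, so some post-processing is useful before a patch is used. The sketch below strips the echoed prompt and runs a cheap shape check for a unified diff. Both helpers are hypothetical and not part of this repository, and the check only looks at the format, not at whether the patch applies.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -12,7 +12,9 @@"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def strip_prompt(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt if the decoded text starts with it."""
    decoded, prompt = decoded.strip(), prompt.strip()
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded


def looks_like_unified_diff(text: str) -> bool:
    """Shape check: at least one hunk header and one added or removed line."""
    has_hunk = HUNK_HEADER.search(text) is not None
    has_change = any(
        line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
        for line in text.splitlines()
    )
    return has_hunk and has_change


prompt = "Return the diff that fixes it:\n"
decoded = prompt + "@@ -1,2 +1,2 @@\n-\treturn;\n+\treturn -EINVAL;\n"
patch = strip_prompt(decoded, prompt)
print(looks_like_unified_diff(patch))
```

For a stronger check, the patch can be written to a file and tested against a kernel tree with `git apply --check`, which reports whether it applies cleanly without modifying anything.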
| --- | |
| ## Project Structure | |
| ``` | |
| CodeLLaMA-Linux-BugFix/ | |
| ├── dataset_builder/ | |
| │   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes | |
| │   ├── format_for_training.py                # Format data for training | |
| │   └── build_dataset.py                      # Main dataset builder | |
| ├── dataset/ | |
| │   ├── training_data_100k.jsonl              # 100K training samples | |
| │   └── training_data_prompt_completion.jsonl # Formatted training data | |
| ├── train/ | |
| │   ├── train_codellama_qlora_linux_bugfix.py # Main training script | |
| │   ├── train_codellama_qlora_simple.py       # Simplified training | |
| │   ├── download_codellama_model.py           # Model download utility | |
| │   └── output/ | |
| │       └── qlora-codellama-bugfix/           # Trained model checkpoints | |
| ├── evaluate/ | |
| │   ├── evaluate_linux_bugfix_model.py        # Evaluation script | |
| │   ├── test_samples.jsonl                    # Test dataset | |
| │   └── output/                               # Evaluation results | |
| │       ├── eval_results.csv                  # Detailed results | |
| │       └── eval_results.json                 # JSON format results | |
| ├── requirements.txt                          # Python dependencies | |
| ├── README.md                                 # This file | |
| └── PROJECT_STRUCTURE.md                      # Detailed project overview | |
| ``` | |
| --- | |
| ## Features | |
| * **Efficient fine-tuning**: QLoRA with 4-bit quantization reduces memory use by roughly 75% | |
| * **Real-world commits**: Training data comes from actual Linux kernel development | |
| * **Context-aware**: Extracts the code context around the changed lines | |
| * **Diff output**: Generates patches in Git diff format | |
| * **Measured performance**: BLEU score of 33.87 and ROUGE-1 F1 of 0.4355 | |
| * **Deployment-oriented**: Built with real-world use in review and CI pipelines in mind | |
| --- | |
| ## Evaluation Metrics | |
| * **BLEU**: N-gram precision of the generated diff against the reference diff | |
| * **ROUGE**: N-gram and longest-common-subsequence overlap with the reference fix | |
| * **Human Evaluation**: Subjective patch quality assessment | |
| ### Current Performance | |
| - **BLEU Score**: 33.87 | |
| - **ROUGE-1 F1**: 0.4355 (unigram overlap) | |
| - **ROUGE-2 F1**: 0.3457 (bigram overlap) | |
| - **ROUGE-L F1**: 0.3612 (longest common subsequence) | |
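For readers unfamiliar with BLEU, the following is a simplified sentence-level version on a 0 to 100 scale. It uses whitespace tokens, n-grams up to length 2, and no smoothing, all of which are simplifying assumptions; the evaluation script's actual BLEU settings are not stated in this README, so this sketch will not reproduce the reported score.

```python
import math
from collections import Counter


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate tokens against reference tokens."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    clipped = sum((cand & ref).values())
    return clipped / total if total else 0.0


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    # Penalise candidates that are shorter than the reference
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return 100 * brevity * math.exp(log_mean)


print(simple_bleu("a b c d", "a b x d"))
```

BLEU is precision-oriented with a penalty for short outputs, while ROUGE also reports recall, which is why the two families of metrics are usually read together.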
| --- | |
| ## Use Cases | |
| * **Automated kernel bug fixing**: Generate fixes for common kernel bugs | |
| * **Code review assistance**: Help reviewers identify potential issues | |
| * **Teaching/debugging kernel code**: Educational tool for kernel development | |
| * **Research in automated program repair (APR)**: Academic research applications | |
| * **CI/CD integration**: Automated testing and fixing in development pipelines | |
| --- | |
| ## Technical Highlights | |
| ### Memory & Speed Optimizations | |
| * 4-bit quantization (NF4) | |
| * Gradient checkpointing | |
| * Mixed precision (bfloat16) | |
| * Gradient accumulation | |
| * LoRA parameter efficiency | |
| ### Training Efficiency | |
| * **QLoRA**: Keeps the frozen base weights in 4-bit NF4, reducing weight memory by ~75% relative to 16-bit | |
| * **LoRA adapters**: Only the low-rank adapter weights are trained | |
| * **Gradient checkpointing**: Trades compute for memory | |
| * **Mixed precision**: Faster training in bfloat16 with comparable accuracy | |
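The ~75% figure can be checked with back-of-the-envelope arithmetic on the base weights alone. The sketch assumes a nominal 7 billion parameters and ignores quantization constants, activations, adapter weights, and optimizer state, so real usage will differ. The micro-batch size in the gradient accumulation example is also an assumption.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights only, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def accumulation_steps(effective_batch: int, micro_batch: int) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    if effective_batch % micro_batch:
        raise ValueError("effective batch must be a multiple of the micro-batch")
    return effective_batch // micro_batch


PARAMS = 7e9                                  # nominal size of the 7B base model
bf16_gb = weight_memory_gb(PARAMS, 16)        # 16-bit weights
nf4_gb = weight_memory_gb(PARAMS, 4)          # 4-bit NF4 weights
saving = 1 - nf4_gb / bf16_gb

# ASSUMPTION: a micro-batch of 8 per step; the README only gives batch size 64
steps = accumulation_steps(effective_batch=64, micro_batch=8)
print(f"{bf16_gb:.1f} GB -> {nf4_gb:.1f} GB ({saving:.0%} saved), {steps} accumulation steps")
```

Gradient accumulation complements quantization: it keeps the effective batch size at 64 while only a smaller micro-batch has to fit in GPU memory at any one time.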
| --- | |
| ## Advanced Usage | |
| ### Custom Training | |
| ```bash | |
| # Train with custom parameters | |
| python train_codellama_qlora_linux_bugfix.py \ | |
|     --learning_rate 1e-4 \ | |
|     --num_epochs 5 \ | |
|     --batch_size 32 \ | |
|     --lora_r 32 \ | |
|     --lora_alpha 16 | |
| ``` | |
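A training script could expose these flags with `argparse` along the following lines. The flag names come from the command above and the defaults from the Model Configuration section; how the actual training script parses its arguments is not shown in this README, so treat this as an illustrative sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line flags matching the custom training example."""
    parser = argparse.ArgumentParser(description="QLoRA fine-tuning (illustrative sketch)")
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    parser.add_argument("--num_epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--lora_r", type=int, default=64)
    parser.add_argument("--lora_alpha", type=int, default=16)
    return parser


# Parse the same values as the shell command above
args = build_parser().parse_args(
    ["--learning_rate", "1e-4", "--num_epochs", "5",
     "--batch_size", "32", "--lora_r", "32", "--lora_alpha", "16"]
)
print(vars(args))
```

Note that lowering `--lora_r` from 64 to 32 while keeping `--lora_alpha` at 16 doubles the LoRA scaling factor alpha / r, from 0.25 to 0.5.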
| ### Evaluation on Custom Data | |
| ```bash | |
| # Evaluate on your own test set | |
| python evaluate_linux_bugfix_model.py \ | |
|     --test_file your_test_data.jsonl \ | |
|     --output_dir custom_eval_results | |
| ``` | |
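Before evaluating on a custom file, it can help to confirm that every line matches the expected record layout. The sketch below assumes the test file uses the same schema as the training data described in the Dataset section, which this README does not state explicitly; `find_schema_errors` is a hypothetical helper.

```python
import json

# ASSUMPTION: custom test files follow the training-data schema
REQUIRED_KEYS = {
    "input": ("original code", "instruction"),
    "output": ("diff codes",),
}


def find_schema_errors(jsonl_lines):
    """Return (line_number, message) pairs for malformed records."""
    errors = []
    for lineno, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        for section, keys in REQUIRED_KEYS.items():
            block = record.get(section)
            for key in keys:
                if not isinstance(block, dict) or key not in block:
                    errors.append((lineno, f"missing {section}.{key}"))
    return errors


good = json.dumps({
    "input": {"original code": "x", "instruction": "y"},
    "output": {"diff codes": "z"},
})
bad = json.dumps({"input": {"original code": "x"}, "output": {}})
errors = find_schema_errors([good, "", bad, "not json"])
print(errors)
```

To check a real file, pass an open file handle, for example `find_schema_errors(open("your_test_data.jsonl", encoding="utf-8"))`.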
| --- | |
| ## Contributing | |
| 1. Fork this repo | |
| 2. Create a feature branch (`git checkout -b feature/amazing-feature`) | |
| 3. Commit your changes (`git commit -m 'Add amazing feature'`) | |
| 4. Push to the branch (`git push origin feature/amazing-feature`) | |
| 5. Open a Pull Request | |
| ### Development Guidelines | |
| - Follow PEP 8 style guidelines | |
| - Add tests for new features | |
| - Update documentation for API changes | |
| - Ensure all tests pass before submitting PR | |
| --- | |
| ## License | |
| MIT License; see the `LICENSE` file for details. | |
| --- | |
| ## Acknowledgments | |
| * **Meta** for the CodeLLaMA base model | |
| * **Hugging Face** for the Transformers and PEFT libraries | |
| * **The Linux kernel community** for open access to commit data | |
| * **Microsoft** for introducing the LoRA technique | |
| * **University of Washington** for the QLoRA research | |
| --- | |
| ## References | |
| * [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950) | |
| * [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314) | |
| * [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685) | |
| * [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519) | |
| --- | |
| ## Support | |
| For questions, issues, or contributions: | |
| - Open an issue on GitHub | |
| - Check the project documentation | |
| - Review the evaluation results in `evaluate/output/` | |
| --- | |
| ## Version History | |
| - **v1.0.0**: Initial release with QLoRA training | |
| - **v1.1.0**: Added parallel dataset extraction | |
| - **v1.2.0**: Improved evaluation metrics and documentation | |