---
language: en
license: mit
tags:
- code
- git
- commit-message
- qwen2
- lora
datasets:
- bigcode/commitpackft
---
# Git Commit Message Generator
A fine-tuned Qwen-0.5B model that generates professional Git commit messages from code diffs.
## Model Description
This model was fine-tuned using LoRA (Low-Rank Adaptation) on the CommitPackFT dataset to generate concise, professional commit messages from git diffs.
- **Base Model**: Qwen-0.5B
- **Fine-tuning Method**: LoRA (r=16, alpha=32)
- **Training Data**: 55K filtered commits from CommitPackFT
- **Languages**: Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more
## Intended Use
Generate commit messages for staged changes in a Git repository.
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "rajtiwariee/auto-commit"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Prepare your diff
diff = """
Diff:
File: src/auth.py
Language: Python
Old content:
def login(username, password):
    user = get_user(username)
    if user.password == password:
        return True
    return False
New content:
def login(username, password):
    user = get_user(username)
    if user and user.password == password:
        return True
    return False
"""

# Generate commit message
prompt = f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,  # Deterministic
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text includes the prompt, so keep only what follows the marker
message = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(message.split("Commit message:")[-1].strip())
# Output: "Check for user existence before accessing password"
```
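The prompt layout used in the Quick Start can be wrapped in small helpers. The following is a minimal sketch, not part of the released code: `build_prompt` and `extract_message` are hypothetical names, and the layout is copied from the example above on the assumption that it matches what the model saw during training.

```python
def build_prompt(path: str, language: str, old: str, new: str) -> str:
    """Assemble a prompt in the same layout as the Quick Start example."""
    diff = (
        "\nDiff:\n"
        f"File: {path}\n"
        f"Language: {language}\n"
        f"Old content:\n{old}\n"
        f"New content:\n{new}\n"
    )
    return f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"


def extract_message(decoded: str) -> str:
    """Return the text after the last 'Commit message:' marker."""
    return decoded.split("Commit message:")[-1].strip()


if __name__ == "__main__":
    prompt = build_prompt("src/auth.py", "Python", "x = 1", "x = 2")
    print(prompt)
```

Pass the result of `build_prompt` to the tokenizer in place of the hand-written `prompt`, and apply `extract_message` to the decoded output.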
### CLI Tool
For easier use, clone the [GitHub repository](https://github.com/rajtiwariee/GitCommitGenerator) and install the companion CLI tool from the repository root:
```bash
pip install -e .
commit-gen generate --commit
```
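To feed staged changes to the model without the CLI, the old and new versions of a staged file can be read directly from Git. This is a sketch under stated assumptions: it is not how the companion CLI is known to work, and `git_output` and `staged_versions` are hypothetical helpers. It relies only on the standard `git show HEAD:<path>` (last committed version) and `git show :<path>` (staged version) forms.

```python
import subprocess


def git_output(args, run=subprocess.run):
    """Run a git subcommand and return its stdout as text."""
    result = run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout


def staged_versions(path, run=subprocess.run):
    """Return (old, new) content of a staged file.

    old is the version in HEAD, new is the version in the index.
    Raises subprocess.CalledProcessError for newly added files,
    because HEAD has no version of them yet.
    """
    old = git_output(["show", f"HEAD:{path}"], run)
    new = git_output(["show", f":{path}"], run)
    return old, new
```

The `run` parameter exists only so the helpers can be exercised without a real repository; in normal use, call `staged_versions("src/auth.py")` from inside a Git working tree.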
## Training Details
### Training Data
- **Dataset**: CommitPackFT (filtered subset)
- **Training samples**: 55,730
- **Validation samples**: 6,966
- **Test samples**: 6,967
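The sample counts above work out to roughly an 80/10/10 split. A quick check of the arithmetic (how the split was drawn is not stated here):

```python
# Sample counts taken from the list above
splits = {"train": 55_730, "validation": 6_966, "test": 6_967}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

if __name__ == "__main__":
    print(f"total: {total}")
    for name, fraction in fractions.items():
        print(f"{name}: {fraction:.1%}")
```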
### Training Procedure
- **Epochs**: 3
- **Batch Size**: 4 (effective batch size: 32 with gradient accumulation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **LoRA Config**:
  - r: 16
  - alpha: 32
  - dropout: 0.05
  - target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
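The figures above imply a few derived quantities. The sketch below assumes a single GPU and that the final partial batch of each epoch is kept; the exact step count depends on how the training loop handles that batch, so treat the numbers as approximate.

```python
import math

# Values taken from the lists above
train_samples = 55_730
per_device_batch = 4
effective_batch = 32
epochs = 3
lora_r, lora_alpha = 16, 32

# Gradient accumulation steps needed to reach the effective batch size
accum_steps = effective_batch // per_device_batch

# Optimizer updates per epoch and in total (assumption: last partial batch kept)
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / r
lora_scale = lora_alpha / lora_r

if __name__ == "__main__":
    print(accum_steps, steps_per_epoch, total_steps, lora_scale)
```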
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB)
- **Precision**: Mixed Precision (FP32 weights + FP16 compute)
- **Training Time**: ~7.5 hours
## Evaluation Results
- **BLEU Score**: 0.0244
- **ROUGE-1**: 0.1968
- **ROUGE-2**: 0.0420
- **ROUGE-L**: 0.1816
- **Exact Match Rate**: 0.00%
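For intuition about what these metrics measure, here is a simplified sketch of ROUGE-1 F1 and exact match. It is not the evaluation script used for the numbers above: the implementation behind them is not stated, and this version uses lowercased whitespace tokens with no stemming, so its scores would not match exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a generated message."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(reference: str, candidate: str) -> bool:
    """True only if the two messages are identical after trimming whitespace."""
    return reference.strip() == candidate.strip()


if __name__ == "__main__":
    ref = "Fix null check in login"
    gen = "Add null check for login"
    print(rouge1_f1(ref, gen), exact_match(ref, gen))
```

In this example three of five unigrams overlap, so the message scores well on ROUGE-1 but still fails exact match. That illustrates how a 0.00% exact match rate can coexist with non-zero ROUGE scores.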
## Limitations
- The model is trained primarily on English commit messages
- Best suited for code changes in common programming languages
- May not handle large diffs (more than 384 tokens) well
- Generated messages should be reviewed before committing
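One way to respect the 384-token figure is to check or trim a diff before building the prompt. The sketch below is illustrative only: `fits_limit` and `truncate_to_limit` are hypothetical helpers, whitespace splitting stands in for the real tokenizer, and it is an assumption that the limit applies to the diff rather than the whole prompt.

```python
from typing import Callable, List

MAX_DIFF_TOKENS = 384  # figure quoted in the Limitations list


def fits_limit(diff: str, tokenize: Callable[[str], List],
               limit: int = MAX_DIFF_TOKENS) -> bool:
    """Return True if the diff tokenizes to at most `limit` tokens."""
    return len(tokenize(diff)) <= limit


def truncate_to_limit(diff: str, tokenize: Callable[[str], List],
                      limit: int = MAX_DIFF_TOKENS) -> str:
    """Keep leading lines while the summed per-line token count fits.

    Summing per-line counts is an approximation: a real tokenizer
    may also spend tokens on the newlines between lines.
    """
    kept = []
    used = 0
    for line in diff.splitlines():
        count = len(tokenize(line))
        if used + count > limit:
            break
        kept.append(line)
        used += count
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "a b c\nd e\nf g h i"
    print(truncate_to_limit(sample, str.split, limit=5))
```

With the model's tokenizer loaded, `tokenizer.tokenize` can be passed in place of `str.split` to count actual tokens.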
## Ethical Considerations
This model is intended to assist developers in writing commit messages, not replace human judgment. Users should:
- Review generated messages for accuracy
- Ensure messages accurately describe the changes
- Follow their team's commit message conventions
## Citation
```bibtex
@misc{git-commit-generator,
  author = {Raj Tiwari},
  title = {Git Commit Message Generator},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rajtiwariee/auto-commit}},
}
```
## License
MIT License